Source code statistics using StatSVN
FERDY CHRISTANT - APR 8, 2008 (08:24:35 PM)
Last week I talked about the interesting insights I had during a Software Estimation course. Amongst many other things, I learned that one of the preconditions for accurate estimation is the availability of historical data. This data can be anything: time sheets, bug reports, registered help desk calls, and... Lines of Code (LOC).
Although many developers would dispute that lines of code is a reliable metric for size or productivity, it seems that most estimation methods make use of it. These methods argue that, when averaged out over a large quantity of data, lines of code is indeed reliable enough for estimation and reasonably easy to capture.
With this in mind, and knowing that Subversion (our source code control system at work) contains a goldmine of historical data, I started to look for a way to get meaningful data out of it. My search was short: it turns out StatSVN does exactly that. I tried it out on one of the repositories of my own Subversion server (see instructions on the Wiki). I picked the Blogo.NET repository for the test run. The result can be seen online here.
Two remarks apply:
- The statistics in this case are hardly useful, since I am the only user of my Subversion server. This also means that I make very large commits, because I never have to worry about other developers conflicting with my commits. The statistics should be a lot more interesting in a real-life team scenario, like we have at work.
- Be careful in the interpretation of the statistics! Example: the test run shows that Blogo.NET contains 60,000 lines of code. Look closer and notice that this total covers 3 tags (snapshots). The real lines of code would be more like 20,000. But wait, an external component (TinyMCE) takes up 10,000 lines of code. The REAL lines of code for Blogo.NET is therefore 10,000, not 60,000. See how easy it is to make huge mistakes when using statistics?
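
To make the correction in the second remark explicit, here is a minimal Python sketch of the arithmetic. The function and parameter names are made up for illustration and have nothing to do with StatSVN itself. It assumes each tag is a full copy of roughly the same size, which is only approximately true for real snapshots, and it uses the rounded numbers from the example.

```python
def estimate_own_loc(reported_loc: int, snapshot_copies: int, third_party_loc: int) -> int:
    """Correct a raw LOC total for duplicated snapshots and bundled third-party code.

    Hypothetical helper for illustration only; it is not part of StatSVN.
    """
    if snapshot_copies <= 0:
        raise ValueError("snapshot_copies must be positive")
    # Assumption: every tag is a full copy of about the same size, so dividing
    # the raw total by the number of copies approximates a single code base.
    loc_per_copy = reported_loc // snapshot_copies
    # Remove code that was bundled with the project rather than written for it.
    return loc_per_copy - third_party_loc


# Rounded numbers from the Blogo.NET test run: 60,000 reported lines,
# 3 tags, and 10,000 lines of TinyMCE.
own_loc = estimate_own_loc(reported_loc=60_000, snapshot_copies=3, third_party_loc=10_000)
print(own_loc)  # 10000
```

The point is not the helper itself but the two corrections it encodes: duplicated copies and third-party code both inflate the raw total, and either one is easy to overlook when reading a report.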


