About
SubStats is a tool to generate statistics about a Subversion repository. It is similar in spirit to StatSVN (which is built upon StatCVS), but works quite differently. It is not an attempt to make a better StatSVN; it was simply made before StatSVN became available. I haven't looked much into StatSVN, but the main difference seems to be that StatSVN has more information (not necessarily more useful though) and works on large repositories. SubStats, on the other hand, should be more precise and tweakable (see the "How It Works" section below for details). You can compare the output of SubStats with StatSVN in the "Sample Output" section.
Features
- Calculates added, moved (!) and removed content between revisions.
- Colourful charts to give an idea of the progression of your project.
- Shows relevant information such as the most recent revision logs and how much the revisions changed.
- Shows completely useless statistical information such as average number of commits per hour of day.
- Caches the statistics so that subsequent runs of SubStats become much more efficient.
- All calculations are based on changes per character (and not per line).
- Integrates with ViewVC.
Limitations
- It does not work for big repositories (if your repository has more than a thousand revisions, don't get your hopes up).
- Output is quite ugly (I'm not a web designer, but I guess you have realized that by now).
- It is somewhat resource demanding because it needs the content of each file (if your repository has many large text files, it will eat your memory like it was a delicious strawberry pudding).
Sample Output
Here are some sample documents for your viewing pleasure:
- StatSVN. Compare with its own statistics here. Note: The SubStats document gives a better view of the progress than the StatSVN document because certain files (which were not written by the submitters) have been excluded.
- My master's thesis. Note: I have disabled the log from the analysis since it was not in English.
- MPY SVN STAT. Compare with its own statistics here.
For those of you who are only skimming and not actually reading this (thus missing the links), here are two images from the analysis. They are converted from SVG to JPG in case your browser cannot render SVGs (but you still want to know what you are missing out on).
The first picture below shows the amount of added content (character count) over time. The second picture below shows the total number of commits by hour of day. Both are per author.

How It Works
I have tried to make SubStats as precise as possible. Therefore, it tracks content by character (and not by line), knows about content moved within a file in a revision, and can handle copied and deleted files correctly (shame on you, StatSVN). Here is how it works and why some of these choices were made:
- Content by Character:
- It is not really fair that people who write short lines should be praised more than people who leave everything on a few lines. Also, if you are writing a LaTeX file in Emacs for example, you probably break the text at 70 characters or so on each line. This means that if you add a single word to the beginning of a paragraph and fill it by pressing "M-q", all lines in that paragraph might change as words are rearranged to fit the 70 character limit. Since you have not really changed the content of the paragraph besides the single word, you should only be praised for that addition. This is why SubStats tracks content by character and not by line (as most other similar projects I know of do).
- White Space Is Ignored:
- You shouldn't really be praised for using a lot of spaces instead of a single tab, neither should you be praised for typing a lot of new lines. Or using short words instead of long ones!
- Moved Content By Lines:
- Imagine you move an entire section from one end of the file to another. Since you haven't really made changes to the content, it will be counted as moved, and the original authors will still be connected to the moved content. The only problem is how to know when some content has moved. If SubStats tracked movement by characters (which is how everything else is handled), then when you rewrite a section, or delete some content at one point and add some at another, chances are that you would still be using many of the same characters as the old content - there are after all not that many in the English alphabet :). This is why moved content is tracked by lines. It could perhaps just as well be tracked by a minimum number of contiguous characters, but I think this solution works just as well.
- Copied Files:
- If you copy a file, you will not be praised for having "added" the content to the repository, nor will any of the original authors. However, it will still show up as a size change (e.g. in the two "Module Size" graphs and the "Current Char Count" column.)
- Deleted Files:
- The size of the content will be counted as deleted for that revision.
- Binary Files:
- Of course, the content of binary files will not be tracked. SubStats checks whether a file has become text or binary at each change and will handle such a change as either an addition or a deletion of the content.
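To make the counting rules above more concrete, here is a rough Ruby sketch of how two versions of a single text file could be compared. It is an illustration under assumptions, not the actual SubStats code: the helper names (weighted_lcs, normalized_lines, char_changes) are made up for this example, and a plain longest-common-subsequence comparison stands in for whatever diff SubStats really uses. White space is stripped first, whole lines found in both versions count as kept (and as moved if they are out of order), and everything else is compared character by character.

```ruby
# Illustrative sketch only; all names here are hypothetical.

# Heaviest common subsequence of two arrays. The block gives the weight of a
# matching element (1 per character, or the line length for whole lines).
def weighted_lcs(a, b)
  prev = Array.new(b.size + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      best = [prev[j + 1], curr[j]].max
      best = [best, prev[j] + yield(x)].max if x == y
      curr << best
    end
    prev = curr
  end
  prev.last
end

# Remove all white space from every line and drop lines that end up empty.
def normalized_lines(text)
  text.lines.map { |line| line.gsub(/\s+/, "") }.reject(&:empty?)
end

# Returns the number of added, removed and moved characters.
def char_changes(old_text, new_text)
  old_lines = normalized_lines(old_text)
  new_lines = normalized_lines(new_text)

  # Whole lines present in both versions are kept content (in place or moved).
  remaining = Hash.new(0)
  old_lines.each { |line| remaining[line] += 1 }
  kept = []
  fresh = []
  new_lines.each do |line|
    if remaining[line] > 0
      remaining[line] -= 1
      kept << line
    else
      fresh << line
    end
  end
  gone = remaining.flat_map { |line, count| [line] * count }

  # Kept lines that still appear in the same relative order did not move.
  in_place = weighted_lcs(old_lines, new_lines, &:size)
  moved = kept.sum(&:size) - in_place

  # Everything else is compared per character, not per line.
  old_rest = gone.join.chars
  new_rest = fresh.join.chars
  common = weighted_lcs(old_rest, new_rest) { 1 }

  { added: new_rest.size - common, removed: old_rest.size - common, moved: moved }
end

before = "alpha beta\ngamma delta\nepsilon\n"
after  = "epsilon\nalpha beta\ngamma delta zeta\n"
p char_changes(before, after)
```

For this input the sketch counts 4 added characters ("zeta"), none removed, and 7 moved (the line "epsilon"). Re-filling a paragraph after inserting one word likewise only counts the new word, since the line breaks are ignored. Note that the comparison is quadratic in the size of the changed content, which fits the warning above about large files.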
Requirements
First of all, you need a Ruby interpreter. You also need the SVG::Graph library, but since I'm such a nice person, I have included it in the SubStats Ruby Distribution. SubStats also relies on the REXML library, but it should come bundled with the standard distribution of Ruby. Finally you need the subversion command line tool (i.e. "svn").
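As a rough idea of how these pieces can fit together, the sketch below reads the XML that "svn log --xml" prints and uses REXML to count commits per author and hour of day. Whether SubStats gathers its data exactly this way is an assumption on my part; the function name commits_by_hour and the sample log entries are made up for illustration.

```ruby
require "rexml/document"
require "time"

# Made-up sample in the format printed by `svn log --xml`.
SAMPLE_LOG = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <log>
  <logentry revision="3">
  <author>alice</author>
  <date>2008-03-02T14:41:07.000000Z</date>
  <msg>Fix typos</msg>
  </logentry>
  <logentry revision="2">
  <author>bob</author>
  <date>2008-03-02T09:15:30.000000Z</date>
  <msg>Add introduction</msg>
  </logentry>
  <logentry revision="1">
  <author>alice</author>
  <date>2008-03-01T14:05:12.000000Z</date>
  <msg>Initial import</msg>
  </logentry>
  </log>
XML

# Returns { author => { hour_of_day => commit_count } }, with hours in UTC.
def commits_by_hour(xml)
  counts = Hash.new { |hash, author| hash[author] = Hash.new(0) }
  REXML::Document.new(xml).elements.each("log/logentry") do |entry|
    # Revisions without an author element are grouped under a placeholder.
    author = entry.elements["author"]&.text || "(no author)"
    hour = Time.parse(entry.elements["date"].text).utc.hour
    counts[author][hour] += 1
  end
  counts
end

p commits_by_hour(SAMPLE_LOG)

# Against a real working copy or repository URL (not run here):
#   xml = IO.popen(["svn", "log", "--xml", path], &:read)
#   p commits_by_hour(xml)
```

In the sample, alice has two commits in hour 14 and bob has one in hour 9.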
I have also made an executable version of SubStats for Windows. Here, you do not need anything as all dependencies are included in the distribution (including the subversion command line tool). Just edit the file "run.bat" and make it point to your repository path, and then run it.
I have experimented with an executable for Linux, but I believe you need a 64-bit system. I also have no idea what the other requirements are for that system, but you can always try. Here, you don't need anything besides the Subversion command line tool (i.e. "svn"). Just type "substats_linux" and you are good to go.
To view the output, you need a browser with support for SVG, CSS and XHTML. I recommend Opera if you don't know what you are doing (or know exactly what you are doing).
Download