Announcement

Saturday, September 07, 2013

Recovering from corrupted Subversion repository

As a habit I store all my personal data (training ppts, various experimental, etc) in version control. Since I am doing this for a many years, I am using Subversion to store all this data. Recently I upgraded to Subversion 1.8.  I tried to do a fresh checkout and realized that repository is corrupted. 'svnadmin verify' returned error and could not fix it. I could not do a fresh 'svn checkout' and I could not use TortoiseHg as client to Subversion. I was stuck. The error I was getting was 'E200002 : serialized hash missing terminator' error.

My first step was to Google for error message and see if I get any solution. The search results suggested running 'svnadmin verify'. It confirmed the error at revision 1266. Many others suggested taking backed-up repository dump and restored from the dump. I took backup of the repository and did not take a repository dump at the time of backup. The backup of the repository was also corrupt at the same revision. So this solution was not very useful for me.

My second step was to send a mail to Subversion users email list 'users@subversion.apache.org'. You can see the mail here. Stefan Sperling suggested 'fsfsverify' script. I tried it but it did not work for me.I also tried 'fsfsfixer' script. That also did not work. 

The best clue I got from 'Philip Martin'. I was not sure what the error 'serialized hash missing terminator' means. I tried to read the documentation for FSFS format. Based on the checked the data from 'revision' files. It seemed to be corrected. 

Philip explained the error as
"It means one of the repository files is corrupt. It could be a revision files in db/revs or it could be a revprop file in db/revprops. A serialized hash is a series of K/V pairs followed by END:"

The key was "it could be a revrop file db/revprops". So far I missed on checking the 'revprop' files. I checked the revprops file for revision 1266 and found that it is made of just Zeros. There was no content.

Now the problem was clear. The revprop files are corrupt, I did not have the backup of 'uncorrupted' revprop files. Hence only was to somehow recover the revprops. Typical revision prop file looks like this

K 10
svn:author
V 2
pm
K 8
svn:date
V 27
2013-09-05T18:00:22.881511Z
K 7
svn:log
V 1
m END
Typically it contains information about 'author', 'date' and svn commit message. It may also contains merge information. In my case, mergeinfo was not there. Since it was a personal repository only person committing code was me. Hence svn:author will always be me. Now its question of creating message and date. So decided to do take a simpler approach for date use the 1 minute after previous valid revisions date. And message as 'this revision is recovered from corruption' kind of message.

I wrote a  python script, which takes repository path and revision number and does the following
  1. Runs the 'svnadmin verify -r ' and checks 'serialized hash missing terminator error'. 
  2. If the error is reported, the script reads the revision properties of revision just before that (i.e. revno-1) and add One minute to the time stamp of this revision.The log message is changed 'recovered from corruption' message.
  3. Now original corrupted revision property file is copied to a backup location and corrected revision property is written in its place.
  4. The process repeats till get it 'valid revision'. At this point it stops.

After this I ran the 'svnadmin verify' on entire repository and confirmed that all revisions can be read. Then dumped the repository contents to dump file and reloaded the content into a new empty repository.

I checked the log and diff of the revision 1266 and few subsequent revisions. Things seem to be on track.

PS> The python script I wrote is very specific to my needs. However, if you need it, send me a mail.