Tuesday, May 21, 2013

How a Digital Hoarder Does Backups

After years of telling people to back up their important files, I've finally decided to take my own advice and do backups properly by keeping off-site copies.  For quite some time now I've kept duplicate backups of my file archives at home.  The files don't contain the most important data in the world, but they're important to me, and the ever-increasing density of storage means that they take up practically no space.  The files are mostly old assignments and projects I've worked on since about 1997, and I'd like to keep them just in case.  Yeah, I know, I'm a digital hoarder.

While I'm adding off-site backups to my storage process, I've also taken the opportunity to add an extra layer of protection against file corruption.  I've created a file that contains an MD5 checksum of every file on the drive.  Although MD5 isn't cryptographically secure, it's enough to detect accidental corruption while being faster than SHA-1 to generate and check.  Generating the MD5 checksum file is easy.  I just navigated to the root directory of the external drive in Linux and ran the following command.

find ./ -type f -exec md5sum {} + > Checksums.MD5

The generated file Checksums.MD5 contains the checksum of every file on the drive and can later be used to check the integrity of each file with the following command.

md5sum -c --quiet Checksums.MD5

The checksum file itself will fail validation, because its own checksum was calculated while it was still being written, but every other file on the drive should quietly pass validation.
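If that one expected failure gets in the way, a possible variation is to leave the checksum file out of its own listing.  The sketch below is only one way to do it: make_checksums is a made-up helper name, and it assumes GNU find and md5sum and that Checksums.MD5 lives in the root of the drive.

```shell
# make_checksums DIR
# Hypothetical helper (not part of md5sum): writes DIR/Checksums.MD5
# covering every file under DIR except Checksums.MD5 itself.
make_checksums() {
    # Subshell so the caller's working directory is left unchanged.
    (
        cd "$1" || exit 1
        # The redirect creates Checksums.MD5 before find starts,
        # so it has to be excluded explicitly by path.
        find . -type f ! -path ./Checksums.MD5 -exec md5sum {} + > Checksums.MD5
    )
}
```

Called as make_checksums followed by the drive's mount point, this should produce a file that passes md5sum -c --quiet with no output at all, so any line that does appear is a real problem.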

So how does this help to maintain file integrity?  Every couple of months both drives need to be checked to make sure that neither has been corrupted.  If corruption is detected on one drive, a new backup needs to be made from the good copy to replace the failing drive.  It's unlikely, though not impossible, that both drives will fail at the same time.
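The periodic check could be wrapped up along these lines.  This is a rough sketch rather than a finished tool: check_drive and check_all are made-up names, the mount points are whatever you pass in, and it assumes each drive has a Checksums.MD5 in its root.  The entry for Checksums.MD5 itself is filtered out first, since that one is expected to fail.

```shell
# check_drive DIR
# Hypothetical helper: verify DIR against its own Checksums.MD5.
# Returns non-zero if any file fails or the list cannot be read.
check_drive() {
    (
        cd "$1" || exit 2
        # Drop the checksum file's own entry, then verify the rest.
        # "md5sum -c -" reads the checksum list from standard input;
        # --quiet prints only the files that fail.
        grep -v '[ *]\./Checksums\.MD5$' Checksums.MD5 | md5sum -c --quiet -
    )
}

# check_all DIR...
# Check every drive given and say which ones need replacing.
check_all() {
    failed=0
    for drive in "$@"; do
        if check_drive "$drive"; then
            echo "OK: $drive"
        else
            echo "FAILED: $drive (rebuild it from the good copy)"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_all with both mount points as arguments gives one line per drive, plus the names of any files that failed, and a non-zero exit status if either drive has a problem.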

I also intend to perform another check at the same time.  As the backup drives are kept off-site, the important files are encrypted with 7-Zip.  I felt that it's a stable and secure program that my family would be able to use if the need arose.  However, it's important to guard against format rot.  If it turns out that 7-Zip is no longer maintained and is falling into disrepair or obscurity, I'll have a chance to re-encrypt my files using a different program.

Now I know what most of you are saying: use the cloud.  Well, if you can tell me of an affordable online backup service with 600GB of encrypted storage, go for it.  Not to mention how long that would take to upload.  I also know that my system isn't perfect, but it's a lot better than what I used to use.  Nothing is foolproof; it's all about minimizing risk.  By keeping a checksum of my files I've reduced the risk of corruption going unnoticed, and by keeping off-site backups I've protected myself against localized damage, e.g. a house fire or theft.  Both locations are in areas with a low risk of flooding, about 5 km apart, so if there's a disaster that destroys both copies, it's likely that I won't be around either.


  1. 600GB? You've got Bitcasa; it costs $10 per month and will get you the space you need. As a digital hoarder myself, I just love it! I also use CrashPlan, just in case. I like to keep multiple backups :)

