Sunday, October 30, 2016

Create Compressed Encrypted Backups Only When Files Change

Most of the files that I back up aren't really that important, but some of them contain personal information that I'd like to keep private.  My usual strategy is to sync things to Google Drive, but I do so on the assumption that one day a data breach will make everything visible.  So I needed a way to encrypt some items before backing them up.  Writing a PowerShell script seemed the best way to accomplish this.

In my last post I described a method to generate hashes for files and directories, with the aim of being able to tell whether they have changed and need their backups replaced.  Using that as a starting point, I wrote a script that compresses and encrypts items so they are ready to sync to Google Drive.  It's not too complicated, but it needs some explanation.

The backups needed a naming scheme, and the one that seemed to fit best uses the following format.

YYYYMMDD_XXXXXXXX_<Original Item Name>.tar.gpg

YYYYMMDD - the date the backup was created
XXXXXXXX - the last 8 hex characters of the hash of the backed-up item (8 is enough; I didn't want the filenames to get too long)
<Original Item Name> - the original name of the file or directory
.tar.gpg - denotes that the file is an encrypted archive

For example, test.txt might become 20161029_A4F88BC1_test.txt.tar.gpg.
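Building a name in this format is mostly string assembly. The sketch below is a Python illustration, not the author's PowerShell code; the function name `backup_name` is hypothetical, and upper-casing the fingerprint is an assumption based on the example filename above.

```python
from datetime import date


def backup_name(item_name: str, item_hash: str, when: date) -> str:
    """Build YYYYMMDD_XXXXXXXX_<Original Item Name>.tar.gpg (hypothetical helper)."""
    # Keep only the last 8 hex characters of the item's hash.
    # Upper-casing is an assumption that mirrors the example filename.
    fingerprint = item_hash[-8:].upper()
    # date objects accept strftime codes directly in f-string format specs.
    return f"{when:%Y%m%d}_{fingerprint}_{item_name}.tar.gpg"
```

With a made-up hash ending in `a4f88bc1`, `backup_name("test.txt", "0123456789abcdefa4f88bc1", date(2016, 10, 29))` produces `20161029_A4F88BC1_test.txt.tar.gpg`.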

The backup process works as follows.  Each file or directory in the specified input directory is processed by first calculating its hash.  The output directory is then searched to see whether a backup of that data already exists.  A backup counts as a match if both the fingerprint and the original name in its filename match the data being considered.  If a match is found, there is nothing more to do for that item, because replacing an encrypted backup for no reason would force it to be re-uploaded.
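The matching step can be sketched by parsing each filename in the output directory back into its three fields. This is a hedged Python illustration of the rule described above, not the original script: the names `parse_backup_name`, `backups_for_item` and `has_current_backup` are hypothetical, and it works on a plain list of filenames so the logic can be checked without touching the disk.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# YYYYMMDD_XXXXXXXX_<Original Item Name>.tar.gpg
BACKUP_RE = re.compile(r"^(\d{8})_([0-9A-F]{8})_(.+)\.tar\.gpg$", re.IGNORECASE)


class BackupInfo(NamedTuple):
    created: str      # YYYYMMDD
    fingerprint: str  # last 8 hex characters of the item hash
    item_name: str    # original file or directory name


def parse_backup_name(filename: str) -> Optional[BackupInfo]:
    """Split a backup filename into its fields, or return None if it doesn't fit."""
    m = BACKUP_RE.match(filename)
    if m is None:
        return None
    return BackupInfo(m.group(1), m.group(2).upper(), m.group(3))


def backups_for_item(filenames: Iterable[str], item_name: str) -> List[str]:
    """Every backup (current or stale) whose original name is item_name."""
    result = []
    for f in filenames:
        info = parse_backup_name(f)
        if info is not None and info.item_name == item_name:
            result.append(f)
    return result


def has_current_backup(filenames: Iterable[str], item_name: str, item_hash: str) -> bool:
    """True if a backup exists whose name AND fingerprint match the current data."""
    fingerprint = item_hash[-8:].upper()
    return any(
        parse_backup_name(f).fingerprint == fingerprint
        for f in backups_for_item(filenames, item_name)
    )
```

The date field is parsed but deliberately ignored when matching, since only the fingerprint and original name decide whether the data has changed. The same `backups_for_item` list doubles as the set of stale backups to delete once a new one has been created.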

If the data has changed since the last backup, or no backup exists, a new one is created.  The file or directory is added to a tar archive, compressed, and encrypted.  This is all done in a temporary folder created inside the system temp directory.  If the encryption process succeeds, the old backups are removed and replaced by the new one.
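A rough Python sketch of this create-then-replace step follows. It assumes a hypothetical `encrypt(src, dst)` hook that returns `True` only when encryption succeeded (for example, a wrapper around the gpg command line), and it leaves compression to that step, since GnuPG compresses data before encrypting by default; the original script may compress differently. It installs the new backup before deleting the old ones, so a failure part-way through never leaves the item with no backup at all.

```python
import os
import shutil
import tarfile
import tempfile
from typing import Callable, List


def create_backup(item_path: str, output_dir: str, backup_name: str,
                  old_backups: List[str],
                  encrypt: Callable[[str, str], bool]) -> bool:
    """Archive item_path, encrypt it, and replace old backups only on success.

    `encrypt` is a hypothetical hook: it must write dst and return True
    only if encryption really succeeded.
    """
    # All intermediate files live in a throwaway folder under the system
    # temp directory; it is deleted automatically when the block exits.
    with tempfile.TemporaryDirectory() as work_dir:
        tar_path = os.path.join(work_dir, "item.tar")
        # Add the file or directory to a tar archive under its original name.
        with tarfile.open(tar_path, "w") as tar:
            tar.add(item_path, arcname=os.path.basename(item_path))

        enc_path = os.path.join(work_dir, "item.tar.gpg")
        if not encrypt(tar_path, enc_path):
            return False  # leave the existing backups untouched

        # Install the new backup first, then remove the stale ones.
        shutil.move(enc_path, os.path.join(output_dir, backup_name))
        for old in old_backups:
            if old != backup_name:
                os.remove(os.path.join(output_dir, old))
    return True
```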

Figure: Command line output of the script

Figure: Encrypted backup files in Windows Explorer

I was determined to make the script support Unicode filenames, but unfortunately gpg can't handle files with Unicode characters in their names.  To get around this, the data is redirected into and out of the gpg command, so gpg only deals with the data and never sees the filenames.  This causes a problem, though: if the encryption step fails, the output file is still created, but 0 bytes are redirected to it.  To guard against this, the script checks that the gpg command completed successfully before replacing the backup.
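The redirection trick and the success check can be sketched in Python with `subprocess`. The gpg options used (`--batch`, `--encrypt`, `--recipient`) are standard GnuPG flags, and with no input file named on the command line gpg reads from stdin and writes to stdout. The function name, the `recipient` key ID and the `cmd` override (used only so the logic can be exercised without gpg) are hypothetical; this is an illustration, not the author's PowerShell command.

```python
import os
import subprocess
from typing import List, Optional


def encrypt_via_redirection(src: str, dst: str, recipient: str,
                            cmd: Optional[List[str]] = None) -> bool:
    """Encrypt src into dst without ever passing either filename to gpg."""
    if cmd is None:
        # Public-key encryption to `recipient`; plaintext arrives on stdin
        # and ciphertext leaves on stdout, so filenames never reach gpg.
        cmd = ["gpg", "--batch", "--encrypt", "--recipient", recipient]
    try:
        # Python opens the files, so Unicode paths are handled here.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            ok = subprocess.run(cmd, stdin=fin, stdout=fout).returncode == 0
    except OSError:
        ok = False  # e.g. gpg is not installed or src can't be read
    if not ok and os.path.exists(dst):
        # The redirected output file exists even when encryption fails;
        # delete it so an empty file can never replace a good backup.
        os.remove(dst)
    return ok
```

Checking the exit code rather than the output file is the key design choice: the redirected file exists either way, so only the return code says whether its contents can be trusted.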

Figure: The gpg encryption command, showing how to encrypt files with Unicode filenames

I really like encrypting backups with public key cryptography.  There are no passwords that could accidentally be left in a script and lead to security problems.

 Get The Code!
