Saturday, October 15, 2016

Generating Directory Hashes

In my ongoing efforts to back up my files, I have a directory structure that I want to archive, but only if it has changed.  The reason for this is that the backups are compressed, and every time anything in them changes the whole archive needs to be re-uploaded.  Unfortunately the internet in Australia makes this a daunting task.  So I need a way to know if a directory has changed.  There are a couple of ways I could do this: I could use time stamps to see if files or directories have been modified, or I could look at file contents.  I decided to look at the file contents and basically create a "hash" for files and directories.  This would allow me to compare values over time.

Most people reading this would be familiar with taking the MD5 hash of a file and what that means.  It gives you a fingerprint of the file contents, and if any of the contents change, the hash value changes dramatically.  That's great, but it only works for files and it completely ignores the file name.  To me, a directory structure has changed if even a single file name has changed.  I explored bundling the directory structure and file contents into an archive format such as tar and then taking a hash, but there's no guarantee that one implementation of tar will give exactly the same output as another, i.e. the result isn't deterministic.  Different tools would then give different hash values for the same directory, which makes the approach useless.

To overcome these problems I came up with something that I think is reasonably simple and that only takes into account changes in directory structure, file and directory names, and file contents when determining if something has changed.

  • A directory structure can have files and sub-directories.
  • A file hash is equal to MD5(MD5(file contents) XOR MD5(UTF-8 byte array of name))
  • A directory hash is equal to MD5((directory contents) XOR MD5(UTF-8 byte array of name))
  • The content of a directory is equal to the XOR of the hashes of all files and directories it contains in the level below it.
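The rules above translate into only a few lines of code.  What follows is a minimal sketch in Python; the original script is PowerShell.  The names `item_hash`, `file_hash` and `directory_hash` are hypothetical.  The rules don't cover empty directories, so treating an empty directory's content as 16 zero bytes is an assumption.

```python
import hashlib

def md5(data: bytes) -> bytes:
    """Raw 16-byte MD5 digest."""
    return hashlib.md5(data).digest()

def xor16(a: bytes, b: bytes) -> bytes:
    """XOR two 16-byte digests byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def item_hash(content_hash: bytes, name: str) -> bytes:
    """Shared rule: MD5(content hash XOR MD5(UTF-8 bytes of the name))."""
    name_hash = md5(name.encode("utf-8"))
    combined = xor16(content_hash, name_hash)
    return md5(combined)  # the final hash of the XOR result

def file_hash(name: str, contents: bytes) -> bytes:
    # A file's content hash is just the MD5 of its bytes.
    return item_hash(md5(contents), name)

def directory_hash(name: str, child_hashes: list[bytes]) -> bytes:
    # A directory's content hash is the XOR of the hashes one level below.
    # Assumption: an empty directory starts from 16 zero bytes.
    content = bytes(16)
    for child in child_hashes:
        content = xor16(content, child)
    return item_hash(content, name)

readme = file_hash("readme.txt", b"hello")
notes = file_hash("notes.txt", b"world")
print(directory_hash("docs", [readme, notes]).hex())
```

Because XOR is commutative, the order in which children are listed doesn't matter.  That is handy, since different tools can enumerate a directory in different orders.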

Let me just state now that I know MD5 isn't secure.  This isn't a security thing; I just need a fast way to get a file checksum.

So with these basic rules I wrote a PowerShell script that takes the hash of files and directories to create a fingerprint that can be compared against future versions.  In the test below I created a directory structure with some test files to play around with.  Some of the directories are junction points and symlinks, and some of the files are hardlinks and symlinks.  The script can be configured to ignore junction points and symlinks, but not hardlinks, as these are indistinguishable from other files.  I also threw in some Unicode file and directory names just to make sure everything works as expected.

In the image below, each item in the directory structure has its own box with 4 different hexadecimal strings.  The red string is the content hash.  For a file, that's just the normal MD5 hash of the file; for a directory, it's the combined XOR of all the files and directories in the level below it.  The green string is the MD5 hash of the byte array of the file or directory name.  The blue string is the XOR of the content hash and the name hash.  The black string with green highlighting is the MD5 hash of the XOR result.  (I'll get back to why this is done later.)

Calculating a directory hash
The implementation isn't too hard.  First, create a function that calculates our version of a file hash.  Then create a function that calculates a directory hash.  It hashes every item the directory contains, XORs the results together, and combines that with the directory name as described above.  Because the directory function calls itself for each sub-directory, recursion takes care of exploring the whole directory tree.
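Here is a minimal Python sketch of that recursive walk over a real directory tree.  It is not the original PowerShell script, and `hash_file` and `hash_directory` are hypothetical names.  Skipping symlinks by default is an assumption meant to mimic the option to ignore them.  On Windows, `Path.is_symlink()` doesn't report junction points (Python 3.12 added `Path.is_junction()` for that), so ignoring junctions would need an extra check.

```python
import hashlib
from pathlib import Path

def _md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _combine(content_hash: bytes, name: str) -> bytes:
    # MD5(content hash XOR MD5(UTF-8 name)), shared by files and directories.
    return _md5(_xor(content_hash, _md5(name.encode("utf-8"))))

def hash_file(path: Path) -> bytes:
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return _combine(digest.digest(), path.name)

def hash_directory(path: Path, follow_symlinks: bool = False) -> bytes:
    content = bytes(16)  # assumption: an empty directory XORs to all zeros
    for child in path.iterdir():
        if child.is_symlink() and not follow_symlinks:
            continue  # mimic the option to ignore symlinks
        if child.is_dir():
            # Recurse into the sub-directory and fold its hash in.
            content = _xor(content, hash_directory(child, follow_symlinks))
        elif child.is_file():
            content = _xor(content, hash_file(child))
    return _combine(content, path.name)
```

You would call it as `hash_directory(Path("Projects").resolve())` and use `.hex()` on the result to get the usual hexadecimal string.  The `.resolve()` matters because `Path(".").name` is an empty string, so without it the root directory's name wouldn't be hashed.  Following symlinks can recurse forever if a link points back up the tree.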

It may seem excessive to hash the result of the XOR, but in the scenario below I'll show how two directories with different contents can end up with the same hash if you don't.

No final hash can lead to different directories with equal hashes
You can see in the image above that if you swap the contents of two files and don't do a final hash, the two directories can end up with equal content hashes.  If those directories also happen to have the same name, they will have the same hash value.  Hashing the result of the XOR prevents this, as can be seen below.

Adding a final hash leads to directories with different hashes
You can now tell the directories apart because they have different hashes.  The final MD5 operation basically "scrambles" the combined content and name hash before it can propagate to the level above.  Without it, because XOR is associative and commutative, swapped contents can produce equal XOR results.
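The collision in that scenario can be reproduced in a few lines.  This is a small Python sketch, not the original script.  The file names and contents are made up, and `content_hash` is a hypothetical helper that XORs the per-file values together with or without the final MD5 step.

```python
import hashlib

def md5(data: bytes) -> bytes:
    return hashlib.md5(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def content_hash(files: dict[str, bytes], final_hash: bool) -> bytes:
    """XOR together each file's hash, with or without the final MD5 step."""
    result = bytes(16)
    for name, data in files.items():
        h = xor(md5(data), md5(name.encode("utf-8")))
        if final_hash:
            h = md5(h)  # scramble before it propagates upwards
        result = xor(result, h)
    return result

dir_a = {"x.txt": b"first", "y.txt": b"second"}
dir_b = {"x.txt": b"second", "y.txt": b"first"}  # same names, contents swapped

print(content_hash(dir_a, final_hash=False) == content_hash(dir_b, final_hash=False))  # True
print(content_hash(dir_a, final_hash=True) == content_hash(dir_b, final_hash=True))    # False
```

Without the final hash, both directories reduce to the same four terms XORed in a different order: MD5(first), MD5(second), MD5(x.txt) and MD5(y.txt).  The result is therefore identical.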

Get the code!
As with a lot of my projects they're a little rough.  I'd love for someone to take the ball and run with it to create a more professional version.  I think I've provided enough information to get people started.

Tuesday, October 4, 2016

Back up Git repositories to Google Drive

I'm trying to come up with a decent backup strategy and I'm almost there.  Figuring out a way to back up git repositories was a little confusing though.  I use GitHub to host repositories that I'm working on locally, and that's an OKish backup, but I don't check every file into Git.  For example, if I'm working on an electronics design I don't really want the manual for the micro-controller to be tracked by version control, but I do want a backup of the manual just in case they change it for some reason.  So for files like this I keep them with all the others and add them to the .gitignore file.  This is great, but they're not backed up anywhere.

Normally I use Google Drive for my backups.  There are other services that are probably better and have desktop syncing apps that are more polished, but I can easily access files from any device and I trust Google not to go broke in 6 months.  So a simple solution to my problem might be to store local repositories in the Google Drive directory.  That may work, but I just don't trust Git and the Google Drive app to get along together.  So what I ended up doing was just copying a backup of the repository to Google Drive.

This works, but if you copy files to the Google Drive directory and overwrite the old versions, it wants to re-upload everything even if the files are unchanged.  You could do a copy where only newer files are overwritten, but then another problem arises: files that are deleted from the repository remain in the backup, taking up space.  That might be ideal in some situations, but I want this to be a mirror of the current state of the local repository folder.  In reality what I want is a one-way sync to the backup location.  Luckily the robocopy command can manage this.

cmd /k robocopy "Repositories To Backup" "Backup Location" /e /purge

By placing the above command in a batch file, anything new or changed in the "Repositories To Backup" directory will be copied to the "Backup Location".  Don't worry about the cmd /k part; it just keeps the command window open after robocopy finishes.  By default robocopy compares each file's size and timestamp with the copy in the destination and skips files that match.  Unchanged files therefore aren't rewritten, and Drive won't want to upload them again.  The /e option copies all subdirectories, including empty ones.  The /purge option deletes files and directories from the backup location that no longer exist in the source directory.  Together they keep the backup location in sync with the source location.  (robocopy also has a /mir option, which is equivalent to /e plus /purge.)
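For anyone not on Windows, the behaviour of robocopy with /e /purge can be approximated in Python.  This is a sketch under simplifying assumptions, not robocopy's actual algorithm, and `mirror` is a hypothetical name.  A file counts as unchanged when its size and modification time (to the second) match.  Symlinked source directories are simply followed.

```python
import shutil
from pathlib import Path

def mirror(src: Path, dst: Path) -> None:
    """One-way sync of src into dst, roughly like robocopy src dst /e /purge."""
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {child.name for child in src.iterdir()}

    # Like /purge: delete anything in dst that no longer exists in src.
    for item in dst.iterdir():
        if item.name not in src_names:
            if item.is_dir() and not item.is_symlink():
                shutil.rmtree(item)
            else:
                item.unlink()

    for item in src.iterdir():
        target = dst / item.name
        if item.is_dir():
            # Like /e: recurse into every subdirectory, including empty ones.
            if target.exists() and not target.is_dir():
                target.unlink()
            mirror(item, target)
            continue
        if target.is_dir():
            shutil.rmtree(target)
        if target.exists():
            s, t = item.stat(), target.stat()
            if s.st_size == t.st_size and int(s.st_mtime) == int(t.st_mtime):
                continue  # unchanged: leave it so Drive doesn't re-upload it
        shutil.copy2(item, target)  # copy2 also copies the modification time
```

Preserving the modification time with `shutil.copy2` is what lets the next run recognise the copy as unchanged.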

I keep all my git repositories in a Projects folder, so I just set "Repositories To Backup" to this folder.  When I run the batch file, it backs up all the repositories at once.  I run the batch file manually, but you could schedule it to run automatically too.  I know it's not the best solution, but it works for me.

Friday, September 23, 2016

Calculating Dihedral Angles

Just a quick one today.  Up until recently I had never heard of the term dihedral angle.  It sounds complicated, but it's something you already know.  If two flat surfaces in space meet, they intersect along a line.  The angle between the two surfaces at this intersection is called the dihedral angle.

I had recently been wondering about how to calculate the angle between two surfaces and decided to work through the problem myself.  First of all you need to describe the surfaces and I thought that the best way to do this was by supplying vectors normal to each surface, v1 and v2.
Angle between two surfaces

By simplifying the geometry and extending the normal vectors until they intersect you can start to see how to solve the problem.
Extended surface normal vectors

You can also derive the following formula from the geometry.
Relation between alpha and beta

From this you can see that the cosine of beta is equal to the negative cosine of alpha.
Simplified cosine term

By taking the dot product of the vectors v1 and v2 you can quickly derive an expression for the angle alpha.
Expression for alpha
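Written out, the derivation goes as follows.  This is a reconstruction from the text.  It assumes β is the angle between v1 and v2 where the extended normals meet, with both normals pointing out of the solid (or both into it).  It also assumes the quadrilateral formed by the edge, the two feet of the normals and their meeting point has right angles at the two feet:

```latex
\begin{aligned}
\alpha + \beta + 90^\circ + 90^\circ &= 360^\circ
  \quad\Rightarrow\quad \alpha + \beta = 180^\circ \\
\cos\beta &= \cos(180^\circ - \alpha) = -\cos\alpha \\
\mathbf{v}_1 \cdot \mathbf{v}_2 &= \lVert\mathbf{v}_1\rVert\,\lVert\mathbf{v}_2\rVert\cos\beta \\
\alpha &= \cos^{-1}\!\left(-\frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\lVert\mathbf{v}_1\rVert\,\lVert\mathbf{v}_2\rVert}\right)
\end{aligned}
```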

Great!  Problem solved.  Or is it?  The inverse cosine function only returns values between 0 and 180 degrees.  This means you will never get an answer greater than 180 degrees, so you can't represent reflex angles.  The image below demonstrates why.  Given the vectors v1 and v2 that define the surfaces in the first part of the image below, you would quickly run into trouble.  You can see in the lower part of the image that the same two vectors can be used to define a different geometry with a reflex (concave) angle.
Ambiguity between vectors

As the equation for alpha above doesn't change if the vectors v1 and v2 are swapped, the two geometries above will give the same angle.  There are ways to work around this problem; you just need to be aware of the geometry you're working with.  I'm still having trouble getting my head around this.  The order in which the vectors are given is important and would allow you to work out whether the angle is concave or convex.  However, when generalising this to 3 dimensions, the answer depends on which side you look at the joint from.  I'm falling asleep typing this with pictures of vectors dancing around my head.  Maybe a fresh set of eyes will help.
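The following Python sketch illustrates the problem and one common workaround.  `dihedral_unsigned` is the arccos formula above, and it gives the same answer whichever order v1 and v2 come in.  `dihedral_signed` is a hypothetical, order-aware variant.  It takes the edge direction as an extra input and measures the signed turn from v1 to v2 about that edge with `atan2`, which gives a result between 0 and 360 degrees.  Flipping the edge direction, or swapping v1 and v2, turns α into 360° − α.  You still have to pick a convention for which way the edge points, and that is exactly the "which side you look from" issue.

```python
import math

Vector = tuple[float, float, float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def cross(a: Vector, b: Vector) -> Vector:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))

def dihedral_unsigned(v1: Vector, v2: Vector) -> float:
    """alpha = acos(-v1.v2 / (|v1||v2|)) in degrees; always within [0, 180]."""
    c = -dot(v1, v2) / (norm(v1) * norm(v2))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(c))

def dihedral_signed(v1: Vector, v2: Vector, edge: Vector) -> float:
    """Order-aware angle in [0, 360), assuming the normals are perpendicular to the edge."""
    sin_part = dot(cross(v1, v2), edge) / norm(edge)  # |v1||v2| sin(theta)
    theta = math.degrees(math.atan2(sin_part, dot(v1, v2)))  # in (-180, 180]
    return (180.0 - theta) % 360.0

# Outward normals of two faces of a cube meeting along the z-axis.
v1, v2 = (0, -1, 0), (-1, 0, 0)
print(dihedral_unsigned(v1, v2), dihedral_unsigned(v2, v1))
print(dihedral_signed(v1, v2, (0, 0, -1)), dihedral_signed(v2, v1, (0, 0, -1)))
```

With this edge direction, the cube edge comes out at 90 degrees and the swapped order at 270 degrees.  Reversing the edge direction swaps those two results, which is why a consistent convention matters.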

Monday, September 12, 2016

Compound Mitre Cuts

It's funny how the seemingly easy can be ridiculously hard.  A relative wanted a replica of a serving tray that belonged to a family member, and I was asked to work out the mitre angles because I'm the math guy.  "Sure, no problem," I naively said.  All that was required was to tilt the tops of all the sides outward so that the timber was 25 degrees from vertical.

Angled Frame
Before going too far it seemed prudent to make a test cut using some scrap wood.  The first step was to cut a 25 degree taper on the edge of the moulding that's in contact with the base.  That was easy.  I then came unstuck trying to calculate the angles required to make the corners fit together at 90 degrees.  I ended up using an awesome online calculator for convenience.  It may seem easy, but in this case the mitre angle needed to be set to about 22.9 degrees while the head of the saw was tilted at 39.9 degrees.  It took me a while to figure out where those numbers come from, but suffice to say it's not the simplest of maths.
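As a cross-check, the commonly quoted closed-form compound mitre formulas reproduce those two numbers.  This is not the vector method mentioned at the end of this post.  The Python sketch below assumes the board lies flat on the saw table with its edge against the fence, that the tilt is measured from vertical (0 means vertical sides), and that the frame is a regular polygon.  `compound_mitre` is a hypothetical helper name.

```python
import math

def compound_mitre(tilt_from_vertical_deg: float, sides: int = 4) -> tuple[float, float]:
    """Return (mitre angle, blade bevel) in degrees for sides leaning outward."""
    t = math.radians(tilt_from_vertical_deg)
    half_corner = math.pi / sides  # 45 degrees for a four-sided tray
    mitre = math.atan(math.sin(t) * math.tan(half_corner))
    bevel = math.asin(math.cos(t) * math.sin(half_corner))
    return math.degrees(mitre), math.degrees(bevel)

mitre, bevel = compound_mitre(25)
print(f"mitre {mitre:.1f}, bevel {bevel:.1f}")  # mitre 22.9, bevel 39.9
```

The formulas reduce to the familiar cases at the extremes.  Vertical sides need no mitre and a 45 degree bevel, and a flat picture frame needs a 45 degree mitre and no bevel.  Saw scales differ on where they put zero, so check your saw's convention before cutting.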

Undercut Mitre
The moulding wasn't too easy to work with as it doesn't have many flat surfaces.

Undercut Mitre
The corners don't fit together that well as I didn't take extremely accurate measurements.  It also doesn't help that the parallel edges aren't exactly the same length.

Frame Corner
One problem I currently have is that I don't have a good way to measure angles.  That should be solved soon.  Combining the results so far with a little more accuracy should make the job a lot easier.

Mitre Corner
In the image below you can see the complex shape of the wood I'm using for the test.  The actual serving tray will be made out of plain rectangular material making things a lot easier.

Profile Shape
I do have a method to calculate all the angles needed to make compound mitre cuts.  It makes use of vectors and exploits some of their properties, but I'll leave that for another time.

Thursday, September 1, 2016

Custom Storage Box Prototypes

In my last post I played around with an idea for mass-producible boxes that can be manufactured with laser cutting and routing.  Before going too far down this path I wanted to test the form factor of the box and discover what works and what doesn't.  Besides, I actually need some storage.

I started with a cheap 1.2 metre long piece of pine from Bunnings.  The external dimensions of the box (350 mm x 275 mm) were selected to minimise waste from a single board.  The timber is 184 mm wide, which is close enough to the desired final height of the boxes, so it didn't make sense to change it.  Ideally I think 10 mm plywood is optimal for the box sides, but 19 mm pine will be fine for this test.  As the sides are thicker, they can also be joined with dowels instead of time-consuming box joints.
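Here is a quick check of how much of the 1.2 m board that uses.  It assumes the 350 mm front and back run full length, the two sides fit between them, and each piece costs a hypothetical 3 mm saw kerf.  `cut_list` is a made-up helper.

```python
def cut_list(outer_length: float, outer_width: float, thickness: float,
             kerf: float = 3.0) -> tuple[list[float], float]:
    """Panel lengths when two sides sit between a full-length front and back."""
    front_back = outer_length
    side = outer_width - 2 * thickness  # sides fit inside the front and back
    pieces = [front_back, front_back, side, side]
    used = sum(pieces) + kerf * len(pieces)  # roughly one kerf per piece
    return pieces, used

pieces, used = cut_list(350, 275, 19)
print(pieces, used)  # [350, 350, 237, 237] 1186.0
```

Under those assumptions the four panels take about 1186 mm of the 1200 mm board.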

Box With Sliding Lid
The construction is very simple, with two side panels placed between a front and rear panel.  All of them have a rebate at the bottom to retain a piece of 7 mm ply for the base.  The front and sides are also rebated at the top to hold a 7 mm ply sliding lid.  The back panel is cut lower to allow the lid to be inserted and removed.  In the final design the groove for the sliding lid would also continue into the back panel, but it's quicker and easier to just cut it straight across.  Remember, the point of these boxes isn't to demonstrate my jointing prowess (still a beginner, but getting better); it's to quickly produce a box to use and test.  You'll also notice I wasn't too concerned about router tearout either.

Rear of Box with Smaller Back Panel
When inserted, the lid prevents dust entering the box and items falling out.  However, in this rudimentary test the main problem is that if the box is picked up from the front, the lid slides out the back.  It definitely needs some sort of retention mechanism, and it also needs a handle.  Both of these flaws were obvious from the start, but I haven't really settled on how I want to solve them.

Top of Box
This is my favourite part of the design.  They're brass drawer pulls I bought on AliExpress for about one Australian dollar each.  They allow small labels to be inserted and removed as needed.  The viewable area of the labels they hold is about 40 mm by 20 mm.  You can fit a decent amount of text in there, but it's not that readable at a distance.  Unfortunately the proportions of the front panel look weird too.  As these boxes are designed to fit a specific location, I had to make the long side the front; ideally the short side would be the front, and these drawer pulls would be more suitable.  Luckily there are larger ones available, so next time I'll buy those.

Brass Pull with Label Insert
I also decided to experiment with interlocking removable dividers.  The theory was to split the box up into 9 equal compartments to better separate small items.  Due to tool and jig issues, I had to make the centre compartment on the long side larger than the others.

Front to Back Dividers
With all the dividers in place you may notice a lot of space around some of the joints.  I was allowing a lot of clearance in this part of the job as I wanted everything to go together easily the first time.

Side to Side Dividers
The big problem I discovered is that there's a minimum usable compartment size.  I can get my hand into the smaller ones, but I can't see what I'm trying to grab.  If the box were only 25 mm tall that wouldn't be a problem, but as each compartment is about 150 mm deep it isn't going to work.  I think if the box were divided into 6 (3 x 2) roughly square compartments of equal size, the result would be more usable.

Compartments are too small
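To put rough numbers on the compartment sizes, here is a sketch using the 350 mm x 275 mm box with 19 mm walls.  The post doesn't give the divider thickness, so 7 mm is an assumption.  The sketch also ignores the enlarged centre compartment.  `compartment_size` is a hypothetical helper.

```python
def compartment_size(inner: float, count: int, divider: float) -> float:
    """Clear width of each of `count` equal compartments across `inner` mm."""
    return (inner - (count - 1) * divider) / count

wall, divider = 19, 7           # divider thickness is an assumption
inner_long = 350 - 2 * wall     # 312 mm between the side panels
inner_short = 275 - 2 * wall    # 237 mm between the front and back

for cols, rows in [(3, 3), (3, 2)]:
    w = compartment_size(inner_long, cols, divider)
    d = compartment_size(inner_short, rows, divider)
    print(f"{cols} x {rows}: about {w:.0f} mm x {d:.0f} mm, 150 mm deep")
```

On those assumptions, a 3 x 3 layout leaves openings of roughly 99 mm x 74 mm at the top of a 150 mm deep well.  A 3 x 2 layout gives roughly 99 mm x 115 mm, which is close to square.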
I cut the divider guides all the way across the board but they only need to extend to the top edge.

Guide at Top of Box For Lid
I think this next detail is the most important thing I learnt, and it'll help make a more enjoyable design.  When the lid is inserted it's in contact with all four side panels.  Any misalignment during assembly, or even expansion and contraction of the wood, will cause the lid to catch on edges when inserted.  You can see in the image below that when inserting the lid into the back slot there's a small lip due to assembly misalignment.

Misalignment of Back and Side Panels
This can also occur at the front.  To fix this, ideally the rebated guides in the sides should be tapered.  Let's say you have a 7 mm board for the lid and you're happy leaving 0.5 mm of clearance.  Instead of just routing a 7.5 mm slot, make it 8.5 mm at the back to allow for misalignment and then reduce it to 7.5 mm after an inch or two.  Then make the rebate in the front 8.5 mm wide.  By doing this the lid should slide in and out without hitting anything.

Misalignment of Front and Side Panels
The top lid won't be perfectly flat either; it could bow up or down.  So you may want to make the front slot a little larger in the centre, where it will bow the most.  You could also taper the sides of the front rebate to allow the lid to locate into the front slot smoothly.

Curved Lid Not fitting into Front Slot
Dimension-wise, I was flying by the seat of my pants on this project and just making it up as I went.  Most of my rebates were 6 mm, but that introduces a tiny problem.  You want the rebates for the dividers to be as small as possible.  If they're too deep, you leave a small hole that dust or insects can get into.  Just a minor thing.

Divider Guide as Seen From the Top
While I was at it I made 8 of them. :-)  It took about three days of work, and the total price for each box comes to around $20.  The funny thing is that making 8 boxes takes about as long as making 6 boxes.  Most of the time was spent setting up tools, measuring and marking things.  That's partly why I'd like to use CNC equipment to cut out the parts for the boxes.  In the time it took me to make 8 boxes I could assemble 50 pre-cut boxes.

Usage Example