In a recent post, I created an animation from a series of PNG images that were programmatically generated. Usually after a project like this I back up all the related data for archiving, but this time I wondered if I could somehow reduce the size of the images by taking advantage of temporal redundancy? What do I mean by that? When you look at the frames they're basically all the same and only contain slight changes between adjacent images. Is it possible to take two images, record the difference in a way that can losslessly recreate the original image, and save space?
The method I came up is a crude simplistic implementation of how video encoding works. Take two frames, subtract one from the other and encode the difference. This works because the difference image usually contains very little information. Real video encoding is much more complicated, other frames are used to predict a particular frame, taking into account how objects in the video are moving. The prediction is subtracted from the actual frame to produce a residual image. This is what's encoded, usually in a lossy manner.
I wanted to recreate this, but as this was a quick project, I used common file formats and software that already existed. I didn't want to write a compression algorithm (yuck) I just needed a result. With that in mind I decided to exclusively use the PNG file format, as it's great for compressing images with a large areas the same colour. (I know I could use something like WEBM, MNG, or APNG, but they're poorly supported)
The plan is quite simple, take two images the same size, subtract them from each other channel by channel (including transparency) and save the difference as greyscale PNG files. The one complication is that subtracting an 8 bit number from another 8 bit number gives you a 9 bit solution space, meaning you can't just encode the result in one image. To get around this, the difference images are encoded as two images, the first contains the absolute value of the difference, and the second image contains the sign of the difference (negative is encoded as white). As the image containing the sign only has two colours, one for positive/zero, one for negative, it can be efficiently encoded.
Writing a compression and decompression script in python was easy enough. There are some peculiarities and edge cases in the PNG spec that will cause it to fail. Everything I tested works, but I feel like there may be problems with some palette based images.
As a demonstration I processed two images that aren't similar in any way to illustrate how the system works.
|Difference Compression Example|
The two source images above are split into their RGBA channels and subtracted from each other. You may notice that the RBG difference images have a background that isn't black, (which would imply they are equal) but dark grey, while both images have transparent sections as their background. This is because there are in fact a different colours in the backgrounds of both images, but with the transparency set to full they look the same. The difference image for the alpha channel also looks different, it's just black and white, this is because the images don't use graduated transparency on the alpha channel. It's either on or off, so the difference will be either full or zero.
The 8 images on the right along with the first source image were able to reproduce the second source image. When compared they are found to be pixel for pixel replicas apart from pixels with full alpha. The encoder I used discards the colour information of pixels that are transparent. I'm not going to go into the file sizes and results as this was to illustrate what happens during the process. You'd never compress these images this way.
An actual example
Below is a frame from my animation that is 1080 x 1920 pixels. I'm going to use that as a base to compress the next frame in the animation. I'm not going to show it, apart form the timestamp they look exactly the same to the naked eye.
So, what are the results. The image below contains the file sizes. The original file 1.png was 347 kB. To be fair, I ran it through PNGOUT to see how small it could be made as a standalone image. This resulted in a 248 kB file. In comparrison to the 8 difference and sign images, there's no contest. In total they take up around 30 kB of space. Around an eighth the size of the standalone image, but wait, it gets better.
If you run the sign and difference images through PNGOUT you get a massive size reduction.
That's right, they now come to a total of 13.3 kB, around 5.3% percent of the original file size. The original file 1.png is able to be encoded with only 13.3 kB when using 0.png as a base image. As a side note you probably want to throw them into a tar or zip file so the actual size on disk is reduced.