Friday, July 26, 2013

OCR Preprocessing with the Radon Transform

At work I deal with a lot of reports that aren't particularly helpful.  All the information I need is in them, but they aren't searchable and I can't filter them.  What I'd really like is to be able to run SQL style queries against the data as if it were a database.  This is even harder than it seems, as the only way to access the data is via printed reports.  Apart from possibly tapping the LAN connection to the printer, there is no way for me to get the data electronically.  So I thought I'd have a go at scanning the documents and running tesseract-ocr on the printed reports.  The results are quite good, but I'd like to do a little preprocessing to improve them.  For my idea to work the documents need to have text parallel to the edge of the scanned image.  This sounds like a job for the Radon transform, an image processing technique that is ideal for finding long lines in an image.  As I can't show you the actual document because it contains company information, I've generated a test document full of gibberish with the exact same layout.  The document has been rotated, blurred and had small amounts of noise added to it to simulate a scanned image.  Using this file the features of the Radon transform can be demonstrated, along with how rotation can be detected.

All the processing was done in Octave, the open source Matlab equivalent.  I initially looked at OpenCV, but there was way too much to learn to actually get a result.  Octave allows me to quickly test my ideas.  The files used in this example can be found here.
Test Document with 0.6 degree rotation and noise added
The Radon transform is a powerful image processing technique that has wide ranging applications.  It's mainly used in situations where you want to detect a long line in an image.  The transform itself is quite easy to conceptually understand.  Imagine placing a series of fine parallel lines over the original image, and going along each line and adding up how much white you see.  After doing this you would end up with a series of intensity values that make up a vertical column of the Radon transform output image.  This process is repeated with the image being turned a set amount each time.  It's similar to taking multiple x-ray images of a subject on a turntable with the subject being rotated slightly between exposures.
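The rotate-and-sum description above translates almost directly into code.  My actual processing was done in Octave, but here's a sketch of the same idea as a naive Python/NumPy implementation (not the optimised `radon` routines in Octave's image package), just to make the mechanics concrete:

```python
import numpy as np
from scipy import ndimage

def radon_transform(image, angles_deg):
    """Naive Radon transform: rotate the image to each angle and sum the
    intensity down every column.  Each angle produces one column of the
    output image (the sinogram)."""
    # Pad the image out to its diagonal so nothing is clipped while rotating.
    diag = int(np.ceil(np.hypot(*image.shape)))
    pad_y, pad_x = diag - image.shape[0], diag - image.shape[1]
    padded = np.pad(image, ((pad_y // 2, pad_y - pad_y // 2),
                            (pad_x // 2, pad_x - pad_x // 2)))
    columns = []
    for angle in angles_deg:
        rotated = ndimage.rotate(padded, angle, reshape=False, order=1)
        columns.append(rotated.sum(axis=0))   # one "x-ray exposure"
    return np.stack(columns, axis=1)          # shape: (diag, len(angles_deg))
```

Calling this with `np.arange(0, 180.1, 0.2)` — 0 to 180 degrees in 0.2 degree steps — gives the 901-column output discussed next.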

Below is the Radon transform of the test image.  Its width is determined by how many times the image is rotated and by how much.  In this case it is rotated from 0 to 180 degrees in 0.2 degree steps.  This gives an image 901 pixels wide.  The height is determined by the maximum size encountered when rotating the image, i.e. the diagonal size of the image.  The original image is 990 x 704 pixels, which makes it 1214.8 pixels diagonally.  The output below is actually 1219 pixels tall; this is a little bigger than the diagonal due to the way the transform is calculated.
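Those dimensions are easy to check, using the angle range and page size from above:

```python
import numpy as np

# 0 to 180 degrees inclusive, in 0.2 degree steps -> width of the sinogram
n_angles = len(np.arange(0, 180.1, 0.2))
# Diagonal of the 990 x 704 pixel original -> minimum height of the sinogram
diagonal = np.hypot(990, 704)
print(n_angles, round(diagonal, 1))   # 901 columns wide, 1214.8 pixels diagonally
```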

Radon Transform
Radon Transform of the report from 0 to 180 degrees with 0.2 degree steps
You can see some detail in the above image, but not much.  The centre of the image represents the short edge of the test page while the edges represent the long edge.  The problem with this image is that it's looking at how much white is in the test image, and no matter what direction you go in there will be a lot of white.  Even text is made of mostly white space.  This makes all values of the output look quite similar.  What we really want to look at is how much black is in the image.  The Radon transform doesn't work like that, but we can get the same effect by inverting the image before putting it through the transform.
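The inversion itself is a one-liner.  Assuming an 8-bit greyscale scan loaded into a NumPy array (a toy stand-in here, since I can't share the real report):

```python
import numpy as np

# Toy 8-bit "scan": 0 is black ink, 255 is white paper.
scan = np.array([[255, 0], [128, 255]], dtype=np.uint8)
# Invert so that ink becomes bright and drives the sums in the transform.
inverted = 255 - scan
```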

Radon Transform
Radon Transform of the Inverted Image
Now we're getting somewhere.  A lot more detail is visible, but there's still one more step I'd like to perform. There's quite a range of values in the image with some bright details, and some very faint ones.  To better explore the image I'm going to apply a simple dynamic compression algorithm to it.  It's important to note that all the values in the output are zero or positive.  By adding 1 and then taking the log of every pixel in the image it will make the faint details more visible.  After rescaling the data for the output image, black details will remain black, and the brightest features will still be the brightest.
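A minimal sketch of that compression step, again as a Python/NumPy stand-in for what I did in Octave:

```python
import numpy as np

def compress_dynamic_range(sinogram):
    """log(1 + x) dynamic compression.  The transform's output is never
    negative, so log1p is always defined; the mapping is monotonic, so
    after rescaling to 0-255 zeros stay black and the brightest feature
    stays the brightest."""
    compressed = np.log1p(sinogram)
    scaled = 255 * compressed / compressed.max()
    return scaled.astype(np.uint8)
```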

Radon Transform
Dynamic Compression applied to the Inverted Radon Transform
That's a lot better.  I'll explain the details in the output image in a little bit, but first I want to show you something I think is really cool.  Remember the first image that I said didn't show much detail?  Well, if you apply something called the inverse Radon transform to it, you get back the original image.  You can see in the image below that you get a perfectly good recreation of the original image (with some artefacts).  The quality of the output image depends on the angular resolution of the Radon transform.  Now that's pretty awesome, but what's even better is that this is basically what happens in a CT scanner.  An emitter and linear detector spin around the patient taking multiple exposures.  These are then processed by a similar, but more sophisticated, algorithm to produce an image of a slice through the patient's body.  The quality of the final image is determined by the resolution of the linear detector and how many exposures are taken around the body.  There's a lot more to it than that, but at its core the concept is pretty simple.
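Octave's image package provides `iradon` for this.  To show the idea, here's a compact filtered back-projection sketch in Python/NumPy — the same "smear each exposure back across the image" algorithm a CT scanner uses, in its simplest form:

```python
import numpy as np
from scipy import ndimage

def inverse_radon(sinogram, angles_deg):
    """Filtered back-projection: ramp-filter each projection in the
    frequency domain, then smear each one back across the image plane at
    its own angle and accumulate."""
    n, n_angles = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n))[:, None]   # ramp filter, one per frequency bin
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * ramp, axis=0))
    recon = np.zeros((n, n))
    for i, angle in enumerate(angles_deg):
        # The 1-D projection smeared across 2-D: constant along each column.
        smear = np.tile(filtered[:, i], (n, 1))
        recon += ndimage.rotate(smear, -angle, reshape=False, order=1)
    return recon * np.pi / n_angles
```

The more exposures (columns of the sinogram) you feed it, the cleaner the reconstruction — exactly the angular-resolution effect mentioned above.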

Inverse Radon Transform
Inverse Radon Transform of the Radon Transform
Now I'll explain the details in the image.  I've highlighted the test and the transformed images below with corresponding colours to explain the different features visible in the Radon transform.  You may need to click on the images to see more detail.
  1. Green box - This encloses a line of alternating black and white features.  This indicates where the rows of text are.  Where this feature is most prominent will be used to work out how far out of parallel the input text is to the edge of the image.
  2. Red centre line - This line indicates where features from text perfectly parallel to the edges of the image should appear.  In this case the features are 3 pixels to the left of this line.  This is then used to calculate the rotation angle.  3 pixels x 180 degrees / 900 pixels = 0.6 degrees.  This corresponds to how much I rotated the test image.
  3. White circle - This shows the long edge as seen from the short side.  By definition this is parallel to itself and sits exactly in the centre of the image on the red line.
  4. Yellow Box -  The brightest features on the image are caused by the long unbroken lines near the heading of the report.
  5. Blue Box - These indicate the columns of numerical data on the report.
  6. Brown Box - These are the first two columns of data, which are wide columns containing mainly text.
  7. Maroon Box - This shows a column of data that has a decimal point.  The location of the column of decimal points creates a dark spot as there are only a few dots to see in this particular orientation.

Radon Transform
Notable Features of the transform
Notable Features on the Report
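In the breakdown above I read the 3-pixel offset off by eye.  A simple way to automate it — a sketch, using the common trick that the projection taken parallel to the text rows has the highest variance — might look like:

```python
import numpy as np

def estimate_skew(sinogram, angles_deg):
    """Return the rotation of the text relative to the page edge.
    The projection parallel to the text rows alternates sharply between
    text and white space, so its variance peaks there; the offset of that
    column from the 90 degree centre column is the skew angle."""
    best = np.argmax(sinogram.var(axis=0))
    return angles_deg[best] - 90.0
```

With 0.2 degree steps, a peak three columns from the centre column corresponds to the 0.6 degree rotation calculated above.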
I'm still playing around with things and working out a strategy, but what I'd ultimately like to do is extract each line in the report and make it its own image.  Using Octave under Linux it's simple to automate this so that all I need to do is drop an image in a folder and it's automatically processed.
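A sketch of that line-extraction step, assuming the page has already been deskewed and inverted (ink bright, paper black):

```python
import numpy as np

def split_rows(inverted, threshold=0.0):
    """Cut a deskewed, inverted page into one image per text line.
    Rows whose total ink exceeds `threshold` belong to a line of text;
    each unbroken run of such rows becomes its own image."""
    in_line = inverted.sum(axis=1) > threshold
    lines, start = [], None
    for y, flag in enumerate(in_line):
        if flag and start is None:
            start = y                        # a line begins
        elif not flag and start is not None:
            lines.append(inverted[start:y])  # a line ends
            start = None
    if start is not None:                    # last line runs to the bottom edge
        lines.append(inverted[start:])
    return lines
```

Each returned array could then be written out and handed to tesseract-ocr individually.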

Hopefully you have a better idea of what the Radon transform can be used for.  This has been a lot of fun, I haven't done any image processing in a while and I've forgotten how much I love it.  The challenge of replicating what a human can easily do with their vision system is surprisingly hard and rewarding.
