In my last post I demonstrated how to find the angle of rotated printed text using image processing. I also mentioned a couple of ways to make the process more robust. In this post I'll expand on what I meant with a demonstration.
The code and image files associated with this post can be found here.
The basis for this method is that we are only looking for large values in the Radon transform, since these indicate a feature like a line or row of text. We are also looking for bright features in the gradient of the transform, which indicate the transition point between a row of text and the white space below it. By masking out the lower values in both images, the more relevant data points should be highlighted.
The process starts out exactly the same as in my first demonstration: we need to generate the Radon transform of the input image.
|Radon Transform from 70 to 110 degrees|
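For readers who want to experiment without an image processing library, here is a rough NumPy-only sketch of the idea. It is not the code behind the image above: it approximates the Radon transform by dropping each pixel into its nearest projection bin rather than interpolating, and the function name `radon_by_binning`, the angle convention (90 degrees sums along image rows) and the angle step are all my assumptions.

```python
import numpy as np

def radon_by_binning(image, angles_deg):
    """Crude Radon transform: one column per angle, one row per projection bin.

    Each pixel is added to the bin nearest to s = x*cos(theta) + y*sin(theta),
    with x and y measured from the image centre. At 90 degrees this reduces to
    summing each image row (assumed convention).
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    y, x = np.indices((rows, cols))
    x = x - (cols - 1) / 2.0
    y = y - (rows - 1) / 2.0
    # Enough bins to hold the image diagonal at any angle.
    half = int(np.ceil(np.hypot(rows, cols) / 2.0))
    n_bins = 2 * half + 1
    sinogram = np.zeros((n_bins, len(angles_deg)))
    for j, theta in enumerate(np.deg2rad(angles_deg)):
        s = x * np.cos(theta) + y * np.sin(theta)
        idx = np.rint(s).astype(int) + half
        sinogram[:, j] = np.bincount(idx.ravel(), weights=image.ravel(),
                                     minlength=n_bins)
    return sinogram

# Angles from 70 to 110 degrees; the 0.1 degree step is an assumed value.
angles = np.arange(70.0, 110.0 + 0.05, 0.1)
```

A proper implementation with interpolation will give a smoother result, so treat this only as a way to follow along with the later steps.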
As before, the gradient of the Radon transform also needs to be calculated.
|Vertical Image Gradient|
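A minimal sketch of the gradient step, assuming the transform is stored as a NumPy array with one column per angle and one row per projection position, so that "vertical" means along axis 0. The name `vertical_gradient` is mine, and `np.gradient` is just one reasonable choice of difference operator.

```python
import numpy as np

def vertical_gradient(sinogram):
    """Rate of change down each column (one column per angle)."""
    return np.gradient(np.asarray(sinogram, dtype=float), axis=0)
```

`np.gradient` uses central differences in the interior and one-sided differences at the top and bottom rows, so the output has the same shape as the input.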
This is where things start to change. A binary mask is generated from the Radon transform. This keeps only the bright points in the transform and removes other points that are most likely false positives. The threshold has been set to 20 percent of the maximum value in the Radon transform. This is a permissive value, but it still removes a lot of false positives.
|Thresholded Radon Transform|
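The thresholding itself is a one-liner. This is a sketch with a hypothetical helper name, `relative_threshold_mask`, that takes the fraction of the maximum as a parameter so the same function can be reused with a different value.

```python
import numpy as np

def relative_threshold_mask(values, fraction):
    """Boolean mask that is True where values exceed fraction * max(values)."""
    values = np.asarray(values, dtype=float)
    return values > fraction * values.max()
```

With `fraction=0.2` this gives the 20 percent mask described above.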
The same process is applied to the gradient image using a threshold of 10 percent of the maximum value of the gradient.
|Thresholded Gradient Image|
The two masks are then applied to the original Radon transform by multiplying the masks with the transform.
|Masked Radon Transform|
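Putting the two masks and the multiplication together might look something like this. The function name `apply_masks` and its parameter names are mine; the 20 and 10 percent defaults are the values quoted above. I'm also assuming the gradient is thresholded as-is, so only positive gradient values can pass.

```python
import numpy as np

def apply_masks(sinogram, gradient, radon_fraction=0.2, gradient_fraction=0.1):
    """Zero out every point of the transform that fails either threshold."""
    sinogram = np.asarray(sinogram, dtype=float)
    gradient = np.asarray(gradient, dtype=float)
    radon_mask = sinogram > radon_fraction * sinogram.max()
    gradient_mask = gradient > gradient_fraction * gradient.max()
    # Boolean masks act as 0/1 when multiplied with the transform.
    return sinogram * radon_mask * gradient_mask
```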
I've created the false colour image below to help demonstrate the process a little better. The red channel of the image is the mask from the original Radon transform. The green channel is the mask generated from the gradient image. The blue channel is the original Radon transform. The only areas that will be visible in the final masked image are those where the two masks align, which is where the red and green channels coincide to create a yellow pixel. As the intensity of the blue Radon transform becomes brighter, it turns the yellow pixel white.
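If you want to build a similar composite yourself, stacking the three arrays into an RGB image is enough. This is only a sketch: the function name `false_colour` is hypothetical, and scaling the blue channel by the maximum of the transform is my assumption about how to bring it into the 0 to 1 range.

```python
import numpy as np

def false_colour(radon_mask, gradient_mask, sinogram):
    """RGB image: red = transform mask, green = gradient mask, blue = transform."""
    sinogram = np.asarray(sinogram, dtype=float)
    red = np.asarray(radon_mask, dtype=float)
    green = np.asarray(gradient_mask, dtype=float)
    blue = sinogram / sinogram.max()  # assumes a non-negative transform with max > 0
    return np.dstack([red, green, blue])
```

Pixels where both masks are set come out as yellow `(1, 1, b)` and drift towards white as `b` approaches 1.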
As before, the images are vertically summed to create an array of intensities. The angle at which the intensity peaks is the rotation angle of the text. As a comparison, I've shown the intensity arrays for the masked and unmasked versions of the process below.
|Gradient Intensity vs Text Rotation Angle - Unmasked|
|Gradient Intensity vs Text Rotation Angle - Masked|
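The summing and peak picking can be sketched as below, again assuming one column per angle. The names `angle_profile` and `peak_angle` are mine, and `angles_deg` stands for whatever list of angles was used to build the transform.

```python
import numpy as np

def angle_profile(image):
    """Sum each column to get one intensity value per angle."""
    return np.asarray(image, dtype=float).sum(axis=0)

def peak_angle(profile, angles_deg):
    """Angle whose column has the largest summed intensity."""
    return np.asarray(angles_deg)[int(np.argmax(profile))]
```

The returned value is on the same scale as `angles_deg`. If, as I'm assuming here, 90 degrees corresponds to perfectly level text, the rotation is the peak angle minus 90.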
The magnitude of the peaks in the graphs above doesn't matter. What's important is the ratio of the peak to the next largest feature. In the unmasked version the peak is around 74000 and the next largest peak is around 34000, a ratio of about 2.2. In the masked version the peak is around 29000 and the next largest peak is around 7000, a ratio of about 4.1. This makes picking the correct feature a lot easier and gives more confidence in the result, which as before comes out at 0.6 degrees.
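One way to put a number on this is to find the local maxima of the intensity array and divide the tallest by the second tallest. The sketch below uses a deliberately simple definition of a peak (higher than its left neighbour and at least as high as its right one); the function name `peak_ratio` and that definition are my own choices, and a noisy profile would probably need smoothing first.

```python
import numpy as np

def peak_ratio(profile):
    """Ratio of the tallest local maximum to the second tallest, or None."""
    p = np.asarray(profile, dtype=float)
    interior = p[1:-1]
    is_peak = (interior > p[:-2]) & (interior >= p[2:])
    heights = np.sort(interior[is_peak])[::-1]  # tallest first
    if heights.size < 2:
        return None  # fewer than two peaks, so no ratio to report
    return heights[0] / heights[1]
```

Feeding it the approximate peak heights read off the graphs gives 74000 / 34000 ≈ 2.2 for the unmasked version and 29000 / 7000 ≈ 4.1 for the masked one.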
This is by no means the best that could be done. I picked the threshold values out of thin air. They could be determined by trial and error for a particular type of document or they could be dynamic and adjust to the input image. I'll leave that as an exercise for the reader :-)