Can software help a human process images faster than a computer can, once you take into account the time spent setting up the image processing pipeline? That's the intent of "SoylentOCR - It's made out of people". It's a Python/tkinter-based program that can help with small data entry or image classification tasks.
For the last few years several people have tried to predict the results of the Triple J Hottest 100 song poll by analysing images of votes placed on social media. Their predictions are usually pretty good, but when they do get it wrong it's because of some quirk in the optical character recognition (OCR) process. That's not a criticism, OCR is hard. What's that I hear you say? OCR is a solved problem. Yeah, it kind of is, but it all depends on the quality of the source images, the accuracy required, and how you train the software and post process the results. This is definitely a case where the quality of the source images is less than ideal as demonstrated below.
|4 different sample ballots found online|
In this situation, if you had to process one million images, you'd probably spend time training your software to recognise particular fonts, and writing post processing software to make sure the results are accurate. If you had one image to process you'd probably just type the results in manually. My point is that somewhere between these two extreme cases there is a crossover point. I don't know where that is, but I'd like to explore improving the manual entry option. One of my favourite XKCD comics illustrates this point: sometimes spending time to make a task more efficient doesn't make sense, and it's quicker in the long run to just do it inefficiently.
|I think of this XKCD often. Is it worth the time?|
To improve the process I came up with a program that displays the images in one frame and allows data entry in multiple entry boxes in a second frame. As you type, suggestions for what you are trying to enter appear in a third frame. To move forward and backward through the images to be processed you use tab and shift-tab. Moving between entry boxes is done with the up and down keys. A status box also shows your progress.
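As a rough illustration of that layout, here's a minimal tkinter sketch. The widget names, geometry, and stubbed key handlers are my assumptions, not the actual SoylentOCR source:

```python
import tkinter as tk

def build_ui(root, n_entries=10):
    """Sketch of the three-frame layout: image display, entry boxes,
    suggestions, plus a status bar. Names are illustrative assumptions."""
    image_frame = tk.Frame(root)          # current image being processed
    entry_frame = tk.Frame(root)          # the data-entry boxes
    suggest_frame = tk.Frame(root)        # live suggestions as you type
    status = tk.Label(root, anchor="w")   # progress / error messages

    image_frame.pack(side="left", fill="both", expand=True)
    entry_frame.pack(fill="x")
    suggest_frame.pack(fill="both", expand=True)
    status.pack(side="bottom", fill="x")

    entries = [tk.Entry(entry_frame) for _ in range(n_entries)]
    for box in entries:
        box.pack(fill="x")

    # Tab / Shift-Tab step through images; Up / Down move between boxes.
    # (On X11, shift-tab may arrive as the "<ISO_Left_Tab>" keysym.)
    root.bind("<Tab>", lambda e: None)        # next image (stub)
    root.bind("<Shift-Tab>", lambda e: None)  # previous image (stub)
    root.bind("<Up>", lambda e: None)         # previous entry box (stub)
    root.bind("<Down>", lambda e: None)       # next entry box (stub)
    return entries, status

if __name__ == "__main__":
    root = tk.Tk()
    build_ui(root)
    root.mainloop()
```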
When entering data into an entry box, it's split into words at spaces. The first 10 entries containing all of these words are displayed as suggestions. One of these can be selected to fill the entry box by pressing the control key along with the number of the suggestion. You can see in the image below that typing in "el in pa" is enough to make "paper kites the electric indigo" the top suggestion.
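The matching rule above can be sketched in a few lines: split the typed text on spaces, keep only known entries that contain every fragment, and return the first 10. The function and variable names here are illustrative assumptions:

```python
def suggest(typed, known_entries, limit=10):
    """Return the first `limit` known entries containing every
    space-separated fragment of the typed text."""
    fragments = typed.lower().split()
    matches = [entry for entry in known_entries
               if all(frag in entry for frag in fragments)]
    return matches[:limit]

# A few entries in the style of the post's example.
songs = ["paper kites the electric indigo",
         "gang of youths magnolia",
         "the rubens hoops"]

suggest("el in pa", songs)  # → ["paper kites the electric indigo"]
```

As in the screenshot, "el in pa" is enough to isolate "paper kites the electric indigo", because "el", "in", and "pa" each appear somewhere in that string and in no other entry.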
|soylentOCR user interface|
When first started, the database contains no suggestions at all, but these are slowly built up as data is entered. Any entry that is not already a suggestion becomes one when you move to the next image. To further improve performance, suggestions are ordered by how often they have previously been entered. Although it's possible to start with a completely empty database, it helps to "prime" it by using a database editor to add a list of suggestions from elsewhere; these can be ignored in the final analysis. The sqlite3 database contains one table with columns titled:
FILE_NAME, ATTRIBUTE_NUMBER, ATTRIBUTE
There is a combined primary key over the FILE_NAME and ATTRIBUTE_NUMBER columns. For my example I found a list of eligible songs, converted it to lower case, and replaced any character that wasn't a letter, number, or space with a space (Notepad++ is awesome). I also added approximately 20 songs from a betting website that were predicted to win. These songs are now in the database twice and so appear at the top of the suggestion list, since they are the ones most likely to come up during data entry. A set of training data that complies with the table format was created in a spreadsheet, saved as a CSV file, and imported into the database. It sounds complicated but it isn't.
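The table and the frequency ordering described above might look something like this in sqlite3. The table and column names come from the post; the priming rows and the ranking query are my assumptions about one way it could be done:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE entries (
    FILE_NAME        TEXT,
    ATTRIBUTE_NUMBER INTEGER,
    ATTRIBUTE        TEXT,
    PRIMARY KEY (FILE_NAME, ATTRIBUTE_NUMBER))""")

# Priming rows (as if imported from the CSV of eligible songs)
# plus one real entry keyed from an image.
con.executemany("INSERT INTO entries VALUES (?, ?, ?)", [
    ("prime.csv", 1, "paper kites the electric indigo"),
    ("prime.csv", 2, "gang of youths magnolia"),
    ("img001.jpg", 1, "gang of youths magnolia"),
])

# Most frequently entered values come first in the suggestion list.
rows = con.execute("""SELECT ATTRIBUTE, COUNT(*) AS n FROM entries
                      GROUP BY ATTRIBUTE
                      ORDER BY n DESC LIMIT 10""").fetchall()
# The twice-entered song ranks above the single-count one.
```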
|Training data imported into SqliteBrowser|
The code isn't perfect. For example, the directory containing the images is hard coded into the software. It also treats all files in that directory as images; when it encounters a file it can't open, it shows a red square and places a notification in the status box.
|Not an image notification|
All that's left now is to sit down and try it out.
Processing the 76 images in that directory took me 1:25:41, which averages about 67 seconds per image. Not as good as I was hoping. Earlier tests of just typing the data into Notepad gave results of about 90 seconds per image. So yeah, it's an improvement, but not much of one. I did however get a better feel for the problem and have some ideas about how to improve the software.
First of all, 10 suggestions are too many. It turned out to be much easier to type until there were only one or two suggestions and then select one. The other issue was that most of my time was wasted moving my eyes back and forth between the entry and suggestion frames. My original intent was to have the suggestions appear under the entry boxes, just like Google does when you enter a search term, so you only need to look in one area. At the time I was a beginner with tkinter and had no idea how to do what I wanted. I think I may know now, so I'll give that a try.
To make the results more rigorous, this would be better implemented as some sort of web app. Multiple people could log in and be assigned images to process using a shared database. Each image could be processed at least twice by different people and the results compared; if they don't match, there's an error.
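The double-entry check could be as simple as the sketch below: key each person's results by (file name, attribute number) and flag any disagreement for review. The data shapes and names are illustrative assumptions:

```python
def find_conflicts(pass_a, pass_b):
    """pass_a / pass_b map (file_name, attribute_number) -> entered text.
    Returns the keys both people entered but with differing text."""
    conflicts = []
    for key in sorted(pass_a.keys() & pass_b.keys()):
        if pass_a[key] != pass_b[key]:
            conflicts.append((key, pass_a[key], pass_b[key]))
    return conflicts

# Two hypothetical passes over the same image by different people.
alice = {("img001.jpg", 1): "gang of youths magnolia"}
bob   = {("img001.jpg", 1): "gang of youths magnolias"}

find_conflicts(alice, bob)  # → flags the mismatch on img001.jpg
```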
So overall I'm happy with the result. I learnt tkinter, and did achieve a reduction in the time required to enter the data from an image. Unexpectedly it became clear that the program could be used for other purposes. Imagine you were doing some landscaping and you wanted to choose plants for a garden. If you had images of all the plants it would be trivial to go through and rate them 1-10. This is a task that a computer just couldn't do because it's your own personal opinion.
|Get the code!|