Found a nice open-source OCR package today: Tesseract. I had some problems getting it to run, but resolved them pretty quickly with the help of strace. Here is the README.cygwin file I submitted to the documentation page. I'm converting City of the Sun documents to text as a first step toward collaborative editing, possibly using Google Documents. The first document isn't one that will be edited, but it is relevant: the Articles of Incorporation. There may still be some errors in the text, but it was about 95% correct straight out of Tesseract. Excellent software.
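
For anyone wanting to script the same kind of conversion, here is a minimal sketch in Python. It is not the procedure used above, just one way to wrap the basic `tesseract IMAGE OUTPUTBASE` command line. The helper names (`build_tesseract_command`, `ocr_image`) and the file name `articles.tif` are made up for illustration, and it assumes the `tesseract` executable is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argv list for the basic `tesseract IMAGE OUTPUTBASE` call."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_image(image_path):
    """Run Tesseract on one scanned page and return the recognized text.

    Hypothetical helper: assumes `tesseract` is installed and on the PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")

    image_path = Path(image_path)
    # Tesseract appends ".txt" to the output base on its own.
    output_base = image_path.with_suffix("")
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)

    text_file = Path(str(output_base) + ".txt")
    return text_file.read_text(encoding="utf-8")
```

Calling `ocr_image("articles.tif")` would write `articles.txt` next to the image and return its contents, ready for proofreading.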


last updated 2013-01-10 20:52:13. served from tektonic.jcomeau.com