Subj : Re: King/Gassner/Shanahan
To : comp.programming,comp.lang.java.programmer,comp.lang.lisp
From : Ben Pfaff
Date : Sun Aug 28 2005 06:10 pm

"Daniel Dyer" writes:

> On Sun, 28 Aug 2005 22:18:14 +0100, Robert Maas, see
> http://tinyurl.com/uh3t wrote:
>>
>> Does anybody know of a free program that does OCR on PDF files and
>> keeps track of layout so as to convert to reasonable HTML or plain
>> ASCII?
>
> There are various command-line utilities. Search for "pdf2ascii",
> "pdf2html", "pdftohtml", "pdf2txt" etc. Maybe your shell account
> already has one of these available.

To my knowledge, none of these utilities do any form of OCR. They
only work with PDF files that render text characters with a font.
They will not produce useful results for PDF files that just
encapsulate a scanned bitmap.

-- 
"But hey, the fact that I have better taste than anybody else in the
 universe is just something I have to live with. It's not easy being me."
--Linus Torvalds