fsebugoutzone.org:9999

Post AW4rAIUm0pi7voIjXU by SebasTEAan@emacs.ch
(DIR) Post #AW4rAIUm0pi7voIjXU by SebasTEAan@emacs.ch
2023-05-27T08:41:19Z

0 likes, 1 repeats

I started developing a little package for Tesseract OCR integration in Emacs. Any feedback or contributions are highly appreciated. https://github.com/SebastianMeisel/tesseract.el #emacs #emacslisp #ocr #tesseract

(DIR) Post #AW4uRCh98CQKj8qFMm by louis@emacs.ch
2023-05-27T11:01:10Z

0 likes, 0 repeats

@SebasTEAan Great idea! I like that you used Org mode. This way, reading the source is like reading good documentation at the same time. 🚀 I was looking for something like that to get my filing under control. I literally have thousands of scanned PDFs, and since I left Dropbox years ago I have trouble finding stuff.

(DIR) Post #AW5fnG149vMxFDB1hw by SebasTEAan@emacs.ch
2023-05-27T19:51:49Z

0 likes, 0 repeats

@louis Thank you. The same problem you describe was my motivation. I just added support for running Tesseract on multiple images from Dired. I hope to soon add support for adding a text layer to existing PDFs so they become searchable.
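For readers curious what a batch OCR run boils down to, here is a rough sketch outside Emacs. It is not how tesseract.el is implemented. It assumes the `tesseract` command-line tool is installed and on PATH, and the helper names `tesseract_command` and `ocr_images` are hypothetical. Tesseract's CLI takes an image and an output base name, and its `pdf` config writes a searchable PDF with an invisible text layer. Note that this builds a new PDF from an image rather than adding a layer to an existing PDF.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, pdf: bool = False, lang: str = "eng") -> list[str]:
    """Build a tesseract CLI call for one image (hypothetical helper).

    The output base is the image path without its suffix; tesseract
    appends .txt (plain text) or .pdf (searchable PDF) on its own.
    """
    cmd = ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]
    if pdf:
        # Tesseract's "pdf" config file switches output to a PDF
        # containing the image plus an invisible OCR text layer.
        cmd.append("pdf")
    return cmd


def ocr_images(images, pdf: bool = False, lang: str = "eng") -> None:
    """Run tesseract on each image in turn, like a batch over marked files."""
    images = [Path(i) for i in images]
    if images and shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    for image in images:
        # check=True raises CalledProcessError if tesseract fails on a file.
        subprocess.run(tesseract_command(image, pdf, lang), check=True)
```

Calling `ocr_images(["scan1.png", "scan2.png"], pdf=True)` would write `scan1.pdf` and `scan2.pdf` next to the images. Keeping the command construction separate from the subprocess call makes it easy to inspect or log what would run before touching any files.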

(DIR) Post #AW5v8USX9bqdQrbiRU by xenodium@indieweb.social
2023-05-27T22:43:41Z

0 likes, 0 repeats

@louis @SebasTEAan If the scanned PDFs were already OCRed, https://github.com/phiresky/ripgrep-all is pretty handy for searching them.