Post AW4rAIUm0pi7voIjXU by SebasTEAan@emacs.ch
(DIR) Post #AW4rAIUm0pi7voIjXU by SebasTEAan@emacs.ch
2023-05-27T08:41:19Z
0 likes, 1 repeats
I started developing a little package for Tesseract OCR integration in Emacs. Any feedback / contribution is highly appreciated. https://github.com/SebastianMeisel/tesseract.el #emacs #emacslisp #ocr #tesseract
(DIR) Post #AW4uRCh98CQKj8qFMm by louis@emacs.ch
2023-05-27T11:01:10Z
0 likes, 0 repeats
@SebasTEAan Great idea! I like that you used Orgmode. This way, reading the source is like reading good documentation at the same time. 🚀 I was looking for something like that to get my filing under control. I literally have thousands of scanned PDFs, and since I left Dropbox years ago I have had trouble finding stuff.
(DIR) Post #AW5fnG149vMxFDB1hw by SebasTEAan@emacs.ch
2023-05-27T19:51:49Z
0 likes, 0 repeats
@louis Thank you. The same problem as yours was my motivation. I just added support for running Tesseract on multiple images from Dired. I hope to soon support adding a text layer to existing PDFs, so they become searchable.
(DIR) Post #AW5v8USX9bqdQrbiRU by xenodium@indieweb.social
2023-05-27T22:43:41Z
0 likes, 0 repeats
@louis @SebasTEAan If the scanned PDFs have already been OCR'd, https://github.com/phiresky/ripgrep-all is pretty handy for searching them.