[HN Gopher] How to do OCR on a Mac using the CLI or just Python
___________________________________________________________________
How to do OCR on a Mac using the CLI or just Python
Author : gregsadetsky
Score : 142 points
Date : 2024-01-02 18:20 UTC (4 hours ago)
(HTM) web link (blog.greg.technology)
(TXT) w3m dump (blog.greg.technology)
| eigenvalue wrote:
| Weird, I couldn't get it to work on a bunch of different files,
| even using very simple file names. Kept getting this error:
|
| Error: The operation couldn't be completed.
| (WFBackgroundShortcutRunnerErrorDomain error 1.)
| Oras wrote:
| I suppose you haven't renamed the new shortcut to `ocr-text`
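|
| A minimal Python sketch for checking this, assuming the `shortcuts`
| CLI that ships with macOS 12 and later, whose `list` subcommand
| prints one shortcut name per line. The helper names are
| hypothetical, and matching the name exactly (case included) is an
| assumption rather than documented behaviour.

```python
import subprocess
from typing import List


def shortcut_names(listing: str) -> List[str]:
    """Split the text printed by `shortcuts list` into one name per line."""
    return [line.strip() for line in listing.splitlines() if line.strip()]


def has_shortcut(name: str, listing: str) -> bool:
    """Return True if `name` appears as an exact line in the listing."""
    return name in shortcut_names(listing)


def installed_shortcuts() -> str:
    """Run `shortcuts list` (macOS only) and return its raw output."""
    result = subprocess.run(
        ["shortcuts", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

| On a Mac, `has_shortcut("ocr-text", installed_shortcuts())` should
| be False while the shortcut still has its default name.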
| TimeBearingDown wrote:
| Very cool, and seems handy!
|
| I've always had good results from the Preview.app. I wonder how
| this engine compares for number of errors in a difficult source
| versus Free alternatives.
| zavertnik wrote:
| Nice post, OP! I was super impressed with Apple's Vision
| framework. I used it on a personal project involving the OCRing
| of tens of thousands of spreadsheet screenshots and ingesting
| them into a postgres database. I tried other OCR CPU methods
| (since macOS and Nvidia still don't play nice together) such as
| Tesseract but found the output to be incorrect too often. The
| vision framework was not only the highest quality output I had
| seen, but it also used the least amount of compute. It was fairly
| unstable, but I can chalk that up to user error w/ my
| implementation.
|
| I used a combination of RhetTbull's vision.py (for the actual
| implementation) [1] + ocrmac (for experimentation) [2] and was
| pleasantly surprised by the performance on my i7 6700k
| hackintosh.
|
| I wouldn't call myself a programmer but I can generally
| troubleshoot anything if given enough time, but it did cost time.
|
| [1]:
| https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...
|
| [2]: https://github.com/straussmaximilian/ocrmac
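|
| A rough sketch of how ocrmac output might be turned into
| spreadsheet-like rows before loading it into a database. It
| assumes each ocrmac annotation is a (text, confidence, [x, y, w,
| h]) tuple with a normalised box whose origin is bottom-left, as in
| Vision. `group_into_rows`, `ocr_rows` and the tolerance value are
| hypothetical, not taken from the comment above.

```python
from typing import List, Sequence, Tuple

# Assumed shape of one ocrmac result: (text, confidence, [x, y, w, h]),
# with the box normalised to the 0..1 range.
Annotation = Tuple[str, float, Sequence[float]]


def group_into_rows(
    annotations: List[Annotation], y_tolerance: float = 0.01
) -> List[List[str]]:
    """Cluster annotations with similar y into rows, cells sorted left to right."""
    rows: List[Tuple[float, List[Annotation]]] = []
    # Origin is assumed bottom-left, so a larger y means higher on the page.
    for ann in sorted(annotations, key=lambda a: -a[2][1]):
        y = ann[2][1]
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append(ann)  # close enough to the current row
        else:
            rows.append((y, [ann]))  # start a new row
    return [
        [a[0] for a in sorted(cells, key=lambda a: a[2][0])]
        for _, cells in rows
    ]


def ocr_rows(image_path: str) -> List[List[str]]:
    """OCR one image with ocrmac (macOS only) and return rows of cell text."""
    from ocrmac import ocrmac  # lazy import so the helper above runs anywhere

    return group_into_rows(ocrmac.OCR(image_path).recognize())
```

| The tolerance would need tuning per screenshot size; rows that are
| tightly spaced may merge with a value this loose.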
| srott wrote:
| you can use clipboard with pbpaste/pbcopy commands
|
| ocr-text "$1" && pbpaste
| llimllib wrote:
| It also outputs to the command line if you pipe it to cat:
|
| shortcuts run ocr-text -i new-haven-pizza.jpg | cat
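|
| The same idea from Python, as a sketch: `subprocess.run` with
| `capture_output=True` gives the command a pipe for stdout, much
| like `| cat`. It assumes the shortcut is named `ocr-text` as in the
| article; `ocr_command` and `ocr_image` are hypothetical names.

```python
import subprocess
from typing import List


def ocr_command(image_path: str, shortcut: str = "ocr-text") -> List[str]:
    """Build the argv for `shortcuts run <shortcut> -i <image>`."""
    return ["shortcuts", "run", shortcut, "-i", image_path]


def ocr_image(image_path: str, shortcut: str = "ocr-text") -> str:
    """Run the shortcut (macOS only) and return what it writes to stdout."""
    result = subprocess.run(
        ocr_command(image_path, shortcut),
        capture_output=True,  # stdout becomes a pipe, as with `... | cat`
        text=True,
        check=True,  # raise CalledProcessError if the shortcut fails
    )
    return result.stdout
```

| Passing the arguments as a list avoids shell quoting problems with
| file names that contain spaces.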
| HelloImSteven wrote:
| I'll throw my solution into the mix:
| https://skaplanofficial.github.io/PyXA/tutorial/images.html#...
|
| PyXA uses the Vision framework to extract text from one or more
| images at a time. It's only a small part of the package, so it
| might be overkill for a one-off operation, but it's an option.
| wahnfrieden wrote:
| fyi you're using the old and less accurate api,
| VNRecognizeTextRequest
|
| ImageAnalyzer is newer and much better
|
| I bet this shortcut from OP is also using the older API under
| the hood
| HelloImSteven wrote:
| ImageAnalyzer is Swift-only and has no corresponding
| Objective-C method, so it's not available in PyObJC. I can
| look into bridging it at some point.
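|
| For reference, a minimal sketch of calling VNRecognizeTextRequest
| from Python through PyObjC. It assumes the pyobjc-framework-Vision
| package on macOS. `recognize_text` and `keep_confident` are
| hypothetical helpers, not PyXA's API, and the handling of the
| return value of `performRequests_error_` is written defensively
| because its exact shape is an assumption here.

```python
from typing import List, Tuple


def keep_confident(
    candidates: List[Tuple[str, float]], min_confidence: float = 0.5
) -> List[str]:
    """Drop recognised strings whose confidence is below the threshold."""
    return [text for text, conf in candidates if conf >= min_confidence]


def recognize_text(image_path: str) -> List[Tuple[str, float]]:
    """Run VNRecognizeTextRequest on one image; return (text, confidence) pairs."""
    # Lazy imports: these need macOS plus pyobjc-framework-Vision.
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_(image_path)
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    request = Vision.VNRecognizeTextRequest.alloc().init()

    # PyObjC usually returns (success, error) for NSError** methods.
    outcome = handler.performRequests_error_([request], None)
    success, error = outcome if isinstance(outcome, tuple) else (outcome, None)
    if not success:
        raise RuntimeError(f"Vision request failed: {error}")

    pairs = []
    for observation in request.results() or []:
        top = observation.topCandidates_(1)
        if top:  # skip observations with no candidate string
            pairs.append((str(top[0].string()), float(top[0].confidence())))
    return pairs
```

| Something like `keep_confident(recognize_text("scan.png"))` would
| then give only the strings Vision is reasonably sure about.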
| gist wrote:
| To place contents in a file (not claiming this is the most
| efficient way but it works)
|
| OCRTHISFILE="ocr-test.jpg"
|
| shortcuts run ocr-text -i "${OCRTHISFILE}"
|
| pbpaste > "${OCRTHISFILE}.txt"
|
| or to view output and place in file:
|
| OCRTHISFILE="ocr-test.jpg"
|
| shortcuts run ocr-text -i "${OCRTHISFILE}"
|
| pbpaste | tee "${OCRTHISFILE}.txt"
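|
| The same two steps as a Python sketch, assuming the shortcut ends
| by copying the recognised text to the clipboard (which is what the
| `pbpaste` step above relies on). `text_path_for` and `ocr_to_file`
| are hypothetical names.

```python
import subprocess
from pathlib import Path


def text_path_for(image_path: str) -> Path:
    """Mirror the shell snippet: ocr-test.jpg -> ocr-test.jpg.txt."""
    return Path(image_path + ".txt")


def ocr_to_file(image_path: str, shortcut: str = "ocr-text") -> Path:
    """Run the shortcut, then save the clipboard contents (macOS only)."""
    # Step 1: run the shortcut; it is assumed to put the text on the clipboard.
    subprocess.run(["shortcuts", "run", shortcut, "-i", image_path], check=True)
    # Step 2: read the clipboard back with pbpaste.
    pasted = subprocess.run(
        ["pbpaste"], capture_output=True, text=True, check=True
    ).stdout
    out_path = text_path_for(image_path)
    out_path.write_text(pasted, encoding="utf-8")
    return out_path
```

| Because it goes through the clipboard, anything copied between the
| two steps would end up in the file instead.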
| msxbel wrote:
| Or use macOS Shortcuts to output the OCR text as a file (Action:
| "Append to Text File")
| mushufasa wrote:
| Very cool. Anyone know how this compares to AWS Textract in
| general? Does the Apple Vision framework support table
| recognition?
| llimllib wrote:
| It looks like it does, but you need to handle it at a pretty
| low level, this shortcut won't get you there:
| https://developer.apple.com/videos/play/wwdc2019/234?time=19...
| novagameco wrote:
| On Windows I recommend Text Extractor from PowerToys:
|
| https://learn.microsoft.com/en-us/windows/powertoys/text-ext...
| rikafurude21 wrote:
| Are iOS and macOS shortcuts cross-compatible? I didn't know there
| were shortcuts for the Mac, seems pretty powerful to be able to
| run them from the terminal too. Thanks OP
| diegof79 wrote:
| Yes, they are compatible as long as you use actions available on
| both platforms. For example, you can use AppleScript or shell
| in macOS but it will not work on iOS. However, if you use
| cross-platform apps' shortcuts it works, even when you write files
| into the iCloud folder. For example, I did a shortcut that takes
| today's events from the Calendar and appends the list into a
| Markdown file in an Obsidian vault on iCloud. I use it to
| scaffold meeting notes, and it works on my phone too.
| tough wrote:
| I'm a huge fan of this little OCR tool installed through brew
| onto my macbook https://github.com/schappim/macOCR
| nemosaltat wrote:
| Same, and for my purposes, I just wrap that utility in a macOS
| Shortcut I can click from my menu bar, or launch from
| Quicksilver.
| bogeholm wrote:
| Quicksilver, now there's a blast from the past! I don't think
| I've installed it on any Mac in the past 5 years, but I used
| to love it.
|
| What are the advantages over native macOS shortcuts these
| days?
| BoppreH wrote:
| I tried doing something similar on Windows, and realized that
| PowerToys[1], a Microsoft project I already had installed,
| actually contains a very good OCR tool[2]. Just press Win+Shift+T
| and select the area to scan, and the text will be copied to the
| clipboard.
|
| [1] https://learn.microsoft.com/en-us/windows/powertoys/
|
| [2] https://learn.microsoft.com/en-us/windows/powertoys/text-ext...
| mywacaday wrote:
| I use autohotkey + powertoys to append screenshot data to a
| CSV, works great with its own key mapping
| minimaxir wrote:
| Surprisingly, the Extract Text from Image action is available on
| Intel Macs: normally, features like automatic image OCR are
| limited to Apple Silicon Macs.
| predictsoft wrote:
| On Windows, A9T9 does a great job of OCR'ing scanned JPEG files
| (and any JPEG file). It's also free.
|
| I scanned about 100 A4 documents in just a couple of minutes.
| geniium wrote:
| Have you guys tried ChatGPT or other alternatives?
| justinl33 wrote:
| Awesome! Is there a similar technique for the Apple Vision
| 'Copy Subject' feature? I've become extremely reliant on it,
| but it feels very limited in access.
| pimlottc wrote:
| I had to Google this, do you mean the feature in Photos on
| mobile where you can "extract" items from a picture and make
| them into stickers? Apple seems to call it "lifting subjects"
| [0] [1].
|
| 0: https://support.apple.com/guide/iphone/lift-a-subject-from-t...
|
| 1: https://developer.apple.com/videos/play/wwdc2023/10176/
|
| EDIT: Try replacing the "Extract text" action with "Remove
| background". When running the shortcut, use "-o" to specify the
| output image filename:
|
| shortcuts run remove-background -i ~/Downloads/portrait-beard.avif -o beard.jpg
| cyberax wrote:
| It doesn't work for Chinese characters :(
| andreasley wrote:
| macOS Ventura and newer actually have basic OCR functionality
| integrated into the Image Capture UI. When using an AirPrint-
| compatible scanner and scanning to PDF, the checkbox "OCR" is
| shown in the right pane.
___________________________________________________________________
(page generated 2024-01-02 23:00 UTC)