[HN Gopher] How to do OCR on a Mac using the CLI or just Python
       ___________________________________________________________________
        
       How to do OCR on a Mac using the CLI or just Python
        
       Author : gregsadetsky
       Score  : 142 points
       Date   : 2024-01-02 18:20 UTC (4 hours ago)
        
 (HTM) web link (blog.greg.technology)
 (TXT) w3m dump (blog.greg.technology)
        
       | eigenvalue wrote:
       | Weird, I couldn't get it to work on a bunch of different files,
       | even using very simple file names. Kept getting this error:
       | 
       | Error: The operation couldn't be completed.
       | (WFBackgroundShortcutRunnerErrorDomain error 1.)
        
         | Oras wrote:
         | I suppose you haven't renamed the new shortcut to `ocr-text`
        
       | TimeBearingDown wrote:
       | Very cool, and seems handy!
       | 
       | I've always had good results from the Preview.app. I wonder how
       | this engine compares for number of errors in a difficult source
       | versus Free alternatives.
        
       | zavertnik wrote:
       | Nice post, OP! I was super impressed with Apple's Vision
       | framework. I used it on a personal project involving the OCRing
       | of tens of thousands of spreadsheet screenshots and ingesting
       | them into a postgres database. I tried other OCR CPU methods
       | (since macOS and Nvidia still don't play nice together) such as
       | Tesseract but found the output to be incorrect too often. The
       | Vision framework was not only the highest quality output I had
       | seen, but it also used the least amount of compute. It was fairly
       | unstable, but I can chalk that up to user error w/ my
       | implementation.
       | 
       | I used a combination of RhetTbull's vision.py (for the actual
       | implementation) [1] + ocrmac (for experimentation) [2] and was
       | pleasantly surprised by the performance on my i7 6700k
       | hackintosh.
       | 
       | I wouldn't call myself a programmer but I can generally
       | troubleshoot anything if given enough time, but it did cost time.
       | 
       | [1]:
       | https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...
       | 
       | [2]: https://github.com/straussmaximilian/ocrmac
        
       | srott wrote:
       | you can use clipboard with pbpaste/pbcopy commands
       | 
       | ocr-text "$1" && pbpaste
        
         | llimllib wrote:
         | It also outputs to the command line if you pipe it to cat:
         | 
         | shortcuts run ocr-text -i new-haven-pizza.jpg | cat
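
The same pipe trick can be driven from Python. What follows is a minimal sketch, not the article's own method: it assumes the shortcut is named `ocr-text` and prints its result when stdout is not a terminal, as the comment above describes. The helper names `build_ocr_command` and `ocr_image` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image_path, shortcut_name="ocr-text"):
    """Return the argv list for `shortcuts run <name> -i <image>`."""
    # expanduser() lets callers pass paths such as ~/Downloads/scan.jpg.
    return ["shortcuts", "run", shortcut_name, "-i", str(Path(image_path).expanduser())]


def ocr_image(image_path, shortcut_name="ocr-text"):
    """Run the shortcut and return the text it writes to stdout (macOS only)."""
    result = subprocess.run(
        build_ocr_command(image_path, shortcut_name),
        capture_output=True,  # stdout becomes a pipe, much like `| cat`
        text=True,
        check=True,  # raise CalledProcessError if the shortcut fails
    )
    return result.stdout.strip()
```

On a Mac with the shortcut installed, `print(ocr_image("new-haven-pizza.jpg"))` should behave roughly like the shell one-liner. An empty result probably means the shortcut only copies to the clipboard.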
        
       | HelloImSteven wrote:
       | I'll throw my solution into the mix:
       | https://skaplanofficial.github.io/PyXA/tutorial/images.html#...
       | 
       | PyXA uses the Vision framework to extract text from one or more
       | images at a time. It's only a small part of the package, so it
       | might be overkill for a one-off operation, but it's an option.
        
         | wahnfrieden wrote:
         | fyi you're using the old and less accurate api,
         | VNRecognizeTextRequest
         | 
         | ImageAnalyzer is newer and much better
         | 
         | I bet this shortcut from OP is also using the older API under
         | the hood
        
           | HelloImSteven wrote:
           | ImageAnalyzer is Swift-only and has no corresponding
           | Objective-C method, so it's not available in PyObJC. I can
           | look into bridging it at some point.
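
For reference, the older `VNRecognizeTextRequest` route discussed here can be reached from Python through PyObjC. This is a rough sketch only, assuming the `pyobjc-framework-Vision` package is installed on macOS. It is not PyXA's implementation, and the function names `recognize_text` and `keep_confident` are hypothetical.

```python
def keep_confident(pairs, min_confidence):
    """Drop (text, confidence) pairs whose confidence is below the threshold."""
    return [(text, conf) for text, conf in pairs if conf >= min_confidence]


def recognize_text(image_path, min_confidence=0.0):
    """Return a list of (text, confidence) pairs for one image (macOS only)."""
    # Imported lazily so this module still loads on machines without PyObjC.
    import Vision
    from Foundation import NSURL

    url = NSURL.fileURLWithPath_(str(image_path))
    handler = Vision.VNImageRequestHandler.alloc().initWithURL_options_(url, {})
    request = Vision.VNRecognizeTextRequest.alloc().init()

    # PyObjC returns (success, error) for methods with an NSError out-parameter.
    ok, error = handler.performRequests_error_([request], None)
    if not ok:
        raise RuntimeError(f"Vision request failed: {error}")

    pairs = []
    for observation in request.results() or []:
        candidates = observation.topCandidates_(1)
        if candidates:
            best = candidates[0]
            pairs.append((str(best.string()), float(best.confidence())))
    return keep_confident(pairs, min_confidence)
```

Something like `recognize_text("scan.png", min_confidence=0.5)` would keep only the lines Vision is reasonably sure about. The 0.5 threshold is an arbitrary example value.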
        
       | gist wrote:
       | To place contents in a file (not claiming this is the most
       | efficient way but it works)
       | 
       | OCRTHISFILE="ocr-test.jpg"
       | 
       | shortcuts run ocr-text -i "${OCRTHISFILE}"
       | 
       | pbpaste > "${OCRTHISFILE}.txt"
       | 
       | or to view output and place in file:
       | 
       | OCRTHISFILE="ocr-test.jpg"
       | 
       | shortcuts run ocr-text -i "${OCRTHISFILE}"
       | 
       | pbpaste | tee "${OCRTHISFILE}.txt"
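
The clipboard-to-file steps above translate to Python fairly directly. This sketch assumes the `ocr-text` shortcut leaves its result on the clipboard, as in the shell version. The names `text_output_path` and `ocr_to_file` are illustrative only.

```python
import subprocess
from pathlib import Path


def text_output_path(image_path):
    """Mirror the shell's "${OCRTHISFILE}.txt": append .txt to the full file name."""
    image_path = Path(image_path)
    return image_path.with_name(image_path.name + ".txt")


def ocr_to_file(image_path, shortcut_name="ocr-text"):
    """Run the shortcut, read the clipboard, and save the text (macOS only)."""
    # Step 1: run the shortcut, which is assumed to copy the OCR text to the clipboard.
    subprocess.run(["shortcuts", "run", shortcut_name, "-i", str(image_path)], check=True)
    # Step 2: read the clipboard back with pbpaste.
    text = subprocess.run(["pbpaste"], capture_output=True, text=True, check=True).stdout
    # Step 3: write the text next to the image, e.g. ocr-test.jpg.txt.
    out_path = text_output_path(image_path)
    out_path.write_text(text, encoding="utf-8")
    return out_path
```

Calling `ocr_to_file("ocr-test.jpg")` would write `ocr-test.jpg.txt` alongside the image. Note that anything else copied to the clipboard between the two steps would end up in the file instead.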
        
         | msxbel wrote:
         | Or use macOS Shortcuts to output the OCR text as a file (Action:
         | "Append to Text File")
        
       | mushufasa wrote:
       | Very cool. Anyone know how this compares to AWS Textract in
       | general? Does the Apple Vision framework support table
       | recognition?
        
         | llimllib wrote:
         | It looks like it does, but you need to handle it at a pretty
         | low level, this shortcut won't get you there:
         | https://developer.apple.com/videos/play/wwdc2019/234?time=19...
        
       | novagameco wrote:
       | On Windows I recommend text extractor from powertoys:
       | 
       | https://learn.microsoft.com/en-us/windows/powertoys/text-ext...
        
       | rikafurude21 wrote:
       | Are iOS and macOS shortcuts cross-compatible? I didn't know there
       | were shortcuts for the Mac. It seems pretty powerful to be able
       | to run them from the terminal too. Thanks, OP.
        
         | diegof79 wrote:
         | Yes, they are compatible as long as you use actions available
         | on both platforms. For example, you can use AppleScript or shell
         | in macOS, but it will not work on iOS. However, if you stick to
         | shortcuts from cross-platform apps, it works even when you write
         | files into the iCloud folder. For example, I made a shortcut that
         | takes today's events from the Calendar and appends the list to a
         | Markdown file in an Obsidian vault on iCloud. I use it to
         | scaffold meeting notes, and it works on my phone too.
        
       | tough wrote:
       | I'm a huge fan of this little OCR tool installed through brew
       | onto my MacBook: https://github.com/schappim/macOCR
        
         | nemosaltat wrote:
         | Same, and for my purposes, I just wrap that utility in a macOS
         | Shortcut I can click from my menu bar, or launch from
         | Quicksilver.
        
           | bogeholm wrote:
           | Quicksilver, now there's a blast from the past! I don't think
           | I've installed it on any Mac in the past 5 years, but I used
           | to love it.
           | 
           | What are the advantages over native macOS shortcuts these
           | days?
        
       | BoppreH wrote:
       | I tried doing something similar on Windows, and realized that
       | PowerToys[1], a Microsoft project I already had installed,
       | actually contains a very good OCR tool[2]. Just press Win+Shift+T
       | and select the area to scan, and the text will be copied to the
       | clipboard.
       | 
       | [1] https://learn.microsoft.com/en-us/windows/powertoys/
       | 
       | [2] https://learn.microsoft.com/en-us/windows/powertoys/text-ext...
        
         | mywacaday wrote:
         | I use AutoHotkey + PowerToys to append screenshot data to a
         | CSV, works great with its own key mapping
        
       | minimaxir wrote:
       | Surprisingly, the Extract Text from Image action is available on
       | Intel Macs: normally, features like automatic image OCR are
       | limited to Apple Silicon Macs.
        
       | predictsoft wrote:
       | On Windows, A9T9 does a great job of OCR'ing scanned JPEG files
       | (and any JPEG file). It's also free.
       | 
       | I scanned about 100 A4 documents in just a couple of minutes.
        
       | geniium wrote:
       | Have you guys tried ChatGPT or other alternatives?
        
       | justinl33 wrote:
       | Awesome! Is there a similar technique for the Apple Vision
       | "Copy Subject" feature? I've become extremely reliant on it,
       | but it feels very limited in access.
        
         | pimlottc wrote:
         | I had to Google this, do you mean the feature in Photos on
         | mobile where you can "extract" items from a picture and make
         | them into stickers? Apple seems to call it "lifting subjects"
         | [0] [1].
         | 
         | 0: https://support.apple.com/guide/iphone/lift-a-subject-from-t...
         | 
         | 1: https://developer.apple.com/videos/play/wwdc2023/10176/
         | 
         | EDIT: Try replacing the "Extract text" action with "Remove
         | background". When running the shortcut, use "-o" to specify
         | output image filename.
         | 
         | shortcuts run remove-background -i ~/Downloads/portrait-beard.avif -o beard.jpg
        
       | cyberax wrote:
       | It doesn't work for Chinese characters :(
        
       | andreasley wrote:
       | macOS Ventura and newer actually have basic OCR functionality
       | integrated into the Image Capture UI. When using an AirPrint-
       | compatible scanner and scanning to PDF, the checkbox "OCR" is
       | shown in the right pane.
        
       ___________________________________________________________________
       (page generated 2024-01-02 23:00 UTC)