[HN Gopher] Ask HN: How to transcribe 1000s of handwritten notes
       ___________________________________________________________________
        
       Ask HN: How to transcribe 1000s of handwritten notes
        
       I have 10 years' worth of journals.  My handwriting is not great!
       None of the off the shelf solutions come even close to recognizing
       my handwriting.  Can you think of anything better than just opening
       every single file and manually transcribing it?  I have been
       thinking about training a model to first divide the images into
       lines of text. Then, it will be easier to transcribe, and
       automatically those transcriptions will be associated with areas of
       the image, in case I figure out a good handwriting model.
        
       Author : bckr
       Score  : 65 points
       Date   : 2024-05-31 01:57 UTC (1 days ago)
        
       | throwaway211 wrote:
       | Can you read them? Speech to text perhaps. That can also be done
       | locally.
       | 
       | If a note's a minute, 1000 notes are around 16 hours of reading.
       | Scale time needed depending on if it takes less or more than a
       | minute to read. Add a note reference to the start of each
       | recording, like a zettelkasten, so the scanned file, recording
       | and text cross-reference.
       | 
       | If assessing other solutions, that's at least an upper bound on
       | the cost of any other solution.
        
         | smarm52 wrote:
         | Some good transcription solutions:
         | 
         | https://zapier.com/blog/best-text-dictation-software/#window...
         | 
         | https://otter.ai/
         | 
         | (Haven't actually tried Otter, but it gets a LOT of good
         | reviews.)
        
         | mvkel wrote:
         | This is the best answer.
         | 
         | Any techie will desperately try to come up with a tech solution
         | to this problem.
         | 
         | A few months of development later, you might have something
         | that yields trustworthy output.
         | 
         | But 16 hours? No tech solution will be done faster than that.
         | 
         | Don't build a factory for a one-off.
        
           | bckr wrote:
           | > But 16 hours? No tech solution will be done faster than
           | that.
           | 
           | True
           | 
           | > Don't build a factory for a one-off.
           | 
           | One thing on my wishlist is that I end up with a way to
           | instantly transcribe my notes.
        
             | sky2224 wrote:
             | > One thing on my wishlist is that I end up with a way to
             | instantly transcribe my notes.
             | 
             | Many of the implementations are clunky in my opinion, but
             | this exists as a feature in many note taking tablet apps.
        
           | PartiallyTyped wrote:
           | I mean, I'd totally try Tesseract[1], a few samples, and a
           | python script. Shouldn't take more than 5 minutes to validate
           | this.
           | 
           | Adobe also has the whole scan thing, and apple can -- in some
           | cases -- correctly transcribe characters from images.
           | 
           | https://github.com/tesseract-ocr/tesseract
        
           | lozenge wrote:
           | Err, 16 hours just to read. Then you still need to deal with
           | the inaccuracies of speech to text.
        
         | dSebastien wrote:
         | You made my day. It's obviously an awesome approach!
         | 
         | Documented here:
         | https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....
        
         | BetterWhisper wrote:
         | Reading the notes aloud is a really good solution without
         | having to spend a ton of time on trying to OCR handwriting.
         | 
         | I can recommend https://www.videototextai.com/ for transcribing
         | huge amounts of audio. (Disclaimer, I am the founder of
         | VideoToTextAI)
        
         | bckr wrote:
         | Great recommendation, thank you. I have considered this and
         | it's definitely the simplest way to achieve what I want.
        
           | jacknobody wrote:
           | If you also gave all that text, with its audio, to the
           | putative AI, it might have enough training material to learn
           | to read your handwriting.
        
             | bckr wrote:
             | Agreed!
        
             | refulgentis wrote:
             | I've been working on an AI app for 18 months and almost
             | replied to tell you no, AI cant do that.
             | 
             | And, doh, took a minute, and realized I'm a dunce.* And now
             | there's at least 3 I can think of off the top of my head,
             | not to mention local.
             | 
             | Training a handwriting recognition AI is universally
             | accessible. What a time to be alive.
             | 
             | * If you're dense like me: they're not saying "any AI" as
             | "any handwriting recognition machine learning model you
             | build from that dataset". They're saying as AI as any
             | multimodal LLM, it'll do in context learning on what you
             | upload.
        
         | canadaduane wrote:
         | Using MacWhisper (or other similar whisper.cpp app or utility),
         | you could do it all on-device for a free or one-time fee, too.
         | 
         | note: I have no relation to MacWhisper, just a happy customer.
        
       | sjhaba wrote:
       | Have you tried chatgpt? 10k image requests should be pretty cheap
        
       | mariocesar wrote:
       | It seems like using speech-to-text is a faster alternative. You
       | can also consider outsourcing the work. I know abbyy.com offers a
       | service for this. Even though you may not be their target market,
       | they have services for implementing hybrid machine learning and
       | data entry solutions.
       | 
       | If you're into dreaming up cool solutions, you could try using
       | smart pens or tablets to write stuff and then teach a model to
       | recognize your handwriting. But for now, it's just a dream.
        
       | GianFabien wrote:
       | I have about 5000 pages of research notes. I have found that the
       | quality and usefulness of the material varies greatly. Much of
       | the older material is of little relevance with the passing of
       | time. As futile it may seem, I'm finding that re-reading and
       | summarizing rather than straight transcribing is effective. I'm
       | refreshing my memory of what I did discover and only typing up
       | what is relevant now. Fortunately I'm a fast touch typist, so I
       | can stare at the handwritten page and type; only glancing at the
       | screen after a paragraph or two. Two things I find useful to
       | retain are the dates of the original materials and bibliographic
       | references.
        
         | bckr wrote:
         | I think this is the best approach for project-related
         | information. It's essentially the Second Brain approach.
         | 
         | I'm interested in these journals for autobiographical /
         | psychiatric reasons. Therefore, indeed, the more recent
         | information is more valuable, but not with such a steep drop-
         | off.
         | 
         | The oldest 10% might contain 5% of the value.
        
       | dankwizard wrote:
       | in the time its taken you to look into this and procrastinate,
       | you could have done it by hand
       | 
       | greener pastures
        
         | bckr wrote:
         | Indeed--except for the cost of time, attention, energy, wrist
         | and neck pain.
         | 
         | This is something I want done eventually, and I'm smelling the
         | flowers on the way there.
         | 
         | This HN comment thread is one of the flowers on the path to my
         | OCR pure land zen state
        
       | heavyset_go wrote:
       | You might be able to manually transcribe some of the notes and
       | then fine tune an existing handwriting recognition model using
       | them.
        
         | bckr wrote:
         | Yep, I think that's what I'll need to do in the end because my
         | handwriting is just so... expressive.
         | 
         | In order to get that data I'll probably need to chop up the
         | words in my diaries for privacy, then outsource to a human-in-
         | the-loop labeling service.
        
       | Sandr44 wrote:
       | I have scanning my handwritten notes also on my todo-list, some
       | of them are even taken digitally. I have noticed that the offline
       | ocr on Samsung (or maybe on Android devices generally) is pretty
       | good, even with characters that don't exist in English.
       | Unfortunately there don't seem to be implementations for batch
       | scanning with Android handwriting ml kit or Samsung vision ocr
        
       | pininja wrote:
       | Which off the shelf solutions have you tried?
        
         | bckr wrote:
         | Thanks for asking, I've tried at least:
         | 
         | - Google Cloud OCR
         | 
         | - Transkribus
         | 
         | - ChatGPT(4V)
         | 
         | - EasyOCR
         | 
         | - Tesseract
         | 
         | - MacOS text highlighting
        
       | t312227 wrote:
       | hello,
       | 
       | imho. (!)
       | 
       | * if you have a lot of "uniform" pages - read something like A4
       | -, get yourself a scanner with an automatic sheet-feeder
       | 
       | or throw some rainy-weekend afternoons on it & scan your notes
       | with some decent SOHO scanner
       | 
       | * don't get too excessive with resolution, 400+ pixels/inch are
       | enough for OCR ...
       | 
       | i always scan with 1200 and reduce the images to 600 px via
       | simple batch-processing / for example imagemagick "convert".
       | 
       | * get yourself a decent OCR software, which is able to read your
       | notes ...
       | 
       | i'm a big fan of abbyys "finereader", but sadly its prohibitively
       | expensive ... ;)
       | 
       | idk how well FOSS OCR software a la tesseract works for hand-
       | written notes.
       | 
       | * create pdfs with automatically detected text in the background
       | for search and the scanned image of the notes.
       | 
       | it additionally generates XML-metadata & from there: whatever you
       | want (web frontend ... :)
       | 
       | just my 0.02EUR
        
         | bckr wrote:
         | Thanks buddy. I've already got them all digitized thankfully.
         | That was a whole thing in and of itself.
         | 
         | Unfortunately the half-dozen or so OCRs I've tried fail
         | miserably on even my clearest pages.
         | 
         | Lots of good ideas in this thread, though.
         | 
         | Thank you!
        
       | sid- wrote:
       | live text is an iOS feature you could experiment with
        
       | SrFil wrote:
       | Which off the shelf solutions have you tried?
       | https://www.transkribus.org/ is generally pretty good with hard
       | to read texts.
        
       | praving5 wrote:
       | If those notes are really worthy and meaningful to you, then hire
       | someone to type them out for you. If there is something that
       | money can buy, then save your time!
        
         | bckr wrote:
         | You're right, and thanks to another one of the commenters, I
         | have an idea for how I could do this.
         | 
         | Take my journals, and run a relatively simple word separation
         | algorithm over them.
         | 
         | Shuffle up those words and pay to have them annotated.
         | 
         | Reconstruct the dataset from there.
        
       | piloto_ciego wrote:
       | I just tried ChatGPT on my handwritten notes, OCR can very seldom
       | recognize my handwriting and it nailed it. It's cheap, you should
       | give that a shot.
        
         | d13 wrote:
         | Seconded. GPT4 can do this perfectly.
        
           | d1sxeyes wrote:
           | Hm. It depends how much you care about accuracy. ChatGPT does
           | a great job overall, but I have found frequent errors and
           | hallucinations around numbers, names, and dates in
           | particular.
           | 
           | If you do go for GPT-4, just be careful of this. Where other
           | transcription services might fail, or give some implausible
           | output which highlights that you need to check the source,
           | ChatGPT might give a highly plausible but incorrect
           | transcription from which you might not immediately identify
           | that transcription has failed.
        
             | piloto_ciego wrote:
             | ChatGPT-4o only hallucinated on my son's god awful
             | handwriting with mathematics, mine is pretty bad and it
             | still did fine.
             | 
             | What's funny is my son made an error in the arithmetic and
             | ChatGPT corrected it - that was the hallucination.
        
             | Nevolihs wrote:
             | Ideally OP would keep the source images of the original
             | journal pages around even after transcription. I think
             | ChatGPT (or LLM in general) is probably the best option,
             | but the best overall solution would accept that LLMs are
             | flawed and would require long-term iteration.
        
               | bckr wrote:
               | Thanks all, I tried ChatGPT and it didn't like my
               | handwriting at all.
               | 
               | Which is understandable... :')
        
               | Nevolihs wrote:
               | Have you considered training a model on your handwriting?
        
               | bckr wrote:
               | Yep! However that needs a ton of labeled data, so a
               | bootstrapping method is required.
               | 
               | I like the idea of doing it by speech recognition, or of
               | chopping it up for privacy and then outsourcing that to
               | humans at cost.
               | 
               | One thing I ... Imagine ... would help--is having a
               | private web app where I could pull up a document and then
               | make a voice recording on my phone.
               | 
               | Maybe I'll put this together on my plane trip.
        
               | d1sxeyes wrote:
               | The problem with ChatGPT is that you might not know to
               | check the original.
               | 
               | If the original text is "I'm getting married on the 10th
               | July", you'll know to check the handwritten note if it
               | says "I'm getting married on the l@ July" but not
               | necessarily if it says "on the 16th July". ChatGPT seems
               | to do the second quite often.
        
       | dougdimmadome wrote:
       | I was in a similar situation last month. Not quite 1000s of pages
       | but close to 100. Just enough to make typing them out seem like
       | too much work.
       | 
       | I found an app online (I wont even name it) which promised
       | incredibly accurate handwriting transcription. Signed up and
       | found it was true, but they were just sending images directly to
       | chatGPT and returning the result and then charging a fee on top.
       | 
       | I started working on an open source version. It took me only a
       | few hours and I'm sure anyone else could pull it together. used
       | chatGPT example code to connect to API and send an image with a
       | prompt along the lines of "please transcribe the text in this
       | image and return only that, nothing else". even with that
       | instruction it still sometimes prefaces with "sure! I can do
       | that.", which I think is the AI equivalent of Homer Simpson
       | writing "ok" in the "please leave this section blank" part of the
       | form. Anyhoo, I had a basic job queue written, pull in images in
       | order of file creation date and fire them off, append the text to
       | a text file after. There was some cleanup of the file required
       | (weird line breaks) but it saved me days of typing.
       | 
       | You still need a chatGPT API key for it but it does take a good
       | bit of the work out.
       | 
       | At the moment I'm investigating using a free local model. LLava
       | is just as accurate but takes longer than sending it to ChatGPT.
       | but if you were worried about burning credits it would be the way
       | to go.
        
       | canucker2016 wrote:
       | How about creating a crowdsourced captcha service?
       | 
       | Take scans of your journal pages, split the jpegs/pics into word
       | fragments, display a couple of fragments to captcha clients,
       | generate completed journal entries when the consensus gets
       | reasonably high for each word fragment.
       | 
       | Not sure how captcha services start from scratch - probably ask
       | around/check with google search.
       | 
       | Privacy goes out the door, but you should be able to show
       | disjointed word fragments so no one could reconstruct enough of a
       | single journal entry to expose your more personal info unless
       | they were very determined. Or maybe split the scans into
       | individual letter fragments instead?
       | 
       | Then monetize this for other people in the same situation...
        
         | bckr wrote:
         | I love this idea. It's way overengineered for this problem, and
         | I already have a startup that requires my complete attention,
         | but thank you for writing this out.
         | 
         | And if anyone decides to do this, let me know!
         | 
         | Privacy is one of the reasons I would pay for a service like
         | this, rather than pay a person to (try) to do it.
         | 
         | These journals contain a lot of psychiatric-level information
         | about me, which is both what makes it valuable and sensitive.
        
         | Suppafly wrote:
         | >Then monetize this for other people in the same situation...
         | 
         | That's basically what Amazon Mechanical Turk is, without the
         | captcha bit.
        
       | user_agent wrote:
       | A hint that might help at least partially: novadays for managing
       | digital and handwritten notes I juse Joplin, but before that I
       | was an avid Evernote user. Having a paid plan active gives you
       | access to Evernote's OCR function on their backend. I had a lot
       | of handwritten notes uploaded as attachments to Evernote, and I
       | remember that despite my handwritnig being awful their softwre
       | was able to parse it and allow me to, among others, perform quite
       | advanced searches on my handwritten notes. I'm not sure if
       | there's a way to make Evernote's OCR backend work for you in
       | scenarios more elastic that what it's been built for, but I
       | wanted to menion that there's this unique OCR tech that I think
       | does far better job that any standalone OCR software I tried (for
       | my handwriting style which I consider awful). It might be worth
       | researching further for you.
        
         | disqard wrote:
         | I used to use Evernote for a while, and like you, was a fan of
         | its handwriting OCR.
         | 
         | Sadly, it is no longer software I would recommend:
         | 
         | https://news.ycombinator.com/item?id=36609641
        
       | brudgers wrote:
       | _Can you think of anything better than just opening every single
       | file and manually transcribing it?_
       | 
       | No, because the work of manual transcription is a way of telling
       | if transcribing them is worth doing. Or maybe pay someone to
       | transcribe it. Spending money is also a good way to tell if
       | something matters (assuming you have sufficient money).
       | 
       | Orthogonally, maybe building a system is what you really want to
       | do (for many people that would be more enjoyable than revisiting
       | old journal content).
       | 
       | Finally, starting from hand transcription is an entry point into
       | rewriting what you wrote. Rewriting is writing and if there's
       | publication on your roadmap, you will be rewriting anyway.
       | 
       | There's no easy way to write well. Good luck.
        
         | bckr wrote:
         | > No, because the work of manual transcription is a way of
         | telling if transcribing them is worth doing.
         | 
         | Yes, for a percent of the work. I have spent a bunch of time
         | already digitizing my journals (including a loooong detour
         | where I had to organize them because I didn't exactly have them
         | in chronological order...)
         | 
         | I have seen and manually transcribed enough of my journals to
         | know I want the rest.
         | 
         | But it's not worth it to do it manually at this time.
         | 
         | > Orthogonally, maybe building a system is what you really want
         | to do (for many people that would be more enjoyable than
         | revisiting old journal content).
         | 
         | And I do want to build a system to do this, as part of my own
         | personal mind bike.
         | 
         | Thanks for your comment.
        
       | workergnome wrote:
       | I know you've said you've looked at off-the shelf tools, but in
       | that did you consider https://www.transkribus.org/? It's a tool
       | designed for reading historical, hand-written documentation--gets
       | used a lot in archives and historical studies. Might be worth an
       | evaluation to see if your handwriting is not great in similar
       | ways to Dutch bankers from the 18th century.
        
         | bckr wrote:
         | I did have a look at transkribus and their models did not work
         | for my needs.
        
       | tmaly wrote:
       | I record myself reading my hand written notes, then I just upload
       | the mp3 of the recording to MS 365 to transcribe.
       | 
       | I put special stop words like highlight/return so then I can post
       | process and ensure the markdown formatting looks good.
        
         | ProllyInfamous wrote:
         | Whisper.app will do this locally on Apple Silicon, FYI.
        
       | fred_is_fred wrote:
       | Would this work with Mechanical Turk? I wonder how much it would
       | be.
        
       | TheMiddleMan wrote:
       | I've found decent success with Googles Cloud Vision API for
       | transcribing cursive writing on the backs of 1000s of family
       | photos.
       | 
       | https://cloud.google.com/vision/docs/handwriting
       | 
       | I threw together a basic UI with the transcribed text in an
       | editable area next to the image where I would edit any
       | adjustments as it wasn't 100% perfect.
        
         | bckr wrote:
         | Thanks! I did try the vision demo in the console. One problem
         | might be that my handwriting is idiosyncratic / there may be
         | more training data available for historical handwriting styles?
        
         | daemonologist wrote:
         | Yeah OCR remains an area where the open source solutions can't
         | quite compete on quality with what the cloud providers offer.
         | I've found that (unless you have a cost-prohibitive number of
         | documents to process) if there are complex layouts,
         | handwriting, etc. it's worth going to Google or AWS.
        
       | 999900000999 wrote:
       | Take a picture, upload it to chat GPT and see what happens.
       | 
       | If it works then scan all the pages and run though it with a
       | script.
       | 
       | Shouldn't take you more than about an hour to code ( with Chat
       | GPT!) in Python.
        
         | bckr wrote:
         | Indeed ChatGPT is one of the solutions I tried.
        
           | ProllyInfamous wrote:
           | I presume _not well_ , then [since you're again asking]?
        
             | bckr wrote:
             | Of course. If ChatGPT worked for this problem, I'd be in
             | human-machine interface heaven.
             | 
             | Oh well.
        
           | 999900000999 wrote:
           | If you don't care about personal privacy, I would probably
           | just go on Fiverr and upload it to somebody to do it at like
           | $5 a page. Even reading all the journal notes is going to be
           | very tiring.
        
         | ProllyInfamous wrote:
         | Your journaling must not be _as insane as mine_..?
        
       | chaos_emergent wrote:
       | have you tried using gpt-4o? It's pretty incredible at
       | recognizing handwriting.
        
       | wriggler wrote:
       | Have you tried https://www.handwritingOCR.com?
       | 
       | It is designed to do exactly what you are looking for, and has
       | been used very successfully by many others for that same purpose
       | (I'm the founder).
       | 
       | It is not as cheap per page as Google Document AI, for example,
       | but it does tend to be much more accurate for handwriting, so
       | usually ends up cheaper when editing time is factored in.
       | 
       | If you find it does work well with your handwriting, please get
       | in touch and I can try to fit the pricing to your use case.
        
         | kwanbix wrote:
         | Sounds super cool, but why "per month" and not some "per page"
         | pricing?
        
           | wriggler wrote:
           | Thanks!
           | 
           | I'm still experimenting with pricing, and agree that per page
           | pricing makes logical sense. Still, it's harder for me to
           | build a sustainable business on that model.
           | 
           | I will probably test a few per-page or single payment options
           | soon, though.
        
             | fragmede wrote:
             | people don't want yet another monthly subscription so
             | that's going to be a harder sell. even though the business
             | advice is that monthly subscriptions are better for you,
             | the business owner, you can't forget about your customer.
             | who wants to setup a subscription for something they think
             | they're only going to use once or twice? and then have to
             | go through the bullshit of cancelling.
        
               | wriggler wrote:
               | True, in cases like OP a one-off purchase will be better.
               | But I also have business customers with regular and
               | ongoing document processing needs for whom a subscription
               | does work.
               | 
               | I expect the answer may be a combination of the two.
        
       | hamishleahy wrote:
       | Try using the RATH analyser from github
        
       | freddealmeida wrote:
       | I built this firm a decade ago. https://www.cogent.co.jp/en/
       | 
       | Works with English and Japanese. Sadly I'm no longer with the
       | team there but the work is solid. Try it out.
        
       | ZunarJ5 wrote:
       | Archaeologist tool, you'll want to fine tune it for yourself.
       | 
       | https://readcoop.eu/transkribus/
        
       | tcsenpai wrote:
       | Theoretical solution: train a model on your handwriting. There
       | should be plenty of easy (relatively) to use apps and frameworks
       | for that.
       | 
       | It will take time but you will have a pretty tailored solution.
       | 
       | Also of course: first of all try to process the images so that
       | they only are white and black (not greyscale, actual B/W
       | pictures)
        
       | bcrl wrote:
       | Amazon's Textract seems to do a decent job on my horrific
       | scribbles, and is far better than any of the open source OCR
       | tools I tried. To get started quickly, try using Textractor:
       | https://github.com/Artikash/Textractor
        
       | f_k wrote:
       | Shameless plug: https://getsearchablepdf.com
       | 
       | There's a free trial so you can check if it works for your
       | handwriting.
        
       | imvetri wrote:
       | I have my 3 years of paper, I wanted to use it to experiment
       | building a black mass program. A blackmass program is a concept
       | which will yield to a black mass in the computer, capable of
       | building conceptual cool tech like automating your daily work,
       | self experimentation, self learning etc.
       | 
       | My notes will have instructions to reach the black mass state, a
       | computer image scanner will try to learn my handwritings, take
       | them as instructions, connect dots etc.
       | 
       | The design of this system is cryptic and challenging. because,
       | side effect to create a computational program will result in a
       | circling thoughts for me. And its hard for me to convert it into
       | an action.
       | 
       | Taking that as an inspiration, this program is a circling
       | program, which means, it will constantly spiral upwards in a
       | value that is definitive to its actions in the past.
       | 
       | All my notes has information or points or ideas about this
       | fictional concept. I burned the notes which were repetitive, kept
       | the rest.
       | 
       | When I did that, It created more head space for me. The
       | headspace, helped to solve problems and have more space for more
       | learnings.
        
       | 65 wrote:
       | AWS Textract has worked better for me than the other cloud OCR
       | solutions.
        
       | hm-nah wrote:
       | I've had good success with:
       | 
       | 1. Scan them (or take photos of each page) 2. put them files in a
       | directory 3. Make a Python script that sends them to OpenAI
       | GOT-4o 4. Store the text as a new file in the directory.
        
       ___________________________________________________________________
       (page generated 2024-06-01 23:01 UTC)