Reprinted from TidBITS by permission; reuse governed by Creative Commons
license BY-NC-ND 3.0. TidBITS has offered years of thoughtful commentary
on Apple and Internet topics. For free email subscriptions and access to the
entire TidBITS archive, visit http://www.tidbits.com/


        ChatGPT Atlas Digitized Book Tables That Stymied Other OCR Tools

   Adam Engst

   I got sucked down another geeky rabbit hole. While I doubt that wanting
   to perform optical character recognition on large tables in a book is a
   common desire, it was enough of a learning experience with modern AI
   tools that I can't resist sharing.

Enabling Calculations in a Workout App

   I coach weekly indoor track workouts for the Finger Lakes Runners Club
   from November through April. Somewhere between 40 and 70 people show up
   every Tuesday to follow my instructions while running in ovals. (The
   power!) To account for the fact that my runners range in speed from a
   4:19 mile to a 10:30 mile, I base my workouts on the system developed
   by running coach Jack Daniels. (I had the pleasure of learning from
   Jack in person before he passed away recently; he lived nearby and was a
   super nice guy.) He defined specific paces for different types of
   training: Easy, Marathon, Threshold, Interval, and Repetition.

   Briefly, the system works in two steps. First, you use a recent race
   time to look up a number called VDOT (Jack's shorthand for aerobic
   fitness) in one table. Then you use that VDOT to look up your prescribed
   training paces in a second table. For instance, if you want to know how
   fast to run a 400m rep at Interval pace, you'd find your VDOT from your
   recent 5K time, then look up the corresponding Interval time for 400m.
   These tables are in his book, [1]Daniels' Running Formula, Fourth
   Edition, and they're big: the training pace table covers nearly four
   pages.
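   For readers curious about what the app has to do, the two-step lookup
   can be sketched in Python. This is a minimal illustration, not Beyond
   Better's code: the table layouts, the hypothetical pace keys such as
   "I400", and the rule of rounding down to the slower row are all
   assumptions, and real values would have to be transcribed from the book.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts for the two tables (values would come from the book):
# step 1: one race distance's column as (vdot, race_time_in_seconds) rows
# step 2: vdot -> {pace key such as "I400": seconds for that rep}
RaceColumn = List[Tuple[int, float]]
PaceTable = Dict[int, Dict[str, float]]


def lookup_vdot(race_column: RaceColumn, race_seconds: float) -> int:
    """Step 1: highest VDOT whose listed time the runner matched or beat.

    Rounding down to the slower row is an assumed, conservative rule; a runner
    faster than every row is simply capped at the table's top VDOT.
    """
    matched = [vdot for vdot, listed in race_column if race_seconds <= listed]
    if not matched:
        raise ValueError("race time is slower than every row in the table")
    return max(matched)


def lookup_pace(pace_table: PaceTable, vdot: int, key: str) -> float:
    """Step 2: read the prescribed time for one rep type at this VDOT."""
    return pace_table[vdot][key]
```

   With a 5K column and a pace table transcribed from the book (the names
   five_k and paces here are hypothetical), lookup_pace(paces,
   lookup_vdot(five_k, 20 * 60), "I400") would return the Interval time
   for a 400m rep for someone who recently ran a 20:00 5K.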

   With the help of the [2]Beyond Better AI development app, which I've
   used for building an iPhone app to help with backup race timing and
   will be writing more about soon, I wanted to write a Web app that would
   let my runners enter a race time to find their VDOT and then see how
   fast they should run for different distances at different specified
   paces. Various online calculators already do this, but I wanted mine to
   let me build a workout like a 200m-400m-600m-800m-600m-400m-200m ladder
   with the shorter distances (200m and 400m) at Repetition pace and the
   longer ones (600m and 800m) at Interval pace. It's a fun workout, but
   it requires runners to keep a lot of numbers in their heads. Tonya
   writes them on her hand. Once I have built a workout, I want to share a
   URL to the calculator in the workout announcement so that when any
   runner clicks it, the calculator shows their specific times for each
   rep.
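   One way to make a workout shareable is to encode the reps in the URL
   itself. The sketch below assumes a hypothetical calculator address and
   a made-up "w" query parameter, and it scales a 400m time linearly to
   other distances, which is an approximation rather than how Daniels'
   tables work. It uses only Python's standard urllib.parse module and
   does not reflect how the actual app is built.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical encoding: a single "w" query parameter listing reps joined by
# "-", each written as meters plus a pace letter (R = Repetition, I = Interval).
Rep = Tuple[int, str]


def build_workout_url(base_url: str, reps: List[Rep]) -> str:
    workout = "-".join(f"{meters}{pace}" for meters, pace in reps)
    return f"{base_url}?{urlencode({'w': workout})}"


def parse_workout_url(url: str) -> List[Rep]:
    workout = parse_qs(urlsplit(url).query)["w"][0]
    # The last character of each token is the pace letter; the rest is meters.
    return [(int(token[:-1]), token[-1]) for token in workout.split("-")]


def rep_seconds(meters: int, seconds_per_400: float) -> float:
    # Assumes even pacing: scale a 400m time linearly to the rep's distance.
    return seconds_per_400 * meters / 400


ladder = [(200, "R"), (400, "R"), (600, "I"), (800, "I"),
          (600, "I"), (400, "R"), (200, "R")]
url = build_workout_url("https://example.com/calc", ladder)
print(url)  # https://example.com/calc?w=200R-400R-600I-800I-600I-400R-200R
```

   When a runner opens that link, the calculator would parse the ladder
   back out of the URL and combine it with the runner's own VDOT, so one
   link serves everyone regardless of speed.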

   As has been my experience with AI development, the hard stuff was easy
   and the easy stuff was hard. After a lengthy discussion with Beyond
   Better to nail down exactly what I wanted, it took only a few prompts
   to build the app and get it working roughly the way I wanted. Before I
   put the effort into the final layout and making sure it worked on
   multiple platforms and all that, I decided to spot-check its numbers
   against the book. AI chatbots are magic, but you can't trust them,
   particularly with numbers.

   That's when I started to go down the rabbit hole. You see, the tables
   of numbers in Daniels' Running Formula are derived from equations that
   Jack Daniels and his friend Jimmy Gilbert (whom Jack coached and who
   later became a NASA programmer) developed based on years of testing many
   runners of different ability levels.
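   The Daniels-Gilbert equations, in the form commonly reproduced online,
   model two things: the oxygen cost of running at a given velocity (in
   meters per minute) and the fraction of VO2max a runner can sustain for
   a race of a given duration (in minutes). VDOT is the first divided by
   the second. Here's a Python sketch of that calculation, with the caveat
   that the coefficients should be checked against the original sources
   before anyone relies on them.

```python
import math


def oxygen_cost(velocity_m_per_min: float) -> float:
    """VO2 (ml/kg/min) needed to run at this velocity (Daniels-Gilbert fit)."""
    v = velocity_m_per_min
    return -4.60 + 0.182258 * v + 0.000104 * v * v


def fraction_of_max(minutes: float) -> float:
    """Fraction of VO2max a runner can sustain for a race lasting this long."""
    t = minutes
    return (0.8
            + 0.1894393 * math.exp(-0.012778 * t)
            + 0.2989558 * math.exp(-0.1932605 * t))


def vdot(distance_m: float, race_seconds: float) -> float:
    """VDOT for a race: oxygen cost of the race pace over the sustainable fraction."""
    minutes = race_seconds / 60
    velocity = distance_m / minutes  # meters per minute
    return oxygen_cost(velocity) / fraction_of_max(minutes)


print(round(vdot(5000, 20 * 60), 1))  # prints 49.8 for a 20:00 5K
```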

   Initially, the app's training pace numbers were utterly ridiculous, so
   I directed Beyond Better to research and implement the Daniels-Gilbert
   equations. It did so and got much closer, but the times were still 2 to
   4 seconds per lap too fast. When I gave it a few examples of the correct
   numbers, it cheated by hard-coding those numbers and interpolating
   between them, which brought it closer, but still not right.
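   Going from VDOT back to a training pace means inverting the
   oxygen-cost equation, which is a quadratic in velocity. The sketch
   below does that, but it also shows why the equations alone don't pin
   down the book's tables: you have to supply an intensity for each pace
   as a fraction of VDOT, and any values you plug in are assumptions, not
   Daniels' figures.

```python
import math

A, B, C = 0.000104, 0.182258, -4.60  # Daniels-Gilbert oxygen-cost coefficients


def velocity_for_vo2(target_vo2: float) -> float:
    """Solve A*v^2 + B*v + C = target_vo2 for the positive root v (m/min)."""
    c = C - target_vo2
    return (-B + math.sqrt(B * B - 4 * A * c)) / (2 * A)


def seconds_per_distance(vdot: float, intensity: float, meters: float) -> float:
    """Time for `meters` when running at `intensity` (a fraction of VDOT).

    The intensity assigned to each training pace (for example, 1.0 for a
    pace near VO2max) is an assumption supplied by the caller.
    """
    velocity = velocity_for_vo2(vdot * intensity)  # meters per minute
    return meters / velocity * 60
```

   In this sketch, lowering the intensity from 1.0 to 0.98 at VDOT 50
   slows a 400m rep from roughly 92 seconds to roughly 93.6, so a couple
   of percentage points of disagreement about intensity is already worth
   more than a second per lap.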

OCR Options

   After some additional back-and-forth, Beyond Better suggested that if I
   wanted the training paces to match the book exactly, I should provide
   that data. It wasn't readily available online in its entirety, and I
   didn't relish the idea of manually entering 700-plus numbers. "Surely
   there must be a better way," I said as I turned the first corner in the
   rabbit hole. "There are lots of ways to extract text from an image
   now." I took photos of the tables and went looking for a solution.
     * ChatGPT: I'll admit it. Once an AI chatbot has worked magic for
        you, it's hard not to go back to the well. I fed my photos to
        ChatGPT, which is typically pretty good at extracting text.
       However, ChatGPT informed me that the table was too large and the
       numbers too small and too close together for accurate recognition.
       I took five photos and cropped them to focus on specific columns,
       which it liked better, but it still wanted to process ten rows at a
       time and have me verify all the numbers. The very first check
       revealed some errors, and I didn't want to have to verify 700-plus
       numbers manually either.
     * Live Text: Several years ago, Apple introduced Live Text, which
       lets you select text in images in various macOS apps, including
       Photos and Preview. When I first selected some text and pasted it
       into BBEdit, the result was a mess, with every number on its own
       line. The results were somewhat better when I tried Numbers and
       Excel, both of which somehow extracted the fact that the data was
       tabular from the Live Text clipboard contents. Unfortunately,
       because I wasn't willing to damage my signed copy of the book, I
        couldn't get a perfectly straight photo; there was always some page
       curl that I couldn't eliminate, which caused the data to become
       offset in tricky ways.
     * Microsoft Excel: Who knew that Excel had the option to insert data
       from a picture? I didn't until ChatGPT told me, and even then I had
       to ask for more help to find the From Picture button on the Insert
        ribbon, not the Insert menu. Unfortunately, although it offered a
       nice interface for correcting data it had trouble with, it wasn't
       nearly accurate enough.
     * ChatGPT Atlas: In an effort to automate comparing the results of my
       app against an online calculator, I tried using [3]ChatGPT Atlas in
       its agentic mode. On a whim, I fed it my five photos and told it to
       make me a CSV. Unlike the regular ChatGPT, it took on the challenge
       of cutting the photos into smaller pieces and running OCR on those
        manageable chunks. Astonishingly, 13 minutes later, it had gotten
        everything right. It even inserted the data from one photo into
        the proper spot in the CSV, although that photo didn't include a
        column of VDOT values (it showed the inside of the page spread).
       Curiously, where the book reports all times under 100 seconds as
       just seconds, ChatGPT Atlas chose to rewrite some of them in
       minutes and seconds, so 85 seconds became 1:25. But only some!
       Strange, but not problematic.
     * A friend with Shottr: While I was working on this, I was scheduled
       to do a podcast with Allison Sheridan, who thoroughly appreciates
       the lure of a good rabbit hole. When I explained what I was working
       on, she asked me to send her the photos because she thought the
       [4]Shottr app, which she likes for screenshots, could digitize the
        data. It could, though she first desaturated the images in Preview
        and increased the exposure to make the text stand out better. Then,
        in Shottr, she captured with breaks, which preserved the tabular
        data. She did have to capture relatively small chunks at a time and
        fix some mistakes, but apparently far fewer than I saw when using
        Live Text on the entire image. Allison's CSV matched the one
       ChatGPT Atlas generated, and both passed all my spot checks.
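   Because ChatGPT Atlas wrote some sub-100-second times as minutes and
   seconds (85 became 1:25), a mechanical comparison of two CSVs like
   these needs to normalize times before comparing cells. Here's a
   minimal Python sketch of such a check, assuming both files share the
   same row and column layout; the function names are hypothetical, and
   this isn't something the spot checks described above actually used.

```python
import csv
import io
import math
from typing import List, Tuple


def to_seconds(cell: str) -> float:
    """Convert "85", "1:25", or "1:02:30" to seconds so mixed formats compare equal."""
    total = 0.0
    for part in cell.strip().split(":"):
        total = total * 60 + float(part)
    return total


def differing_cells(csv_a: str, csv_b: str) -> List[Tuple[int, int, str, str]]:
    """List (row, column, a, b) for cells that differ even after normalizing times."""
    rows_a = list(csv.reader(io.StringIO(csv_a)))
    rows_b = list(csv.reader(io.StringIO(csv_b)))
    if [len(r) for r in rows_a] != [len(r) for r in rows_b]:
        raise ValueError("the two tables do not have the same shape")
    diffs = []
    for r, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a == b:
                continue
            try:
                # Tolerance guards against float noise in fractional seconds.
                same = math.isclose(to_seconds(a), to_seconds(b), abs_tol=0.01)
            except ValueError:  # non-numeric cells such as headers
                same = False
            if not same:
                diffs.append((r, c, a, b))
    return diffs
```

   An empty result means the two transcriptions agree cell for cell,
   which is a stronger check than spot-checking, though it can't catch a
   mistake that both OCR approaches happened to make identically.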

   There are probably other techniques I could have used. Dedicated OCR
   software might have done a better job or known how to format tabular
   data correctly. I also could have improved the results if I had been
   willing to break the book's binding or even cut the pages out to get
   better scans. But I didn't want to put a ton of effort into either
   image preparation or digitization because I still don't know whether
   I'll use this data; I may discover a reason to revert to equations.

Moving Beyond Autocomplete

   For a quick digitization job working from hastily snapped photos, ChatGPT
   Atlas surprised me, especially with how it leveraged its agentic
   capabilities to go beyond what ChatGPT could do on its own.

   Many AI skeptics dismiss large language models as nothing more than
   "autocomplete on steroids." And look, I understand the impulse. Given
   the AI investment bubble and the industry's inflated promises, reducing
   the technology to its basic mechanism feels like cutting through the
   hype.

   But here's the thing: I handed ChatGPT Atlas an ill-formed problem. I
   gave it five poorly exposed photos with page curl and minimal guidance
   on handling the spatial relationships between images. I didn't even
   tell it that one photo was from the inside of a page spread and lacked
   the VDOT column that would simplify matching the rows with the other
   photos. Nevertheless, ChatGPT Atlas figured all that out. It
   autonomously decided to break the images into manageable chunks, ran
   OCR on each, reconstructed the table structure, and placed everything
   in the correct order. It made decisions I would have had to make myself
   with traditional tools, and it made them correctly.

   If we're going to criticize AI when the results are problematic (and we
   should), we also have to be open to acknowledging when it does something
   impressively well. Even if ChatGPT Atlas is using the same underlying
   large language model technology, turning five photos into a clean table
   of digital data is well beyond what anyone would consider autocomplete.

References

   1. https://www.amazon.com/dp/1718203667/?tag=tidbitselectro00
   2. https://beyondbetter.app/
   3. https://chatgpt.com/atlas
   4. https://shottr.cc/
