[HN Gopher] How OpenElections uses LLMs
       ___________________________________________________________________
        
       How OpenElections uses LLMs
        
       Author : m-hodges
       Score  : 73 points
       Date   : 2025-06-19 16:11 UTC (6 hours ago)
        
 (HTM) web link (thescoop.org)
 (TXT) w3m dump (thescoop.org)
        
       | simonw wrote:
       | This is such an excellent example of a responsible and thorough
       | application of vision LLMs to a gnarly data entry problem.
        
         | polskibus wrote:
         | It's also an excellent example on how lack of forced machine-
         | readable format for gov publishing is a PITA.
        
           | sitkack wrote:
           | json to qr code would be a good start. PRIOR ART inb4 a
           | troll.
        
           | Mtinie wrote:
           | If I were in power and wanted to continue said rule, I'd
           | definitely discourage the adoption of any standardized
           | formatting for election results.
           | 
           | Not, you know, for any nefarious purpose...but because what
           | we've used forever was good enough for grandpappy, so it's
           | obviously good enough for us.
           | 
           | _/cough_
        
       | nxrabl wrote:
       | Very interesting! Is this the state of the art for accurate OCR
       | of tabular PDFs, or is there other work in the space to compare
       | against?
        
         | SnooSux wrote:
         | There are lots of posts on HN about developments and companies
         | doing OCR and document extraction. It's a classic CV problem,
         | but it has still come a long way in the past couple of years.
        
           | dwillis wrote:
           | Yeah, this is a very well-traveled road, but LLMs have made
           | some big improvements. If you asked me (the guy who wrote the
           | original piece linked above) what I'd use if accuracy alone
           | was the goal, it would probably be AWS Textract. But accuracy
           | and structure? Gemini.
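One common sanity check for this kind of extraction, whether from a vision LLM or Textract, is to sum the extracted candidate votes for each precinct and compare them against the total printed on the same page. The sketch below assumes that approach; it is not a description of OpenElections' actual pipeline, and the field names and sample rows are hypothetical.

```python
from collections import defaultdict


def check_precinct_totals(rows, reported_totals):
    """Compare summed candidate votes per (precinct, office) against the
    total printed on the source page. Returns a list of mismatches as
    (precinct, office, summed, reported) tuples; empty means all agree."""
    sums = defaultdict(int)
    for row in rows:
        # Extracted values may arrive as strings, possibly with thousands separators.
        sums[(row["precinct"], row["office"])] += int(str(row["votes"]).replace(",", ""))
    mismatches = []
    for (precinct, office), reported in reported_totals.items():
        summed = sums.get((precinct, office), 0)
        if summed != reported:
            mismatches.append((precinct, office, summed, reported))
    return mismatches


# Hypothetical extraction where "311" was misread as "31".
rows = [
    {"precinct": "001", "office": "President", "candidate": "A. Smith", "votes": "412"},
    {"precinct": "001", "office": "President", "candidate": "B. Jones", "votes": "389"},
    {"precinct": "002", "office": "President", "candidate": "A. Smith", "votes": "250"},
    {"precinct": "002", "office": "President", "candidate": "B. Jones", "votes": "31"},
]
reported = {("001", "President"): 801, ("002", "President"): 561}
print(check_precinct_totals(rows, reported))  # [('002', 'President', 281, 561)]
```

A check like this catches dropped or misread digits but not two compensating errors, so it complements rather than replaces human review.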
        
       | benob wrote:
       | I wonder how difficult it would be to bias a model so that it
       | subtly corrupts election results when performing OCR.
        
         | croemer wrote:
         | Surely not hard, but why?
        
           | bilbo0s wrote:
           | Easier to steal elections?
           | 
           | Don't have to bother with gerrymandering, or slick legal ways
           | to arrest people for voting with the wrong documents. Or just
           | good old fashioned intimidation, like making the polling
           | place the police station or the ICE detention facility.
           | 
           | It's just a lot smoother process when you can simply write
           | some software to manipulate the count.
           | 
           | Who's gonna check?
           | 
           | (No, seriously, Who's gonna check? Because you also need to
           | lay off everyone in that department once you're in power.)
        
             | simonw wrote:
             | Corrupted OCR won't help you steal elections. The result
             | counting is a different process, with well designed checks
             | and safeguards.
             | 
             | The problem is that once the counts are done and have been
             | reported, a lot of places then print those results out on
             | paper and then scan those papers into a PDF for anyone who
             | asks for a copy!
        
             | dwillis wrote:
             | Many jurisdictions do risk-limiting audits using the
             | original ballots, so futzing with the results wouldn't
             | necessarily make that easier. Also, cast vote records are
             | public in many states - those are records of each ballot
             | cast. So people can check.
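As a rough illustration of "people can check": with per-ballot cast vote records, recomputing a contest's tally and comparing it to the reported result takes a few lines. CVR formats vary by vendor and state; this sketch assumes each record is simply a dict mapping contest name to the selected choice, and it ignores overvotes, write-ins, and ranked-choice contests. The records shown are hypothetical.

```python
from collections import Counter


def tally_cvrs(cvrs, contest):
    """Count votes in one contest from per-ballot cast vote records.
    Each record is assumed to be a dict of contest name -> selected choice."""
    counts = Counter()
    for ballot in cvrs:
        choice = ballot.get(contest)
        if choice:  # skip undervotes and ballots without this contest
            counts[choice] += 1
    return counts


def matches_reported(counts, reported):
    """True if the recomputed tally equals the officially reported one."""
    return dict(counts) == reported


# Hypothetical records: one undervote and one ballot style without the contest.
cvrs = [{"Mayor": "Lee"}, {"Mayor": "Park"}, {"Mayor": "Lee"}, {"Mayor": None}, {}]
counts = tally_cvrs(cvrs, "Mayor")
print(counts, matches_reported(counts, {"Lee": 2, "Park": 1}))
```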
        
               | philips wrote:
               | I think you mean risk limiting, right?
        
               | bilbo0s wrote:
               | Freudian slip?
        
               | dwillis wrote:
               | Yes, thanks! Fixed.
        
             | philips wrote:
             | You may consider reading about risk limiting audits.
             | https://www.voting.works/audits
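To give a feel for how a risk-limiting audit decides when it has seen enough ballots, here is a sketch of one well-known ballot-polling method, BRAVO, simplified to a two-candidate contest. It is a teaching sketch under those assumptions, not a certified audit tool: production audits handle multiple candidates and contests, random sampling, and escalation to a full hand count.

```python
def bravo_audit(sample, reported_winner_share, risk_limit=0.05):
    """Simplified two-candidate BRAVO ballot-polling audit.

    sample: ballots drawn at random, each "W" (reported winner), "L"
    (reported loser), or anything else (ignored). reported_winner_share is
    the winner's reported share of the two-candidate vote. Returns
    (outcome_confirmed, ballots_examined)."""
    if not 0.5 < reported_winner_share <= 1:
        raise ValueError("reported winner share must be above 0.5")
    t = 1.0                       # sequential likelihood-ratio test statistic
    threshold = 1.0 / risk_limit  # stop once t reaches 1 / risk limit
    for n, ballot in enumerate(sample, start=1):
        if ballot == "W":
            t *= reported_winner_share / 0.5
        elif ballot == "L":
            t *= (1 - reported_winner_share) / 0.5
        if t >= threshold:
            return True, n
    # Sample exhausted without confirming: a real audit would escalate,
    # eventually to a full hand count.
    return False, len(sample)


# Reported 60/40 race, 5% risk limit: every winner ballot multiplies t by 1.2.
print(bravo_audit(["W"] * 20, 0.6))  # (True, 17)
```

The design point is that the audit stops early when the sample strongly supports the reported outcome, so a comfortable margin needs far fewer ballots than a recount.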
        
       | GardenLetter27 wrote:
       | Why is the original source data not available anywhere digitally?
       | 
       | Since it's printed, it is clearly already in a database
       | _somewhere_. Why can't that just be made public too?
       | 
       | Seems bizarre to OCR printed documents (although I am aware of
       | many companies doing this to parse invoices, etc.)
        
         | simonw wrote:
         | Welcome to government data.
         | 
         | One key problem is that the US has tens of thousands of local
         | governments, and each of them gets to solve problems in their
         | own way.
         | 
         | Digital literacy of the kind that understands why releasing a
         | CSV file is more valuable than a PDF is rare enough that most
         | of them won't have someone with that level of thinking in a
         | decision making role.
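To make the CSV-versus-PDF point concrete: a few lines of Python's standard csv module produce a file that any spreadsheet or script can read straight back, with no OCR step. The column layout is loosely modeled on OpenElections-style results files, but treat the exact field names, and the sample rows, as assumptions.

```python
import csv
import io

# Column layout loosely modeled on OpenElections-style results files;
# the exact field names are an assumption here.
FIELDS = ["county", "precinct", "office", "district", "party", "candidate", "votes"]


def results_to_csv(rows):
    """Write result rows (dicts keyed by FIELDS) to CSV text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buf.getvalue()


rows = [
    {"county": "Example", "precinct": "001", "office": "President",
     "party": "DEM", "candidate": "A. Smith", "votes": 412},
    {"county": "Example", "precinct": "001", "office": "President",
     "party": "REP", "candidate": "B. Jones", "votes": 389},
]
text = results_to_csv(rows)

# Reading it back is one call; no OCR or layout guessing involved.
parsed = list(csv.DictReader(io.StringIO(text)))
print(parsed[0]["candidate"], parsed[0]["votes"])
```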
        
           | codingdave wrote:
           | > most of them won't have someone with that level of thinking
           | 
           | That is an unfair take on it. Come out to the midwest and
           | talk to some of the clerks in the small townships and
           | counties out here. They do know the value of improved data
           | and tech. And they know that investing in better tech can
           | result in a little less money in the bank, which results in
           | less gas to plow the roads, less money to pay someone to mow
           | the ditches, which means one more car wrecked by hitting a
           | deer. So the question is often not about CSV vs. PDF. It is
           | about overall budget to do all the things that matter to the
           | people of their town. Tech sometimes just doesn't make the
           | cut.
           | 
           | Besides, elections tend to have their own tech provided by
           | the county or state, so there is standardization and
           | additional help on such critical processes.
           | 
           | People running the smallest of government entities in this
           | country tend to have pretty good heads on their shoulders.
           | They get voted out pdq when they don't.
        
             | simonw wrote:
             | I'm not convinced by that argument. The data is clearly
             | already in a spreadsheet of some sort. I don't
             | think "click export as CSV" vs. "print out as paper and
             | scan as PDF" is a cost decision.
             | 
             | This isn't meant as shade! I have full respect for people
             | working in those roles. Knowing the difference between a
             | CSV file and a PDF file - and understanding why there are
             | people out there who _curse the existence_ of PDFs and
             | celebrate CSVs - is pretty arcane knowledge.
             | 
             | Also note that I blamed people in "a decision making role"
             | - changing procedures requires buy-in from management.
             | People in management roles are less likely to be thinking
             | about CSVs vs. PDFs than the people actually executing on
             | the work.
             | 
             | As Derek pointed out in
             | https://news.ycombinator.com/item?id=44320001#44322987 this
             | may often be a vendor limitation - in which case there _is_
             | a cost factor to consider, and the blame can also be shared
             | between the vendor and the person who made the purchasing
             | decision without understanding the difference between PDF
             | and CSV export.
        
       | fasthands9 wrote:
       | In college (about 15 years ago) I worked for a professor who was
       | compiling precinct-level results for old elections. My job was
       | just to request the info and then do manual data entry. It was
       | abysmally slow.
       | 
       | This application seems very good - but still a bit amazing that
       | lawmakers haven't just required that all data be uploaded via
       | CSV! Even if every CSV was in a slightly different format, it would be
       | way easier for everyone (LLM or not).
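The "even if every CSV was slightly different" case is indeed manageable: a small alias table can map each jurisdiction's header variants onto one schema. This is a sketch; the alias table and sample inputs are hypothetical, and a row without any recognized votes column raises KeyError.

```python
import csv
import io

# Hypothetical alias table mapping header variants (lower-cased) from
# different jurisdictions onto one canonical schema.
HEADER_ALIASES = {
    "precinct": "precinct", "pct": "precinct", "precinct name": "precinct",
    "candidate": "candidate", "choice": "candidate", "candidate name": "candidate",
    "votes": "votes", "total votes": "votes", "vote count": "votes",
}


def normalize_csv(text):
    """Parse one jurisdiction's CSV and return rows in the canonical schema."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        clean = {}
        for header, value in row.items():
            if header is None:  # surplus cells beyond the header row
                continue
            key = HEADER_ALIASES.get(header.strip().lower())
            if key:  # unknown columns are dropped
                clean[key] = (value or "").strip()
        clean["votes"] = int(clean["votes"].replace(",", ""))
        out.append(clean)
    return out


a = "Precinct,Candidate,Votes\n001,A. Smith,412\n"
b = 'PCT,Choice,Total Votes\n001,A. Smith,"1,204"\n'
print(normalize_csv(a))
print(normalize_csv(b))
```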
        
         | xp84 wrote:
         | I could be wildly off-base, but I wonder if some of these
         | systems are airgapped, and the only way the data comes off of
         | the closed system is via printing, to avoid someone inserting a
         | flash drive full of malware in the guise of "copying the CSV
         | file." Obviously there are or should be technical ways to
         | safely extract data in a digital format, but I can see a little
         | value in the provable safety that airgapping gives you.
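One small piece of such a process, assuming a hypothetical workflow where the closed system prints a SHA-256 digest of the export alongside the paper results: anyone holding a digital copy can confirm it is byte-identical to what the closed system produced. This only proves integrity of the extracted file; it does nothing about keeping malware off the closed system, which is the concern above.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, short enough to print on the paper copy."""
    return hashlib.sha256(data).hexdigest()


def verify_export(data: bytes, printed_digest: str) -> bool:
    """Check that a digital copy matches the digest printed on the closed system."""
    return sha256_hex(data) == printed_digest.strip().lower()


# On the air-gapped machine (hypothetical workflow): print this next to the results.
export = b"precinct,candidate,votes\r\n001,A. Smith,412\r\n"
digest = sha256_hex(export)

# On a connected machine, after the file arrives (e.g. on write-once media):
print(verify_export(export, digest))
print(verify_export(export.replace(b"412", b"421"), digest))
```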
        
           | dwillis wrote:
           | In some cases that's true, but for many jurisdictions the
           | results systems are third-party vendor platforms, too.
        
       ___________________________________________________________________
       (page generated 2025-06-19 23:00 UTC)