[HN Gopher] How OpenElections uses LLMs
___________________________________________________________________
How OpenElections uses LLMs
Author : m-hodges
Score : 73 points
Date : 2025-06-19 16:11 UTC (6 hours ago)
(HTM) web link (thescoop.org)
(TXT) w3m dump (thescoop.org)
| simonw wrote:
| This is such an excellent example of a responsible and thorough
| application of vision LLMs to a gnarly data entry problem.
| polskibus wrote:
| It's also an excellent example of how the lack of a required
| machine-readable format for gov publishing is a PITA.
| sitkack wrote:
| json to qr code would be a good start. PRIOR ART inb4 a
| troll.
| Mtinie wrote:
| If I was in power and wanted to continue said rule, I'd
| definitely discourage the adoption of any standardized
| formatting for election results.
|
| Not, you know, for any nefarious purpose...but because what
| we've used forever was good enough for grandpappy, so it's
| obviously good enough for us.
|
| _/cough_
| nxrabl wrote:
| Very interesting! Is this the state of the art for accurate OCR
| of tabular PDFs, or is there other work in the space to compare
| against?
| SnooSux wrote:
| There are lots of posts on HN about developments and companies
| doing OCR and document extraction. It's a classic CV problem,
| but it has still come a long way in the past couple of years.
| dwillis wrote:
| Yeah, this is a very well-traveled road, but LLMs have made
| some big improvements. If you asked me (the guy who wrote the
| original piece linked above) what I'd use if accuracy alone
| were the goal, it would probably be AWS Textract. But accuracy
| and structure? Gemini.
| benob wrote:
| I wonder how difficult it would be to bias a model so that it
| subtly corrupts election results when performing OCR.
| croemer wrote:
| Surely not hard, but why?
| bilbo0s wrote:
| Easier to steal elections?
|
| Don't have to bother with gerrymandering, or slick legal ways
| to arrest people for voting with the wrong documents. Or just
| good old fashioned intimidation, like making the polling
| place the police station or the ICE detention facility.
|
| It's just a lot smoother process when you can simply write
| some software to manipulate the count.
|
| Who's gonna check?
|
| (No, seriously, who's gonna check? Because you also need to
| lay off everyone in that department once you're in power.)
| simonw wrote:
| Corrupted OCR won't help you steal elections. The result
| counting is a different process, with well designed checks
| and safeguards.
|
| The problem is that once the counts are done and have been
| reported, a lot of places then print those results out on
| paper and then scan those papers into a PDF for anyone who
| asks for a copy!
| dwillis wrote:
| Many jurisdictions do risk-limiting audits using the
| original ballots, so futzing with the results wouldn't
| necessarily make that easier. Also, cast vote records are
| public in many states - those are records of each ballot
| cast. So people can check.
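|
| As a rough illustration of that kind of check, here is a minimal
| Python sketch that re-tallies cast vote records and compares them
| with reported totals. The record layout (one dict per ballot,
| keyed by contest name) and the function names are assumptions
| made for this example, not any state's actual CVR format, and
| this is a plain recount comparison, not a risk-limiting audit.

```python
from collections import Counter


def tally_cvrs(cvr_rows, contest):
    """Count one vote per ballot record for the given contest.

    Assumes each record is a dict mapping contest name -> chosen
    candidate (hypothetical layout); blank or missing choices are skipped.
    """
    counts = Counter()
    for row in cvr_rows:
        choice = row.get(contest)
        if choice:
            counts[choice] += 1
    return counts


def find_discrepancies(reported, cvr_rows, contest):
    """Return {candidate: (reported, tallied)} wherever the two differ."""
    tallied = tally_cvrs(cvr_rows, contest)
    candidates = set(reported) | set(tallied)
    return {
        name: (reported.get(name, 0), tallied.get(name, 0))
        for name in candidates
        if reported.get(name, 0) != tallied.get(name, 0)
    }


if __name__ == "__main__":
    cvrs = [{"Mayor": "Alice"}, {"Mayor": "Bob"}, {"Mayor": "Alice"}, {"Mayor": ""}]
    reported = {"Alice": 2, "Bob": 2}
    print(find_discrepancies(reported, cvrs, "Mayor"))  # {'Bob': (2, 1)}
```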
| philips wrote:
| I think you mean risk limiting, right?
| bilbo0s wrote:
| Freudian Slip?
| dwillis wrote:
| Yes, thanks! Fixed.
| philips wrote:
| You may consider reading about risk limiting audits.
| https://www.voting.works/audits
| GardenLetter27 wrote:
| Why is the original source data not available anywhere digitally?
|
| Since it's printed it is clearly already in a database
| _somewhere_. Why can't that just be made public too?
|
| Seems bizarre to OCR printed documents (although I am aware of
| many companies doing this to parse invoices, etc.)
| simonw wrote:
| Welcome to government data.
|
| One key problem is that the US has tens of thousands of local
| governments, and each of them gets to solve problems in their
| own way.
|
| Digital literacy of the kind that understands why releasing a
| CSV file is more valuable than a PDF is rare enough that most
| of them won't have someone with that level of thinking in a
| decision making role.
| codingdave wrote:
| > most of them won't have someone with that level of thinking
|
| That is an unfair take on it. Come out to the midwest and
| talk to some of the clerks in the small townships and
| counties out here. They do know the value of improved data
| and tech. And they know that investing in better tech can
| result in a little less money in the bank, which results in
| less gas to plow the roads, less money to pay someone to mow
| the ditches, which means one more car wrecked by hitting a
| deer. So the question is often not about CSV vs. PDF. It is
| about overall budget to do all the things that matter to the
| people of their town. Tech sometimes just doesn't make the
| cut.
|
| Besides, elections tend to have their own tech provided by
| the county or state, so there is standardization and
| additional help on such critical processes.
|
| People running the smallest of government entities in this
| country tend to have pretty good heads on their shoulders.
| They get voted out pdq when they don't.
| simonw wrote:
| I'm not convinced by that argument. The data is clearly
| already in a spreadsheet of some sort. I don't think
| "click export as CSV" vs. "print out as paper and scan as
| PDF" is a cost decision.
|
| This isn't meant as shade! I have full respect for people
| working in those roles. Knowing the difference between a
| CSV file and a PDF file - and understanding why there are
| people out there who _curse the existence_ of PDFs and
| celebrate CSVs - is pretty arcane knowledge.
|
| Also note that I blamed people in "a decision making role"
| - changing procedures requires buy-in from management.
| People in management roles are less likely to be thinking
| about CSVs vs. PDFs than the people actually executing on
| the work.
|
| As Derek pointed out in
| https://news.ycombinator.com/item?id=44320001#44322987 this
| may often be a vendor limitation - in which case there _is_
| a cost factor to consider, and the blame can also be shared
| between the vendor and the person who made the purchasing
| decision without understanding the difference between PDF
| and CSV export.
| fasthands9 wrote:
| In college (about 15 years ago) I worked for a professor who was
| compiling precinct-level results for old elections. My job was
| just to request the info and then do manual data entry. It was
| abysmally slow.
|
| This application seems very good, but it's still a bit amazing
| that lawmakers haven't just required that all data be uploaded
| as CSV! Even if every CSV were in a slightly different format,
| it would be way easier for everyone (LLM or not).
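|
| To make the "slightly different format" point concrete, here is a
| minimal Python sketch that maps differing column headers onto one
| schema. The alias table, the column names, and the sample inputs
| are all hypothetical; real county files would need their own
| mappings.

```python
import csv
import io

# Hypothetical header variants mapped to one canonical column name.
HEADER_ALIASES = {
    "precinct": "precinct", "precinct name": "precinct", "pct": "precinct",
    "candidate": "candidate", "name": "candidate",
    "votes": "votes", "total votes": "votes", "total": "votes",
}


def normalize_rows(csv_text):
    """Yield rows with canonical keys; vote counts are parsed to int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for raw in reader:
        row = {}
        for header, value in raw.items():
            if header is None:  # extra unnamed columns in a ragged row
                continue
            key = HEADER_ALIASES.get(header.strip().lower())
            if key:
                row[key] = (value or "").strip()
        # Strip thousands separators such as "1,204" before converting.
        row["votes"] = int(row["votes"].replace(",", ""))
        yield row


if __name__ == "__main__":
    county_a = 'Precinct,Candidate,Votes\nWard 1,Alice,"1,204"\n'
    county_b = "PCT, Name ,Total Votes\nWard 1,Bob,87\n"
    for text in (county_a, county_b):
        print(list(normalize_rows(text)))
```

| Unknown headers are silently dropped here; a stricter version
| would raise on them so a new layout gets noticed.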
| xp84 wrote:
| I could be wildly off-base, but I wonder if some of these
| systems are airgapped, and the only way the data comes off of
| the closed system is via printing, to avoid someone inserting a
| flash drive full of malware in the guise of "copying the CSV
| file." Obviously there are or should be technical ways to
| safely extract data in a digital format, but I can see a little
| value in the provable safety that airgapping gives you.
| dwillis wrote:
| In some cases that's true, but for many jurisdictions the
| results systems are third-party vendor platforms, too.
___________________________________________________________________
(page generated 2025-06-19 23:00 UTC)