The drama in trying to convert election PDFs to Spreadsheets
Author : markessien
Score : 610 points
Date : 2023-03-23 09:40 UTC (13 hours ago)
(HTM) web link (markessien.com)
(TXT) w3m dump (markessien.com)
| djoldman wrote:
| Checking one at random:
|
| https://docs.google.com/spreadsheets/d/1HhV9iJxXTU9liAZPIDoM...
|
| ...shows 0s in the first row for all candidate parties. But the
| corresponding photo shows votes for all three:
|
| https://inec-cvr-cache.s3.eu-west-1.amazonaws.com/cached/res...
|
| I hope it's not a mistake and that there's some arcane
| law/technicality to explain it.
|
| edit: another mistake on row 21, LP should get 25 but it was
| credited to NNPP:
|
| https://docs.inecelectionresults.net/elections_prod/1292/sta...
| dan-robertson wrote:
| Yeah looks weird. When I scrolled to a random part, the numbers
| seemed to line up. They didn't say things were entirely correct
| though. Perhaps the data quality is sufficient for a challenge.
| Odd that the first rows seem more wrong though.
| neves wrote:
| Is it true that USA does not have an open data law to make
| everybody publish in CSV?
| JumpCrisscross wrote:
| > _Is it true that USA does not have an open data law to make
| everybody publish in CSV?_
|
| American elections are de-centralised. Each state comes up with
| its own methods; in some states, each county does. (I'm not sure
| how publishing a CSV of vote totals would help.)
| jxramos wrote:
| > Then ominously, on the 20th of October of 2020 some people
| drove there in unmarked cars and removed all the Cameras
| installed at the tollgate.
|
| They at least captured some photos of the equipment. I wonder if
| anyone communicated with the individuals.
| OoTheNigerian wrote:
| Nice read. It's important to note
|
| 1. The 2020 protesters were not the ones who started vandalizing
| property; government infiltrators in the protests burned cars and
| maimed people.
|
| 2. The Obidient movement encompassed multiple sub-movements, of
| which #EndSARS was one. A vast majority of Peter Obi's supporters
| were not #EndSARS activists.
|
| 3. Elections in Nigeria are fraught with treacherous behavior, so
| everyone suspects everything. It's important to be very careful
| with your communication. There is a lot of desperation in the
| land, so if you are in a position of information leverage, the
| responsible thing is to handle that privilege with care and
| transparency.
| pxc wrote:
| I'm impressed by the courage of the protesters here, and the
| tenacity of the youth voters.
|
| I hope they get a clear answer and a fair count, and whether they
| win this time or not, a real shot at cracking up their corrupt,
| two-party system.
| dec0dedab0de wrote:
| This would have been a good use for HN-style shadow banning,
| especially if they didn't publish the current tally. Then the
| original, easy-to-detect bots might never have realized you were
| on to them.
| kevviiinn wrote:
| Wow, what a cliffhanger. It sounds like they have to deal with the
| courts now. I hope we get an update.
|
| https://www.msn.com/en-us/news/world/opposition-files-petiti...
| davedx wrote:
| Incredible story.
|
| Some more background:
| https://ng.usembassy.gov/nigerias-2023-elections/
| roschdal wrote:
| The people who cast the votes don't decide an election, the
| people who count the votes do. - Stalin.
| pxc wrote:
| In case the downright cartoonish character of this quotation
| made anyone else wonder if it were fake...
|
| that quotation is, indeed, fake:
| https://web.archive.org/web/20220128105324/https://www.polit...
| mtrovo wrote:
| Is there open access to the original photos? They might make a
| good Kaggle competition, although it's maybe a little too late
| for the current election.
| jasonjayr wrote:
| From the article, it seems like the rush was to collect enough
| evidence to file a challenge within the legal timeframe. With a
| challenge filed, it seems like there is a bit more time to
| verify claims + other evidence. (I know nothing of the system
| of government there, but) -- it seems like the prudent thing to
| do would be for the courts to mandate a neutral verification of
| each of those paper sheets (i.e., 10 trusted representatives
| from each party re-key the figures manually).
| olabyne wrote:
| If you're interested, Kenya had exactly the same issue to solve
| last year.
|
| The pictures of all of the voting sites are available, but the
| country descended into chaos over picking a winner. It is crazy,
| because at the lower level (in voting offices), the vote process
| was respected and the numbers are trustworthy, but the higher you
| go, the more corruption happens, as each aggregation of data
| removes trust from the system.
| mattlutze wrote:
| This was thrilling.
|
| Sometimes, one person's bug is another person's feature :)
| thread_id wrote:
| Fantastic story. What an excellent example of democratization
| through technology. And also a perfect example of how the blade cuts
| both ways. Digital warriors battling it out in real time and the
| stakes are enormous. Great respect for Mark and his ingenuity and
| adaptive responses!!!!
| tr33house wrote:
| I'd tried something like this with the Kenyan election, but our
| setup was to use OCR (Google Cloud) -> text -> parse -> sqlite.
|
| We started late, so the results were out by the time we finished,
| but I think it would be a good idea to develop software that can
| parse the PDF results and display them faster than the electoral
| bodies can. In Kenya and Nigeria, the delays cause a lot of anxiety.
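A minimal sketch of the "text -> parse -> sqlite" end of that pipeline, using only the Python standard library. The OCR step is left out, and the one-`PARTY VOTES`-per-line layout, the `results` table, and the helper names `parse_sheet`/`store` are hypothetical; the sample numbers are illustrative, and real OCR output from photographed result sheets will be far noisier.

```python
import re
import sqlite3

# Hypothetical OCR output for one result sheet (assumed layout).
SAMPLE_OCR_TEXT = """\
POLLING UNIT: 001
APC 98
LP 54
PDP 102
NNPP 11
"""

# Assumed layout: a party acronym, whitespace, then a vote count.
LINE_RE = re.compile(r"^\s*([A-Z]{2,5})\s+(\d+)\s*$")


def parse_sheet(text):
    """Return {party: votes} for every line that matches LINE_RE."""
    votes = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            votes[match.group(1)] = int(match.group(2))
    return votes


def store(conn, unit_id, votes):
    """Upsert one polling unit's parsed votes into a `results` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "unit_id TEXT, party TEXT, votes INTEGER, "
        "PRIMARY KEY (unit_id, party))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
        [(unit_id, party, count) for party, count in votes.items()],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, "001", parse_sheet(SAMPLE_OCR_TEXT))
    print(conn.execute("SELECT party, votes FROM results ORDER BY party").fetchall())
```

The header line is skipped because the regex is anchored to the start of the line; anything the regex can't read is silently dropped, which a real pipeline would want to log instead.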
| YeGoblynQueenne wrote:
| >> We had a brainstorming meeting, and decided to try a new
| approach. We would simply ask the Obidients to help us do the
| conversion. If hundreds of Obidients did the transcription, it
| would go fast.
|
| What would guarantee that the Obidients would not, in turn, try
| to inflate the score of the Labor candidate?
| munchler wrote:
| They planned to transcribe each PDF multiple times in order to
| validate the results.
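One way to implement "transcribe each PDF multiple times" is to accept a sheet only when enough independent transcriptions agree exactly. This is an assumption about how such a rule could work, not the article's actual logic; `consensus` and the `min_agree=2` threshold are hypothetical.

```python
from collections import Counter


def consensus(transcriptions, min_agree=2):
    """Return the transcription that at least `min_agree` volunteers entered
    identically, or None if there is no such agreement or the top two tie.

    Each transcription is a {party: votes} dict for the same sheet.
    """
    # Dicts aren't hashable, so count a sorted tuple of their items instead.
    counts = Counter(tuple(sorted(t.items())) for t in transcriptions)
    ranked = counts.most_common(2)
    if not ranked:
        return None
    best, votes_for_best = ranked[0]
    if votes_for_best < min_agree:
        return None
    if len(ranked) > 1 and ranked[1][1] == votes_for_best:
        return None  # two different readings with equal support
    return dict(best)
```

For example, `consensus([{"APC": 98}, {"APC": 98}, {"APC": 980}])` keeps `{"APC": 98}`, while a single entry, or one against one, is held back until more volunteers have read the sheet.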
| davedx wrote:
| More background. OP is an impressive entrepreneur! Massive kudos.
| https://markessien.com/projects/hotels-ng/
| dejongh wrote:
| Wow. Wild story. Thanks for sharing. Cool twist that a bug ended
| up identifying the bad guys.
| hoseja wrote:
| Silly, you don't malcount the actual votes, you brainwash the
| population and pervert the process until they vote the way you
| want them to, like in the advanced first world democracies.
| avodonosov wrote:
| That's not the worst case, if a wise elite brainwashes
| (manufactures the consent of) the population.
|
| Worse is when the elite is not so wise (sometimes plainly
| crazy), or the elite loses control to crazy people or
| adversaries. Or self-induced mass hysteria of the population.
|
| The direct "democracy" that technology will inevitably enable
| very soon poses great dangers in a situation where the masses
| are so easily manipulated, and their collective intelligence
| seems not to rise above the individual level but, for some
| reason, to degrade below it. Violent chaos, lynch courts, etc.
| mmmuhd wrote:
| Elupee 75, to be frank, you did a great job and I am proud of
| someone from my country pulling this off, but the bitter truth is
| that President-Elect Bola Ahmed Tinubu won this election. Peter
| Obi's youth support is predominantly in the south and the
| Christian-majority parts of the country; he clearly lacks support
| in the Muslim north, where I am from. I voted for Kwankwaso though.
| bmsleight_ wrote:
| Can you expand on "he clearly lacks support"? Bonus points for
| facts over opinions.
| mmmuhd wrote:
| By "clearly" I mean that even his vice-presidential candidate
| could not win his own polling unit. Polling unit: not ward, not
| Local Government, not State.
|
| https://punchng.com/nigeriaelections2023-datti-loses-polling...
| vuln wrote:
| [flagged]
| sd9 wrote:
| I was naturally skeptical of the punchng article, so I
| crosschecked it against OP's CSV. The votes in the article
| do agree with OP's CSV (although the number of accredited
| voters differs slightly).
|
| The crosschecked results are in KADUNA_crosschecked on line
| 3800. The image is here:
| https://inec-cvr-cache.s3.eu-west-1.amazonaws.com/cached/res...
|
| Accredited voters: 276, Registered voters: 750, APC: 98,
| LP: 54, PDP: 102, NNPP: 11
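Those four party totals also pass a basic internal-consistency check: 98 + 54 + 102 + 11 = 265 votes, no more than the 276 accredited voters, which in turn is no more than the 750 registered. A hypothetical helper that runs the same kind of check on every row of a CSV might look like this; it ignores rejected ballots, which a real check would add to the vote total.

```python
def check_row(registered, accredited, party_votes):
    """Return a list of internal-consistency problems for one polling unit."""
    problems = []
    total = sum(party_votes.values())
    if any(v < 0 for v in party_votes.values()):
        problems.append("negative vote count")
    if accredited > registered:
        problems.append(f"accredited ({accredited}) > registered ({registered})")
    if total > accredited:
        problems.append(f"votes cast ({total}) > accredited ({accredited})")
    return problems


# Figures from the Kaduna polling unit above.
print(check_row(750, 276, {"APC": 98, "LP": 54, "PDP": 102, "NNPP": 11}))  # prints []
```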
|
| All that said, I don't think that the results for 276
| voters in one polling unit in one ward in one local
| government area in one state is clear evidence that Obi
| lacks support. If anything, the fact that OP's CSV matches
| a (potentially biased) news article gives me more faith in
| OP's tallies and claims.
|
| (Aside: it seems _easier_ to lose an election in your own
| polling unit, where variance plays a larger part, than it
| is to lose on a wider scale.)
| [deleted]
| hardlianotion wrote:
| That is a great job - well done from a grateful Nigerian.
| bundie wrote:
| I did not know that Nigerians used Hacker News :-D Most people
| I encounter on this site are oyinbos (foreigners).
| hardlianotion wrote:
| We are everywhere and cannot be avoided.
| nivenkos wrote:
| This is a great example of why electronic voting is important and
| can help secure democracy.
| cwkoss wrote:
| Wouldn't electronic voting just create a means for the ruling
| party to deliver the result without releasing evidence of vote
| tampering?
|
| I don't understand what you think electronic voting solves...
| logifail wrote:
| > This is a great example of why electronic voting is important
| and can help secure democracy.
|
| If those in power are against change, I wouldn't want to have
| to put my trust in electronic voting if I was hoping for
| change.
|
| I was left with the impression that it is the _paper_ records
| in this story that led to the unravelling of an attempt to
| forge the results.
|
| Long live paper ballots.
| SkeuomorphicBee wrote:
| > I was left with the impression that it is the paper records
| in this story that led to the unravelling of an attempt to
| forge the results.
|
| The manual tallying of paper records is what led to the
| attempt to forge the results in the first place. If the
| results were electronically tallied to generate an official
| result, there would be no need to recount the whole election
| to verify it; recounting a statistically significant random
| sample of the polls would be enough.
| logifail wrote:
| > If the results were electronically tallied to generate an
| official result
|
| Electronic voting doesn't make bad politicians less bad. In
| this instance, the bad guys were prepared to deliberately
| remove CCTV so when they sent their goons out at night to
| shoot protestors there would be no evidence.
|
| "Electronic tallies" are never going to give a free and
| fair election if those in power are prepared to go that
| far. Safer to stick with paper ballots and election
| observers equipped with Mark I eyeballs.
| pjc50 wrote:
| How do you recount electronic-only elections?
| SkeuomorphicBee wrote:
| By looking at the receipts printed by the ballot
| machines.
|
| Ballot machines print either a final tally at the end of
| the day, or print every single vote and automatically
| drop it into a physical ballot box, depending on the threat
| model of the country in question. Either way, you have a
| partial or total recount.
| logifail wrote:
| > By looking at the receipts printed by the ballot
| machines.
|
| Let's be clear, you're not really "recounting" the
| ballots at that point. If the machine is compromised -
| and we're discussing a situation in which we know CCTV
| was removed _and people were then shot_ - you have no
| real idea if the receipt corresponds to the voter's
| original intent. Or, indeed, if all the receipts from all
| the voters make it as far as the recount (?)
|
| > Ballot machines print either a final tally at the end
| of the day, or print every single vote and automatically
| drop it into a physical ballot box, depending on the threat
| model of the country in question.
|
| How is reprinting the final automated tally supposed to
| represent a "recount" of the original automated tally?
|
| > Either way, you have a partial or total recount.
|
| You really don't. Bits of paper and Mark I eyeballs all
| the way.
|
| As Tom Scott puts it, "The key point is not that paper
| voting is perfect - it isn't - but attacks against it
| don't scale well"[0].
|
| [0] Why Electronic Voting Is Still A Bad Idea:
| https://www.youtube.com/watch?v=LkH2r-sNjQs
| SkeuomorphicBee wrote:
| > How is reprinting the final automated tally supposed to
| represent a "recount" of the original automated tally?
|
| If you want to detect tampering in the central totalling,
| then all you need is the end-of-day receipt from each
| ballot box. Exactly like in OP's case.
|
| If you want to detect tampering in a ballot box, then you
| manually recount the individual printed paper votes
| inside that box. That is something that you should do
| to a random sample of ballot boxes, plus boxes with unusual
| totals.
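A sketch of that audit-selection rule, assuming each ballot box has an id and a single reported total: take a uniform random sample, then add any box whose total is a statistical outlier. The z-score cutoff of 3 and the name `pick_audit_set` are illustrative assumptions; real risk-limiting audits use more careful statistics than this.

```python
import random
import statistics


def pick_audit_set(totals, sample_size, z_cutoff=3.0, seed=None):
    """Pick ballot boxes to hand-recount.

    totals: {box_id: reported_total}. Returns a uniform random sample of
    `sample_size` boxes plus every box whose total lies more than `z_cutoff`
    population standard deviations from the mean.
    """
    if not totals:
        return set()
    rng = random.Random(seed)
    ids = sorted(totals)  # stable order so a given seed is reproducible
    chosen = set(rng.sample(ids, min(sample_size, len(ids))))
    values = list(totals.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd > 0:
        chosen |= {b for b, t in totals.items() if abs(t - mean) / sd > z_cutoff}
    return chosen
```

The random part is what makes tampering risky everywhere; the outlier part catches the clumsy cases cheaply. A careful forger would keep totals just under the cutoff, which is why the random sample can't be skipped.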
|
| > As Tom Scott puts it, "The key point is not that
| paper voting is perfect - it isn't - but attacks against
| it don't scale well"[0].
|
| That is simply not true: large-scale paper ballot
| tampering scales very well, to the point of swinging
| elections, and is much easier to pull off because it
| happens at the fringes where no one is looking (whereas
| tampering with the electronic system would require pulling
| your heist in the IT room, where everyone is looking).
| gdelfino01 wrote:
| You introduce technology to increase transparency and fight
| corruption. You increase transparency by having video
| recordings of humans counting votes linked to the electronic
| record of the totals.
|
| When you introduce technology to eliminate manual counting and
| paper trails, then transparency is eliminated and you give a
| green light to fraud, corruption, very juicy contracts and
| death.
| TazeTSchnitzel wrote:
| On the contrary, electronic voting doesn't create the paper
| trail necessary to dig up fraud like this. You can simply
| program or hack the system to report any vote total you want.
| SkeuomorphicBee wrote:
| First of all, hacking the electronic system is much much
| harder than hacking the paper process. In the case at hand
| the paper tallying process was the one hacked.
|
| And second, electronic systems can create a paper trail, just
| make the electronic machine spit out a paper receipt. Then
| you have the best of both worlds, you can have instant
| electronic totals, and then do some random sampling recounts
| of the receipts to validate the result.
| marcosdumay wrote:
| Scaling an attack against paper is incredibly difficult,
| and requires coordination at a level that is almost sure to
| trigger law enforcement long before it can change any
| national-level numbers.
|
| Scaling an attack against a computer system is almost the
| same as doing an attack against a computer system. Few
| attacks don't scale.
|
| But yeah, if you just print the vote and push it into a ballot
| box (while the voter can read it), you'll get the best of
| both worlds.
| redman25 wrote:
| This might be a sensitive question but I wonder if something like
| this would work in the United States? With all of the fears of
| election interference why not trust but verify?
| charles_f wrote:
| Would you trust the recount? I mean, the only way to engage the
| number of people you need to do that kind of recount is by
| having them _very_ pissed, most likely because they feel their
| party was wronged, and therefore the whole thing is partisan by
| essence. If you're in the winning party, you wouldn't trust the
| numbers the others give you anyhow, so what's the point?
| pjc50 wrote:
| Genuinely the US would do better if it had paper elections with
| a handcount with observers. The system works in the UK just
| fine. Unfortunately, there's a category of people in both the
| US and Nigeria who use "election interference" to mean
| "accurately counting the votes".
| pjc50 wrote:
| Striking reminder of how big the world is that while I had heard
| of #EndSARS, I hadn't realised the scale of the political
| violence in Nigeria nor that it had its own Bloody Sunday-scale
| massacre.
| prhrb wrote:
| What a scam by the ruling political party
| SergeAx wrote:
| PDF is a very unfortunate format. It is proprietary, it is
| paper-oriented, and its almost sole goal is to keep a precise
| printing layout. But in the last 30 years the world hasn't come
| up with anything that could compete.
| segfaultbuserr wrote:
| PDF isn't the actual problem in this particular case. The
| documents here are photographs taken at different camera
| angles, embedded in PDFs.
| jxramos wrote:
| I was going to say, using alt-drag to select vertical columns
| is usually how I extract usable tables from PDFs with
| embedded tables.
| londons_explore wrote:
| Aren't things like this the reason that the UN provides election
| observers?
|
| By spot-checking that just 100 random votes are correctly
| tallied, you can be pretty sure the outcome of the election is
| legit in a country of more than 10M voters.
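How sure? Assuming every sampled vote can actually be traced end to end (the very point the reply disputes), the chance that a uniform random sample of n votes out of N catches at least one of k tampered votes is hypergeometric: P = 1 - C(N-k, n) / C(N, n). A small sketch, with illustrative numbers rather than data from this election:

```python
from math import comb


def detection_probability(total_votes, tampered_votes, sample_size):
    """Probability that a uniform random sample (without replacement) of
    `sample_size` votes contains at least one of `tampered_votes` bad votes."""
    # P(no tampered vote in sample) = C(N - k, n) / C(N, n)
    p_miss = comb(total_votes - tampered_votes, sample_size) / comb(total_votes, sample_size)
    return 1 - p_miss


# Illustrative: a 10M-vote election with 100 votes sampled.
for fraction in (0.01, 0.05):
    k = int(10_000_000 * fraction)
    print(f"{fraction:.0%} tampered: {detection_probability(10_000_000, k, 100):.3f}")
```

This works out to roughly 0.63 when 1% of votes are tampered and above 0.99 at 5%. What matters is the tampered fraction, not the size of the country, so 100 samples are convincing against large-scale fraud but weak against a narrow swing.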
| Someone wrote:
| > By spot checking just a random 100 votes are correctly
| tallied
|
| How do you do that? I think the only error you could detect is
| when the tally has fewer votes for a party than what's in that
| sample. If so, a fraudster could report 100 votes for every
| party, and add the remaining to whatever party they want to
| win.
| londons_explore wrote:
| You have to design the election system with this in mind.
|
| One such design would be for every vote to have a unique id.
| When announcing the results, you also publish a list of which
| vote ids were tallied for which candidate.
|
| Then you pick 100 random ids, and the checkers follow those
| votes all the way from the voter casting them to the final
| tally.
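A sketch of the two checks that design would enable, assuming the published list maps each vote id to a candidate: the announced totals must equal a recount of the published list, and every vote a checker followed must appear under the candidate they saw it cast for. All function names here are hypothetical.

```python
from collections import Counter


def tally(published):
    """Recount the published {vote_id: candidate} list."""
    return Counter(published.values())


def mismatched_tracked_votes(published, tracked):
    """Ids of followed votes that are missing or credited to someone else.

    tracked: {vote_id: candidate} as witnessed by the checkers.
    """
    return sorted(vid for vid, cand in tracked.items() if published.get(vid) != cand)


def audit(announced, published, tracked):
    """True only if the totals add up and every followed vote checks out."""
    return (
        tally(published) == Counter(announced)
        and not mismatched_tracked_votes(published, tracked)
    )
```

This answers the "report 100 votes for every party" trick: padding a party's total would require publishing ids for the padding, and inventing or reassigning ids risks landing on one of the followed votes.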
| jgtrosh wrote:
| The context should be dated to 2020, not 2023. Edit: it has now
| been corrected, no need to downvote.
|
| Great story! Looking forward to some follow up
| public_defender wrote:
| I don't understand. The article says the SARS protests started
| in 2020 and the election was in 2023. This seems correct.
| jgtrosh wrote:
| Yes, it has now been corrected.
| MontagFTB wrote:
| So the bug where the first voting sheet shown to a user was from
| the same 10% of the photos turned out to be a feature, serving as
| a CAPTCHA of sorts to weed out the bad actors from the good.
|
| If memory serves, some CAPTCHA techniques include showing two
| numbers to transcribe, where one's value is already known. If
| that number is transcribed incorrectly, then the other number's
| result isn't used, and the CAPTCHA fails. Perhaps a similar
| technique may have also helped here?
| Spare_account wrote:
| This approach was part of their strategy:
|
| > _Then we started showing some results we knew to the bots -
| if they entered wrong numbers, we would stop accepting the
| results._
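A minimal sketch of that known-answer check, in the spirit of reCAPTCHA's control word: a session's new transcriptions count only if every already-verified sheet it was shown was entered exactly right. The data shapes and the `accept_session` name are assumptions, not the article's code.

```python
def accept_session(entries, known):
    """Decide which of a session's transcriptions to keep.

    entries: {sheet_id: {party: votes}} submitted in one session.
    known:   {sheet_id: {party: votes}} for sheets already verified.
    New sheets are kept only if the session transcribed at least one known
    sheet and got every known sheet exactly right.
    """
    controls = [s for s in entries if s in known]
    if not controls:
        return {}  # nothing yet to vouch for this session
    if any(entries[s] != known[s] for s in controls):
        return {}  # failed a control sheet: distrust the whole session
    return {s: v for s, v in entries.items() if s not in known}
```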
| didgetmaster wrote:
| It seems to me that when combating bots or hackers, the wrong
| approach is to provide immediate negative feedback. Giving an
| immediate error code lets them know that their current
| strategy is not working and to try something different.
|
| It seems like a better approach would be to make them think
| you were accepting the results, when in fact they were going
| to the bit bucket. Hackers trying to get into your corporate
| database should be presented with a table full of false (but
| plausible) data rather than an error. Let them waste time
| trying to use all those fake SS numbers or account numbers
| before they figure out they got duped.
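A sketch of that "pretend to accept" idea for a submission endpoint: flagged sessions get exactly the same response as everyone else, but their entries never reach the results. The class and method names are hypothetical, and how a session gets flagged is left out.

```python
class ShadowBanningCollector:
    """Flagged sessions get the same response as everyone else,
    but their entries are silently dropped."""

    def __init__(self):
        self.flagged = set()
        self.accepted = []
        self.discarded_count = 0

    def flag(self, session_id):
        self.flagged.add(session_id)

    def submit(self, session_id, entry):
        if session_id in self.flagged:
            self.discarded_count += 1  # straight to the bit bucket
        else:
            self.accepted.append((session_id, entry))
        # Same response either way, so the bot gets no signal to adapt to.
        return {"status": "ok"}
```

Only the response body is made identical here; response timing, or a publicly visible running tally, could still tip the bot off.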
| theptip wrote:
| For sure, shadow-banning is a great strat here. Raise their
| costs, and don't give them any signal to learn from.
|
| Assuming you have the bandwidth to absorb the bot load,
| which sounded like it was an issue here.
| tetha wrote:
| As scary as it can be, yes. It's similar to strategy games in a
| way - sometimes it's better to let the enemy push you around for
| a bit as long as nothing important is damaged. I don't really
| care if I have to scale up the LBs a bit to handle all of the
| requests for some time. Meanwhile, this lets your attacker
| commit more of their resources, so you can block and ban more
| once you react, or learn more about their behavior, so you can
| mislead, slowloris and generally mess with them more
| effectively.
|
| There have also been funny DEF CON talks about messing with
| attackers like this, by returning all kinds of messed-up
| return codes, slowloris'ing the bot, ... I'm kind of
| wondering if you could SSRF (or rather, CSRF) a bot like
| this by returning a redirect to e.g. the AWS metadata
| API... could be a fun topic to mess with.
| pbhjpbhj wrote:
| It's also evidence of a crime. I wonder how that relates:
| if you just drop those entries from the database (or from
| the app prior to entry into the main db) then that seems
| like destruction of evidence of a crime?
|
| It seems one should record all entries, but only update a
| canonical db if all entries fail to trip automated
| tampering detections.
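A sketch of "record everything, derive the canonical results": submissions go into an append-only log, automated checks only attach flags, and the canonical view is computed from unflagged records, so nothing that might be evidence is ever deleted. `EvidenceLog` and its check format are hypothetical, and an in-memory list stands in for durable storage.

```python
import time


class EvidenceLog:
    """Append-only record of every submission.

    Nothing is ever deleted; suspicious entries are only flagged, and the
    canonical results are derived from the unflagged records.
    """

    def __init__(self, checks):
        # checks: list of (name, predicate); predicate(entry) -> True if suspicious
        self.checks = checks
        self.records = []

    def record(self, session_id, entry):
        flags = [name for name, is_suspicious in self.checks if is_suspicious(entry)]
        self.records.append(
            {"ts": time.time(), "session": session_id, "entry": entry, "flags": flags}
        )
        return flags

    def canonical(self):
        """Entries that tripped no check; flagged ones stay in self.records."""
        return [r["entry"] for r in self.records if not r["flags"]]
```

A check could be as simple as `("votes_exceed_accredited", lambda e: sum(e["votes"].values()) > e["accredited"])`.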
| malborodog wrote:
| Can you explain that again differently? I didn't understand
| that captcha point. It feels important though.
| wodenokoto wrote:
| The original reCAPTCHA was built around transcribing text that
| OCR tools failed at.
|
| So I give you two words to transcribe to prove you are human.
| I know one of them and I want to know the other.
| czx4f4bd wrote:
| I think they're referring to the old reCAPTCHA v1 approach.
|
| From https://en.wikipedia.org/wiki/ReCAPTCHA:
|
| > The original iteration of the service was a mass
| collaboration platform designed for the digitization of
| books, particularly those that were too illegible to be
| scanned by computers. The verification prompts utilized pairs
| of words from scanned pages, with one known word used as a
| control for verification, and the second used to crowdsource
| the reading of an uncertain word.
| dan-robertson wrote:
| I think the bug was that your first sheet came from a small set
| and the people entering bad data would refresh instead of doing
| the actually random next sheet, so entries for most of the
| sheets came only from people who had long sessions who were
| apparently more likely to enter good data.
| churchill wrote:
| Oh, and Mark didn't mention that Bola Ahmed Tinubu was indicted
| for heroin charges in the US in 2003, forfeited $460k & is just
| too old to run a democracy this size.
|
| Atiku Abubakar (second candidate) was a former VP and the
| president he served under (Obasanjo) still insists the dude
| remains a monument to corruption.
|
| There's been a coordinated campaign at all levels to rig this
| election massively and we saw voter intimidation, manipulation in
| broad daylight, and the acquiescence of foreign governments to it
| all.
| churchill wrote:
| Proofs:
|
| To explain the $460k he forfeited to the feds for his heroin
| trafficking indictment [0][1], Tinubu claims to have worked at
| Deloitte as a consultant & made $850k in pre-tax bonuses a
| year. Problem is, Deloitte claims he's never worked for them
| [2] and a director at Deloitte earns $340k, according to
| Glassdoor [3].
|
| [0]: https://www.bbc.com/news/world-africa-61732548
| [1]: https://www.scribd.com/document/345742027/Bola-Tinubu-Heroin
| [2]: https://pbs.twimg.com/media/FhhgxX2WQAAWOVo?format=jpg
| [3]: https://www.glassdoor.com/Salary/Deloitte-Director-Salaries-...
| JumpCrisscross wrote:
| > _a director at Deloitte earns $340k, according to
| Glassdoor_
|
| This in no way undermines your post, broadly. But narrowly,
| these are sales roles. Two people with the same title at
| Deloitte can make vastly different incomes depending on their
| production.
| themitigating wrote:
| Proof?
| charles_f wrote:
| > run a democracy this size.
|
| From the looks of it, if he runs it, it won't be a democracy
| bschne wrote:
| > is just too old to run a democracy this size
|
| Ahem, somebody tell the U.S. that
| lostlogin wrote:
| > is just too old to run a democracy this size.
|
| Bola Ahmed Tinubu was born 29 March 1952. He is 70.
|
| Joe Biden was born November 20, 1942. He is 80.
|
| There are plenty of world leaders that are old and I completely
| agree with you. Why aren't there upper age limits? The UK House
| of Lords, US Congress and US Supreme Court have this problem
| too.
| churchill wrote:
| He claims to be 70 but it's been disputed widely - I don't
| have the energy to filter signal from noise though.
| churchill wrote:
| I meant _heroin trafficking_
| mmmuhd wrote:
| [flagged]
| mrtksn wrote:
| It's pretty easy to find articles about it on Bing Chat.
|
| https://businessday.ng/news/article/u-s-court-judgement-indi...
|
| Also this appears to be the Indictment document:
| https://www.scribd.com/document/580028043/Bola-Ahmed-Tinubu-...
|
| Considering the needlessly passive-aggressive tone, I would
| assume you are a supporter. Maybe it would be a more useful
| conversation if you wrote your perspective on the matter
| instead of demanding easy-to-find articles about the Bola
| Ahmed Tinubu Heroin Trafficking Indictment?
| churchill wrote:
| Why not debunk everything I just wrote instead of attacking
| me personally?
|
| Google is your friend and you can verify everything I said
| about:
|
| Tinubu's drug trafficking indictment:
| https://www.bbc.com/news/world-africa-61732548
| nimajneb wrote:
| [dead]
| klooney wrote:
| Also, this is ridiculous
|
| > he became an "instant millionaire" while working as an
| auditor at Deloitte and Touche.
| churchill wrote:
| Deloitte denies having a record of ever employing him,
| like you can see here [0].
|
| [0]: https://pbs.twimg.com/media/FhhvN-fXEAAOTOK?format=jpg&name=...
|
| Tinubu claimed to be making $850k in annual pre-tax
| bonuses working for Deloitte. Today, Directors at
| Deloitte make 340k total comp annually, according to
| Glassdoor, and that's before you factor in inflation.
| What type of joke is this?
| mmmuhd wrote:
| churchill, I am not attacking you; I am just asking you to
| bring solid evidence. In the link you provided, I couldn't
| find where the article states that Tinubu is accused of drug
| trafficking, or Shettima of terrorism.
| smcl wrote:
| From the linked article:
|
| > While the court confirmed it had cause to believe the
| money in the bank accounts were the proceeds of drug
| trafficking
| natpalmer1776 wrote:
| Disclaimer: Not my monkey, not my circus.
|
| That being said, your comment came off as needlessly
| aggressive to someone who knows nothing of these people
| or politics.
| favaq wrote:
| [flagged]
| rqtwteye wrote:
| I still don't understand how we ended up with PDF as a sort of
| standard to archive data. PDF is already pretty bad for things
| like manuals, but for things like spreadsheets we basically
| collect the data, then destroy all the structure by putting it
| into PDF, and later on we painstakingly try to restore the
| data from the PDF, which is often almost impossible to do
| accurately.
|
| It just shows that bad solutions often win.
| andrewio wrote:
| Try https://parsio.io.
|
| It converts PDFs into a structured JSON format that you can
| export anywhere using a Zapier or Make automation:
| manv1 wrote:
| Back in the day there were at least two programs I remember
| competing for the role that PDF fills today: diskpaper and
| PDF. Apple also had one for its developer docs, but it was
| never released commercially, I believe.
|
| PDF provided more fidelity for printing, had better tooling (it
| was by Adobe after all), it was cross-platform, could be
| displayed on the desktop, so it won. The reader was cross-
| platform so end-users didn't have to mess with installing
| plugins for various image types. And because everyone in the
| document creation division(1) used PostScript to print,
| printing to PDF was super-easy. And at some point everyone had
| a PostScript printer driver on their machine, so printing to
| PDF became super-easy as well.
|
| It's not an archiving tool, but people use it for
| archiving...just like the way a spreadsheet isn't a project
| management tool, but millions of people use it for project
| management.
|
| At this point the network effects for the PDF file format would
| make it difficult to replace. With PDF you can practically
| guarantee(2) that the file will look the same on any device.
|
| (1) This was more true back then than today, probably.
| (2) Assuming that you embedded the fonts, and that the reader
| doesn't suck.
|
| What's funny is I don't think Adobe really makes any money off
| of PDF; it's an accidental de-facto standard.
| lostlogin wrote:
| > PDF provided more fidelity for printing, had better tooling
|
| This might have been true once, but using Acrobat now is so
| painful. Of all the apps that work, Apple's Preview is my
| editor of choice, and when I'm on Windows I really miss it.
| layer8 wrote:
| > how we ended up with PDF as sort of standard to archive data.
|
| I don't think we really did. They are a standard for archiving
| typeset page-based documents.
|
| Of course, paper documents used to be the standard for archiving
| data, and some people continue that practice in the form of PDF.
|
| In principle, it is possible to integrate all the structure you
| want in a PDF (using Marked Content, Structure Attributes and
| User Properties), but for data (as opposed to document
| structure) you'd need custom software to generate and interpret
| that.
| varenc wrote:
| For this particular case, the use of PDFs seems irrelevant.
| Photos were just taken of each polling unit's results. These
| photos happened to then be embedded into PDFs for distribution,
| but the core underlying data is just an image embedded into
| that PDF. No important data was destroyed when these photos
| were placed into PDFs.
| spacebanana7 wrote:
| I've thought about this and come round to think that the flaws
| of PDF are actually essential to the success of the document
| format.
|
| - Non-responsive (compared to HTML). Allows PDFs to serve as a
| common standard between other document formats with different
| resizing logic, like LaTeX and Word.
|
| - Difficulty of network access from code running inside the
| document. Allows PDFs to generally operate offline. Nobody's
| brave enough to try to write a single page application in a PDF.
|
| - Destroying data structure. Allows forward compatibility with
| anything that can be displayed statically on a screen. New
| applications can have different ideas about how tables, text or
| charts should work, but if there's static visual output then
| it'll convert to PDF. Awareness of, say, the structure of tables
| is precisely what makes it so difficult for, say, Google Sheets
| and Excel to stay compatible with each other's new table
| features. If somebody develops a new language with new
| characters not even in Unicode, it'll still work in a PDF.
|
| It's also worth noting that most PDF limitations have the
| characteristic of making things hard but not absolutely
| impossible. These escape hatches prevent people with hard
| requirements from actually moving to a new format.
|
| If it were truly impossible to get invoice data from PDFs
| people might've shifted to a different format for business
| transactions. But if it's merely difficult some company will
| come up with an API that works as a good enough extraction
| solution whose cost is justified by the other compatibility
| benefits of PDFs, so the ecosystem stays with PDFs.
| zo1 wrote:
| Oh but there is:
|
| https://en.wikipedia.org/wiki/Apache_Flex
|
| Not sure if I linked to the right article, but it was
| basically compiled scripts/code that was embedded into PDFs
| that could run arbitrary code.
|
| "Apache Flex, formerly Adobe Flex, is a software development
| kit (SDK) for the development and deployment of
| cross-platform rich web applications based on the Adobe Flash
| platform."
| salawat wrote:
| >Difficulty of network access from code running inside
| document. Allows PDFs to generally operate offline. Nobody's
| brave enough to try to write a single page application in a
| PDF.
|
| You can absolutely do so. Most of the time, however, the desire
| is to embed the latest cut of info into the PDF, then hand it
| off to somebody who will not have network access.
|
| t. Been there, done that. Had the end product thrown out
| because of Adobe's licensing terms. I also met one of the
| people responsible for the tooling I had to suffer through. I
| have their address, but they apologized, and explained the
| internal politics at the time; so I've chilled on the whole
| _crushing their genitalia with a large wrench_ bit.
|
| Long story short: doable, _but Do Not Follow.
| This is not a place of honor. No great deed was once
| commemorated here. That which remains is repulsive to
| us, in our time, as it will be in yours.
|
| Seriously. If I could fill this post with spikes and sick
| faces, I would. Vvvvvvvvvvvvvvvvvvvvvvvvvvvvv
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| XFA was the dream of madmen, and sadists, that decent men
| thought they could wrangle some positive utility out of. They
| were wrong.
|
| The trefoil is not an angel. The weird ring things are
| symbols for infectious waste._
| davedx wrote:
| It depends. There are PDFs with rasterized images of text (like
| in the article, when it's a scan or photo of a document), and
| there are PDFs with vector-positioned text runs (usually the
| result of some digital process). The latter are way easier to
| process than the former.
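A crude way to tell the two kinds apart, sketched in Python under stated assumptions: look inside the content streams for the text-showing operators `Tj`/`TJ`, and otherwise check for image XObject dictionaries. This is a heuristic over raw bytes, not a PDF parser; `classify_pdf` is a hypothetical helper, and it will miss anything hidden in compressed object streams or encoded with filters other than FlateDecode.

```python
import re
import zlib

# Heuristic only (an assumption, not a PDF parser): the text-showing
# operators Tj/TJ inside a content stream mean positioned text runs exist.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
TEXT_OP_RE = re.compile(rb"\bT[jJ]\b")


def classify_pdf(data: bytes) -> str:
    """Return 'text', 'image-only' or 'unknown' for raw PDF bytes."""
    for raw in STREAM_RE.findall(data):
        try:
            # FlateDecode streams; decompressobj tolerates the end-of-line
            # bytes left over after the compressed data.
            body = zlib.decompressobj().decompress(raw)
        except zlib.error:
            body = raw  # uncompressed, or a filter this sketch can't decode
        if TEXT_OP_RE.search(body):
            return "text"
    # No text operators found: look for image XObject dictionaries instead.
    if re.search(rb"/Subtype\s*/Image", data):
        return "image-only"
    return "unknown"
```

A photographed result sheet wrapped in a PDF should come back as `'image-only'`, which is the case where only OCR or human transcription gets the numbers out.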
| codeulike wrote:
| these are just photos embedded in a PDF, which actually isn't
| that bad an idea, because it lets you scan multiple pages and
| join them together as a 'document'
|
| (not sure if the documents in OP had several pages, but if
| you've scanned/photographed a multi-page document, PDF is not
| that bad of a solution)
| SilverCode wrote:
| A better option would be to use the TIFF format. You can use
| it as a container format to store lossless and lossy image
| formats, and it handles multiple images in a single container.
|
| It was the standard for scanners until PDF seemed to dominate
| the scene.
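To illustrate the multi-image point: a classic TIFF chains its images through Image File Directories (IFDs), each of which ends with the offset of the next one, so counting pages means walking that chain. A stdlib-only sketch; `count_tiff_pages` is a hypothetical helper, and BigTIFF and SubIFDs are deliberately not handled.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count the images in a classic TIFF by walking its IFD chain."""
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])  # Intel / Motorola
    if byte_order is None or len(data) < 8:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses 43)")
    pages, seen = 0, set()
    while offset != 0:  # an offset of 0 terminates the chain
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        (n_entries,) = struct.unpack(byte_order + "H", data[offset:offset + 2])
        next_pos = offset + 2 + 12 * n_entries  # each IFD entry is 12 bytes
        if next_pos + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack(byte_order + "I", data[next_pos:next_pos + 4])
        pages += 1
    return pages
```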
| adzm wrote:
| Except who knows if your application that supports TIFF
| files actually supports the features you want (multiple
| images, the compression format, etc.)
| MichaelZuo wrote:
| Reminds me of USB-C.
| hunter2_ wrote:
| > It was the standard for scanners until PDF seemed to
| dominate the scene.
|
| Probably because it's much easier (for average users with
| few tools and skills) to print a PDF than to print any sort
| of non-page-based (e.g., image) file format and have the
| resulting sheet of paper match the scanned sheet of paper
| in terms of scale, orientation, position -- assuming both
| sheets are the same dimensions. Essentially using the file
| as an intermediary for physical copying of standard paper
| documents.
| rqtwteye wrote:
| I can buy the printing argument. The problem with PDF is
| that this print-optimized format is used more and more for
| purposes where it will never get printed. For example, most
| manuals will never get printed, but they are published in
| PDF format, which is a PITA to use on a phone and hard to
| search.
| gus_massa wrote:
| I teach first-year university students. During the remote
| classes in the pandemic, we made it almost mandatory to upload
| photos of the take-home work and questions using CamScanner
| [1].
|
| Students just download the app, and it fixes the orientation,
| rotation, bad light, contrast, and many other horrible things
| that a JPG may have, especially the orientation and the
| ordering of multiple sheets. Also, Moodle has a little more
| support for PDF than JPG [2].
|
| I don't know how many three-letter agencies are reading the
| stream, but I'm happy that many three-letter agency operatives
| now have a better grounding in algebra and calculus.
|
| [1] https://www.camscanner.com/
|
| [2] It depends on how many optional packages your sysadmin
| installed.
| chrisfinazzo wrote:
| It's old, and sometimes things don't come out right, but this
| is one way out of that hornet's nest.
|
| https://tabula.technology
|
| There's also a CLI if that is more to your liking. If that
| doesn't do it, there's always the brute-force option of
| scripting in your language of choice to pull the data out.
| anigbrowl wrote:
| Because PDF shows you a page on screen that _will_ look the
| same if you print it out, and print layouts have been optimized
| for reading convenience over centuries. And if you give someone
| with no technical expertise a PDF file, it's virtually certain
| that they're going to be able to open it because some kind of
| viewer is built into most operating systems.
|
| You're totally right about PDF being a massive pain in the butt
| for any other purpose, but unless you have an alternative that
| handles the basic use case at least as well and other use cases
| way better, PDF is here to stay.
| snvzz wrote:
| Not providing CSV is at the level of criminal negligence.
| clipper_janosch wrote:
| What an exceptional story. You are a legend.
| throwaway81523 wrote:
| I've done stuff like this semi-manually: use pdftotext to get
| the text tables out of the PDF, eyeball them and massage them
| with Emacs keyboard macros and, in some cases, Python scripts.
| It's not that big a deal, but it is somewhat ad hoc.
|
| I know that OCR software is able to read stuff like magazine
| articles and figure out column layout, embedded charts, etc.
| It's weird if there is nothing that does that with a PDF. Maybe
| I'll look around or see if I can hack something up.
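A minimal sketch of the scripted half of that workflow, assuming the text came from `pdftotext -layout table.pdf table.txt`, which tries to preserve the visual column layout. Splitting each line on runs of two or more spaces is the key assumption; `layout_text_to_rows` and `rows_to_csv` are hypothetical names.

```python
from __future__ import annotations

import csv
import io
import re

# Assumption: cells in `pdftotext -layout` output are separated by runs of
# two or more spaces, while single spaces occur inside a cell.
CELL_SEP = re.compile(r" {2,}")


def layout_text_to_rows(text: str) -> list[list[str]]:
    """Split every non-blank line of layout-preserved text into cells."""
    return [CELL_SEP.split(line.strip()) for line in text.splitlines() if line.strip()]


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialise rows as CSV text (quoting only where needed)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Empty cells collapse into a single gap and shift the later columns left, which is exactly where the eyeballing and keyboard macros come in.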
| infinityio wrote:
| unfortunately in this case the text content was handwritten,
| not computer-generated
| harvey9 wrote:
| This is some compelling writing. I know this has real life
| implications for real people so I hope it's not in poor taste to
| say it would make a good movie.
| cwkoss wrote:
| I agree, but it still needs an ending! Will this be a story of
| triumph or tragedy?
| crazygringo wrote:
| First of all, what a fantastic and inspiring read.
|
| But, I'm left greatly confused -- the article never states
| whether this changed the result.
|
| It says that halfway through counting Obi was in the lead, but
| says nothing about the final count.
|
| And when I look at the spreadsheet, the last row (#3380) appears
| to be the totals, which lists:
|
|     APC     LP     PDP     NNPP
|     149014  85748  329030  8305
|
| Which shows LP (Obi) in third place, just like the official
| results.
|
| So what point is the article trying to make at the end of the
| day? Or have I misunderstood the numbers?
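One quick way to test the guess that row 3380 is a totals row: compare it with the column-wise sums of the rows above it. A sketch assuming the vote columns have already been parsed into integers; `is_totals_row` is a hypothetical helper.

```python
from __future__ import annotations


def is_totals_row(rows: list[list[int]]) -> bool:
    """True if the last row equals the column-wise sum of all rows above it."""
    if len(rows) < 2:
        return False
    *body, last = rows
    column_sums = [sum(column) for column in zip(*body)]
    return column_sums == list(last)
```

If it is a totals row, it has to be dropped before summing across files, or those votes get counted twice.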
| error503 wrote:
| I collected all the _crosschecked CSVs and got:
|
|     LP       PDP      APC      NNPP
|     4731127  4555334  5928825  1019045
|
| Obi seems to make second place here, but far from first.
|
| https://i.imgur.com/UaZbXz6.png
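A sketch of how such a tally might be scripted, assuming each state's CSV has one column per party named `APC`, `LP`, `PDP` and `NNPP`, and that the cross-checked files can be picked out with the glob `*crosschecked*.csv`. Both the column names and the file pattern are guesses, not the actual layout of the shared folder, and `total_votes` is a hypothetical helper.

```python
import csv
from collections import Counter
from pathlib import Path

PARTIES = ("APC", "LP", "PDP", "NNPP")  # assumed column headers


def total_votes(folder: str, pattern: str = "*crosschecked*.csv") -> Counter:
    """Sum each party column over every CSV in `folder` matching `pattern`."""
    totals = Counter()
    for path in sorted(Path(folder).glob(pattern)):  # pattern is an assumption
        with path.open(newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                for party in PARTIES:
                    cell = (row.get(party) or "").replace(",", "").strip()
                    if cell.isdigit():  # skip blank or illegible entries
                        totals[party] += int(cell)
    return totals
```

Blank or non-numeric cells are skipped rather than guessed, and a per-file totals row, if present, would be double counted, so it should be removed first.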
| karagenit wrote:
| I totaled up the results from only the "crosschecked" CSV
| files. Here's what I saw:
|
|     APC:  5928825
|     LP:   4731127
|     PDP:  4555334
|     NNPP: 1019045
|
| I tried to manually verify about a dozen rows myself. Half were
| so blurry/low-res they were illegible, but the ones that were
| legible were all correct.
|
| And for the "unsure" CSVs:
|
|     APC:  1308067
|     LP:   578482
|     PDP:  736183
|     NNPP: 513245
|
| Also checked about a dozen, and all but one of them were wildly
| inaccurate, so I wouldn't trust these much.
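Checking "about a dozen" rows is a reasonable spot check, but it helps to draw them at random and to know how little a clean result proves. The sketch below uses a seeded sample so someone else can re-check the same rows, plus the exact one-sided upper bound on an error rate when n independent checks turn up zero errors, p = 1 - alpha^(1/n), which is roughly 3/n for large n (the "rule of three"). Both helper names are hypothetical.

```python
import random


def spot_check_sample(rows: list, k: int = 12, seed: int = 0) -> list:
    """Pick up to k distinct rows at random for manual comparison."""
    rng = random.Random(seed)  # fixed seed: a second reviewer gets the same rows
    return rng.sample(rows, min(k, len(rows)))


def zero_error_upper_bound(n_checked: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the error rate after n checks with 0 errors.

    Solves (1 - p) ** n == 1 - confidence for p (exact binomial bound).
    """
    if n_checked <= 0:
        raise ValueError("need at least one check")
    return 1 - (1 - confidence) ** (1 / n_checked)
```

For the roughly six legible rows described above, `zero_error_upper_bound(6)` is about 0.39: zero errors in six checks is still consistent with an error rate of up to roughly 39% at 95% confidence.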
| sd9 wrote:
| Those are the results for just one state, Adamawa.
|
| However, like you I don't know what the overall results are; I
| agree that the article could make this clearer.
| crazygringo wrote:
| Oh thanks for clarifying. Turns out the link to the folder
| for _all_ the states is here:
|
| https://drive.google.com/drive/folders/173oHgms6wYy5WKz_i3Lh...
|
| But there doesn't appear to be any file that calculates the
| nationwide totals.
|
| It just seems like such a strange omission but I'm on mobile
| and can't add up the numbers from across a ton of different
| files myself.
| didgetmaster wrote:
| I downloaded all the .CSV files from that site and quickly
| loaded them into a table. It only took a couple of minutes,
| but I didn't stop to verify that there were no duplicate
| rows across the various files.
|
| When I added up the totals, I got:
|
|     APC  - 7,225,399
|     LP   - 5,286,181
|     PDP  - 5,285,900
|     NNPP - 1,529,575
|
| Note: I was using a beta version of a new database tool I
| created to do this.
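Duplicate rows across files are easy to rule out before summing if each row can be keyed on the polling unit it came from. A sketch, assuming hypothetical key columns `state`, `lga`, `ward` and `polling_unit` (use whatever actually identifies a unit in the real files); `load_unique_rows` is likewise a hypothetical helper.

```python
import csv


def load_unique_rows(paths, key_fields=("state", "lga", "ward", "polling_unit")):
    """Read rows from several CSVs, keeping the first row seen for each key.

    Returns (unique_rows, duplicate_count). The key columns are hypothetical.
    """
    seen = {}
    duplicates = 0
    for path in paths:
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                # Normalise case and stray whitespace so near-identical keys match.
                key = tuple((row.get(f) or "").strip().lower() for f in key_fields)
                if key in seen:
                    duplicates += 1
                else:
                    seen[key] = row
    return list(seen.values()), duplicates
```

Keying on the unit rather than on the whole row means the same unit transcribed twice with slightly different numbers is still caught as a duplicate; keeping the first copy is an arbitrary choice worth revisiting.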
| londons_explore wrote:
| The votes surprise me... In many regions one party gets 90+% of
| the vote.
|
| Assuming the numbers are correct, then it suggests that most
| people are easily swayed by their local peers.
|
| Is that common in, say, the USA?
| muyuu wrote:
| It happens in the US too. Tribalism and ideological clustering
| are so similar that the terms are used interchangeably these
| days. But in some traditional countries there are literal clans
| and tribes voting in blocs.
| anigbrowl wrote:
| Yep, bloc voting can be habitual or strategic. There's a town
| in Northern California where the majority of the seats on the
| council are held by people who all happen to attend the same
| megachurch.
| mmmuhd wrote:
| Exactly! And this mostly happened in the regions where the OP's
| preferred candidate won. This is a clear scam.
| crazygringo wrote:
| > _it suggests that most people are easily swayed by their
| local peers._
|
| That feels like a particularly uncharitable interpretation to
| me.
|
| I think it's more that parties and their policies have very
| different impacts on different regions. So it makes sense to
| vote for what is beneficial to your region, and a lot of people
| will agree on that.
|
| So it's not about susceptibility to being "swayed", but genuine
| policy affecting regions differently.
| orf wrote:
| Fantastic story! Did the results get used in a claim?
| seventytwo wrote:
| Wow, this was a fantastic read!
|
| I have no idea what's going on in Nigeria, but I hope the truth
| (whatever it is) will prevail!
| vincheezel wrote:
| I hope for (but do not expect) a positive outcome
| blntechie wrote:
| What were the final result numbers from the transcription?
| toyg wrote:
| They're probably going to be similar to the 14k sample he
| tweeted: a solidified Labour Party getting 50-55% of the votes,
| and the establishment candidates splitting the rest.
| churchill wrote:
| -
| churchill wrote:
| -
| mmmuhd wrote:
| David Hundeyin is a deceitful, lying criminal, so don't bring
| his "Content" as any kind of evidence.
|
| https://www.icirnigeria.org/controversy-as-oxford-terminates...
| [deleted]
| pxc wrote:
| @dang are these '-' comments an attempt to evade showdead?
___________________________________________________________________
(page generated 2023-03-23 23:00 UTC)