[HN Gopher] How to compare two PDF documents
___________________________________________________________________
How to compare two PDF documents
Author : ingve
Score : 26 points
Date : 2021-08-14 11:12 UTC (11 hours ago)
(HTM) web link (eclecticlight.co)
(TXT) w3m dump (eclecticlight.co)
| btown wrote:
| https://draftable.com/compare is by far the best solution I've
| found for this, and it's a shame it's not more widely known
| about. It's not open-source, and their offline app is Windows
| only, but its ability to handle multi-page relayouts is far
| above Acrobat's diff functionality (as the OP laments), and
| there's a free online version that's reasonably secure so long as
| you don't share the secret URL around. I've used it many times to
| obtain readable redlines when only given successive "baked"
| versions of a document, and it's a really useful tool for any B2B
| startup founder.
| mckmk wrote:
| Just to throw in another solution for anyone looking. Abbyy
| FineReader has a comparison module that is excellent.
| https://pdf.abbyy.com/how-to/compare-documents/
| pronoiac wrote:
| I recently read a blog post on putting the Crafting Interpreters
| book together, and there was an interesting tidbit about visual
| PDF diffs (under _Proofreading the proof_):
| https://journal.stuffwithstuff.com/2021/07/29/640-pages-in-1...
| redman25 wrote:
| For visually comparing PDFs instead of textually comparing them,
| I use https://parepdf.com. I work in publishing, so comparing
| printer proof PDFs is something we do regularly.
| untoxicness wrote:
| #!/bin/bash
| pdf_one="$1"
| pdf_two="$2"
| text_one=$(mktemp)
| text_two=$(mktemp)
| pdftotext "$pdf_one" "$text_one"
| pdftotext "$pdf_two" "$text_two"
| diff "$text_one" "$text_two"
| seoaeu wrote:
| Won't that fail terribly if the text has been re-flowed?
| halostatue wrote:
| It is entirely possible for that to present unexpected
| differences because of the way that the PDF format works. One
| can have two different PDFs that encode the same content in two
| different ways and unless `pdftotext` does virtual layout and
| then OCR-like extraction, you might end up with jumbled text or
| text in different orders.
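The re-flow concern raised above can be partly worked around by diffing words instead of lines. The following is only a sketch under stated assumptions: `words_of` is a hypothetical helper name, `pdftotext` (from poppler-utils or xpdf) is assumed to be installed, and the approach does nothing about the second concern, where the extracted text comes out in a different order.

```shell
#!/bin/bash
# Sketch: compare two PDFs word by word so that re-flowed lines do not
# show up as differences. words_of is a hypothetical helper name.

# Read text on stdin and print one word per line, dropping blank lines.
words_of() {
  tr -s '[:space:]' '\n' | sed '/^$/d'
}

# Only run the comparison when two PDF paths are given.
if [ "$#" -eq 2 ]; then
  words_one=$(mktemp)
  words_two=$(mktemp)
  # Remove the temporary files when the script exits.
  trap 'rm -f "$words_one" "$words_two"' EXIT
  # "-" tells pdftotext to write the extracted text to stdout.
  pdftotext "$1" - | words_of > "$words_one"
  pdftotext "$2" - | words_of > "$words_two"
  diff "$words_one" "$words_two"
fi
```

Because every word sits on its own line, a paragraph that merely wraps at a different column produces no diff output. Words hyphenated across a line break in one version but not the other would still be reported.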
___________________________________________________________________
(page generated 2021-08-14 23:02 UTC)