[HN Gopher] How to compare two PDF documents
       ___________________________________________________________________
        
       How to compare two PDF documents
        
       Author : ingve
       Score  : 26 points
       Date   : 2021-08-14 11:12 UTC (11 hours ago)
        
 (HTM) web link (eclecticlight.co)
 (TXT) w3m dump (eclecticlight.co)
        
       | btown wrote:
       | https://draftable.com/compare is by far the best solution I've
       | found for this, and it's a shame it isn't more widely known.
       | It's not open-source, and their offline app is Windows-only,
       | but its handling of multi-page relayouts is far better than
       | Acrobat's diff functionality (as the OP laments), and
       | there's a free online version that's reasonably secure so long as
       | you don't share the secret URL around. I've used it many times to
       | obtain readable redlines when only given successive "baked"
       | versions of a document, and it's a really useful tool for any B2B
       | startup founder.
        
       | mckmk wrote:
       | Just to throw in another solution for anyone looking. Abbyy
       | FineReader has a comparison module that is excellent.
       | https://pdf.abbyy.com/how-to/compare-documents/
        
       | pronoiac wrote:
       | I recently read a blog post on putting the Crafting Interpreters
       | book together, and there was an interesting tidbit about visual
       | pdf diffs (under _Proofreading the proof_):
       | https://journal.stuffwithstuff.com/2021/07/29/640-pages-in-1...
        
       | redman25 wrote:
       | For visually comparing PDFs instead of textually comparing them,
       | I use https://parepdf.com. I work in publishing, so comparing
       | printer proof PDFs is something we do regularly.
        
       | untoxicness wrote:
       | #!/bin/bash
       | pdf_one="$1"
       | pdf_two="$2"
       | text_one=$(mktemp)
       | text_two=$(mktemp)
       | pdftotext "$pdf_one" "$text_one"
       | pdftotext "$pdf_two" "$text_two"
       | diff "$text_one" "$text_two"
        
         | seoaeu wrote:
         | Won't that fail terribly if the text has been re-flowed?
        
         | halostatue wrote:
         | It is entirely possible for that to present unexpected
         | differences because of the way that the PDF format works. One
         | can have two different PDFs that encode the same content in two
         | different ways and unless `pdftotext` does virtual layout and
         | then OCR-like extraction, you might end up with jumbled text or
         | text in different orders.
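
The re-flow concern raised in the replies can be partly addressed by comparing word by word instead of line by line. What follows is a minimal sketch, not something posted in the thread: `words_of` and `reflow_diff` are hypothetical helper names, and the sketch assumes the text has already been extracted to plain-text files (for example with `pdftotext`, as in the script above). It does nothing for the second problem mentioned, where the extracted text comes out in a different order.

```shell
#!/bin/sh
# Sketch only: a reflow-tolerant diff of two extracted text files.
# words_of and reflow_diff are hypothetical names, not from the thread.

# Print the text of file $1 as one word per line.
# tr turns every run of whitespace (spaces, tabs, newlines, form feeds)
# into a single newline; awk 'NF' then drops any empty line.
words_of() {
    tr -s '[:space:]' '\n' < "$1" | awk 'NF'
}

# Diff two text files word by word, so that changed line wrapping
# alone produces no output. Returns diff's exit status:
# 0 = same words in the same order, 1 = differences found.
reflow_diff() {
    rd_a=$(mktemp)
    rd_b=$(mktemp)
    words_of "$1" > "$rd_a"
    words_of "$2" > "$rd_b"
    rd_status=0
    diff "$rd_a" "$rd_b" || rd_status=$?
    rm -f "$rd_a" "$rd_b"
    return "$rd_status"
}

# Possible use (assumes pdftotext is installed):
#   pdftotext old.pdf old.txt && pdftotext new.pdf new.txt
#   reflow_diff old.txt new.txt
```

The trade-off is that the output is a list of changed words without their surrounding lines, so it is better suited to checking whether anything changed than to producing a readable redline.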
        
       ___________________________________________________________________
       (page generated 2021-08-14 23:02 UTC)