[HN Gopher] Hacking with PDF (2022)
       ___________________________________________________________________
        
       Hacking with PDF (2022)
        
       Author : lnyan
       Score  : 81 points
       Date   : 2024-08-17 13:03 UTC (9 hours ago)
        
 (HTM) web link (0xcybery.github.io)
 (TXT) w3m dump (0xcybery.github.io)
        
       | banku_brougham wrote:
       | This is a great demo. I've been concerned about all these PDFs I
       | like to read; this gives me a little more confidence about tools
       | to scan PDFs for attacks.
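
A rough sketch of the kind of scan mentioned above, assuming the simplest
possible approach: count PDF name tokens that are commonly tied to active
content (/JS, /JavaScript, /URI, /OpenAction and so on) in the raw bytes.
The list of names and the function name are illustrative choices, not
taken from the linked article. This only sees uncompressed objects; names
inside compressed object streams, or names written with #xx hex escapes,
would be missed.

```python
import re

# Name tokens often associated with scripts, links, or auto-run actions.
# This list is an illustrative assumption, not an exhaustive rule set.
RISKY_NAMES = (
    b"/JavaScript", b"/JS", b"/URI", b"/OpenAction",
    b"/AA", b"/Launch", b"/SubmitForm",
)


def count_risky_names(data: bytes) -> dict:
    """Count occurrences of each risky name token in raw PDF bytes."""
    counts = {}
    for name in RISKY_NAMES:
        # Reject matches that continue as a longer name, so that "/AA"
        # does not also match a hypothetical "/AAExtra".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9_.\-])"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        print(count_risky_names(fh.read()))
```

A hit is only a reason to look closer, not evidence of an attack: plenty
of ordinary PDFs contain /URI link annotations.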
        
       | JKCalhoun wrote:
       | FWIW, ages ago I wrote the PDFKit framework for the Mac (used by
       | Preview and the built-in PDF viewer in Safari).
       | 
       | The only exploit listed here that has a chance of working with
       | Preview/Safari (PDFKit) is the URI one -- none of the Javascript
       | exploits will work.
       | 
       | Why? I never implemented Javascript support [1].
       | 
       | Security was extremely important at Apple (there's a whole
       | security team that frequently interacts with the various project
       | owners around the company, writes and deploys file fuzzers,
       | creates must-fix Radars around exploits found in the wild, etc.).
       | 
       | In fact, though, I had no idea how I would host a Javascript
       | runtime, and I didn't really have the cycles to implement it even
       | if I had known how. Anyway, we were content to support the 99% of
       | PDFs out there.
       | 
       | [1] In fact there were a few US tax documents that used very
       | simple Javascript snippets to take the values from two fields,
       | add them, and put the result in a third. Some code in PDFKit I
       | added would identify these few very simple patterns and implement
       | them sans JS runtime.
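
The footnote's idea, recognizing a handful of trivial scripts and
evaluating them without any JS engine, can be sketched roughly as below.
This is not PDFKit's code. The exact script shape matched here
(event.value assigned a sum of this.getField("...").value terms) and the
function names are assumptions made for illustration only.

```python
import re

# One operand of the assumed pattern: this.getField("name").value
_FIELD = r"""this\.getField\(\s*["']([^"']+)["']\s*\)\.value"""

# Whole-script shape: event.value = <field> + <field> [+ <field> ...];
_SUM_SHAPE = re.compile(
    rf"\s*event\.value\s*=\s*{_FIELD}(?:\s*\+\s*{_FIELD})+\s*;?\s*"
)


def sum_operands(script: str):
    """Return the field names being added, or None if the script is
    anything other than the simple sum pattern."""
    if _SUM_SHAPE.fullmatch(script) is None:
        return None
    return re.findall(_FIELD, script)


def evaluate(script: str, fields: dict):
    """Compute the sum natively; unrecognized scripts are ignored."""
    names = sum_operands(script)
    if names is None:
        return None  # not one of the known patterns: do nothing
    return sum(float(fields[name]) for name in names)
```

Anything that does not match the whitelist exactly is never executed,
which is the point: there is no interpreter for an attacker to reach.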
        
         | felipefar wrote:
         | Nice job! I've been wanting to write a PDF parser for learning
         | purposes, but have been put off by the quantity of files that
         | open source PDF parsers have on their repos and the different
         | tech that they need (image formats, compression formats, etc.).
         | I'll probably settle for a reasonable trade-off between the
         | share of PDFs supported and what I learn from the project, so
         | it's useful knowing that PDFs with JS are not very widely used.
         | 
         | Also, I'm the developer of a reference management software, and
         | have naturally been thinking about what it'd take to save in
         | the PDF file metadata fields that are generally useful for
         | advanced readers and academics: original publication dates,
         | ISBNs, DOIs, edition, publisher, etc., instead of just author
         | and title.
        
           | gettalong wrote:
           | You can get a long way with only implementing the most basic
           | things of the PDF specification, like section 7. And even
           | there you don't need everything. For example, there is no
           | need to implement the CCITTFaxDecode, JBIG2Decode, DCTDecode
           | or JPXDecode filters if you don't want to get at the raw
           | pixels of the images.
           | 
           | Once you have parsing and writing of a simple PDF file going
           | (sections 7.2, 7.3, 7.4, 7.5, 7.7), add in support for
           | encryption (section 7.6). Now you are able to at least parse
           | and write nearly all PDF files.
           | 
           | Then implement all the things you need gradually. For example:
           | 
           | * Need support for parsing or creating the contents of a
           | page? -> sections 7.8, 8, and 9. Mind you, start out with
           | only supporting the built-in PDF fonts for creating text and
           | later add support for TrueType (easier) and OpenType (harder
           | if you need to implement the font parser yourself).
           | 
           | * Need support for annotations? -> section 12.5
           | 
           | And so on.
           | 
           | If you just need to store the metadata in the PDF, you only
           | need support for parsing and writing a PDF because this
           | usually also entails that you can modify the PDF object tree
           | which is needed for storing the metadata. However, if you
           | need to store that metadata in a way that is usable for other
           | PDF processors, you would need to store it as an XMP file and
           | creating that is yet another deep dive if you don't have an
           | XMP library available. See section 14.3.2 in the PDF spec for
           | this (btw. the latest PDF spec is available at no cost at
           | https://pdfa.org/resource/iso-32000-2/).
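
For the XMP route described above, a minimal sketch of building a packet
with only the Python standard library might look like this. It assumes
Dublin Core properties are enough and that a DOI or ISBN is stored as
plain text in dc:identifier; that mapping is a choice made for this
example, not something the thread or the PDF spec prescribes. The
function names are hypothetical. Writing the packet into the PDF as a
metadata stream referenced from the document catalog is not shown.

```python
import xml.etree.ElementTree as ET

X = "adobe:ns:meta/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Use the conventional prefixes instead of ns0, ns1, ... when serializing.
for prefix, uri in (("x", X), ("rdf", RDF), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def _container(parent, prop, kind, values, lang=None):
    """Add dc:<prop> holding an rdf:<kind> (Alt, Seq or Bag) of rdf:li."""
    holder = ET.SubElement(parent, f"{{{DC}}}{prop}")
    box = ET.SubElement(holder, f"{{{RDF}}}{kind}")
    for value in values:
        li = ET.SubElement(box, f"{{{RDF}}}li")
        li.text = value
        if lang:
            li.set(XML_LANG, lang)


def build_xmp(title, authors, publisher, date, identifier):
    """Return an XMP packet (as str) with a few Dublin Core properties."""
    root = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": ""})
    _container(desc, "title", "Alt", [title], lang="x-default")
    _container(desc, "creator", "Seq", authors)      # ordered author list
    _container(desc, "publisher", "Bag", [publisher])
    _container(desc, "date", "Seq", [date])
    ET.SubElement(desc, f"{{{DC}}}identifier").text = identifier
    body = ET.tostring(root, encoding="unicode")
    # Standard packet wrapper; end="w" marks the packet as writable.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')
```

Fields such as edition have no obvious Dublin Core slot, so a real tool
would need another schema or a custom namespace for those.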
        
         | lysace wrote:
         | PDFKit is awesome to use. Thanks!
        
         | jahewson wrote:
         | Nice! There was an exploit in iOS Messages found last year due
         | to code that Apple had under license from Xpdf. I've wondered
         | why Apple needed that when they already had PDFKit?
        
         | bla3 wrote:
         | Do you know if it's still maintained? I have a bunch of PDFs
         | where images don't show up in Preview. I filed bugs for them,
         | but they're being ignored.
        
       | jjbinx007 wrote:
       | I've always held the opinion that viewing PDFs in something other
       | than Adobe Acrobat gives the user more of a chance of avoiding
       | such attacks... is there any credence to this or is it just
       | wishful thinking?
        
         | unanimous wrote:
         | I've tried creating a PDF Canarytoken [0] and opening it in a
         | few applications not including Adobe Acrobat. None of them
         | triggered the canary.
         | 
         | [0]: https://canarytokens.org/nest/
        
         | agumonkey wrote:
         | Acrobat implements more features than, say, MuPDF, I assume. So
         | in terms of attack surface it would be riskier. But maybe they
         | have more security analysts too.
        
       ___________________________________________________________________
       (page generated 2024-08-17 23:00 UTC)