[HN Gopher] Converting untrusted PDFs into trusted ones: The Qub...
       ___________________________________________________________________
        
       Converting untrusted PDFs into trusted ones: The Qubes Way (2013)
        
       Author : transpute
       Score  : 57 points
       Date   : 2024-12-12 18:33 UTC (4 hours ago)
        
 (HTM) web link (blog.invisiblethings.org)
 (TXT) w3m dump (blog.invisiblethings.org)
        
       | aspenmayer wrote:
       | Related:
       | 
       | https://dangerzone.rocks/
       | 
       | https://github.com/freedomofpress/dangerzone
       | 
       | > Take potentially dangerous PDFs, office documents, or images
       | and convert them to safe PDFs.
       | 
       | From the "learn more" (About) page:
       | 
       | > Dangerzone was inspired by TrustedPDF but it works in non-Qubes
       | operating systems, which is important, because most of the
       | journalists I know use Macs and probably won't be jumping to
       | Qubes for some time.
       | 
       | > It uses gVisor sandboxes running in Linux containers to open
       | dangerous documents, instead of virtual machines. And it also
       | adds some features that TrustedPDF doesn't have: it works with
       | any office documents, not just PDFs; it uses optical character
       | recognition (OCR) to make the safe PDF have a searchable text
       | layer; and it compresses the final safe PDF.
       | 
       | Previously (announcement and details of gVisor sandboxing etc):
       | 
       |  _Safe Ride into the Dangerzone: Reducing Attack Surface with
       | GVisor_
       | 
       | https://news.ycombinator.com/item?id=41630076
        
         | prophesi wrote:
         | I appreciate this! It would add another attack vector, but I
         | could see the utility in SaaS'ifying this for servers to
         | convert user-uploaded content on-the-fly.
        
         | mjevans wrote:
         | This looks better than q(ubes)-pdf, but still not ideal.
         | 
         | Seems that PyMuPDF is used with a fixed (single pathname)
         | "/tmp/input_file" ?
         | https://github.com/freedomofpress/dangerzone/blob/main/dange...
         | 
         | Everything else is tossed through LibreOffice.
         | 
         | Meanwhile, what I'd prefer for PDFs is an allow-listed set of
         | 'safe' PDF operations (lay out image, lay out text) applied
         | to sanitized inputs (no underflows, overflows, corruption,
         | etc.), with any code evaluated once at snapshot time and its
         | results flattened out to a safe element. Image OCR could be
         | run atop that.
         | 
         | Similarly, it'd be nice if a filter like that existed for the
         | other document types, but as an individual contributor I don't
         | have the human power to keep up with that goal and would take
         | the same low-hanging-fruit route: worse but secure output.
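
A minimal sketch of the allow-list idea described above, in Python. It
assumes a hypothetical upstream tokenizer has already turned a page's
content stream into (operator, operands) pairs. The operator names are
real PDF content-stream operators, but the choice of which ones count as
"safe", the numeric bound, and every function name here are assumptions
made for illustration, not anything Dangerzone or Qubes is known to do.

```python
from typing import List, Sequence, Tuple, Union

Operand = Union[int, float, str]
Operation = Tuple[str, Sequence[Operand]]

# Hypothetical allow-list: a few PDF content-stream operators that only
# lay out text and images. Anything not listed is rejected outright.
ALLOWED_OPERATORS = {
    "BT", "ET",        # begin / end a text object
    "Tf", "Td", "Tj",  # select font, move text position, show text
    "q", "Q", "cm",    # save / restore graphics state, set transform
    "Do",              # paint an external object such as an image
}

# Arbitrary bound chosen for this sketch to reject absurd coordinates.
MAX_ABS_NUMBER = 1_000_000


class RejectedOperation(ValueError):
    """Raised when an operation falls outside the allow-list."""


def check_operation(operator: str, operands: Sequence[Operand]) -> None:
    if operator not in ALLOWED_OPERATORS:
        raise RejectedOperation(f"operator not allow-listed: {operator!r}")
    for value in operands:
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise RejectedOperation(f"unexpected operand type: {value!r}")
        if isinstance(value, (int, float)):
            # value != value is true only for NaN; abs() catches infinities.
            if value != value or abs(value) > MAX_ABS_NUMBER:
                raise RejectedOperation(f"number out of range: {value!r}")


def filter_operations(operations: Sequence[Operation]) -> List[Operation]:
    """Return the operations unchanged if all pass, else raise."""
    for operator, operands in operations:
        check_operation(operator, operands)
    return list(operations)
```

Failing closed on the first unknown operator is a deliberate choice in
this sketch: the whole page is refused rather than partially rendered.
As the replies note, this does not address the image decoders behind an
operator like `Do`, which still need to be safe or sandboxed.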
        
           | kccqzy wrote:
           | > allow-listed set of 'safe' PDF operations (layout image
           | 
           | That's the problem right there. PDF supports many image
           | formats, including ones that are useful but that you may
           | never have heard of, like JBIG2 for scanned documents. And the
           | parsers for those image formats need to be secure as well.
           | One very famous exploit relies on exploiting JBIG2 (among
           | other things):
           | https://googleprojectzero.blogspot.com/2021/12/a-deep-
           | dive-i...
        
           | lima wrote:
           | In terms of threat model, what is the problem with MuPDF in
           | gVisor (a very tight sandbox)? Obviously, a memory-safe
           | language would be ideal, but there's nothing fundamentally
           | wrong with the approach.
        
       | mjevans wrote:
       | It looks like the qpdf-converter source, along with everything
       | else, is now on GitHub according to the Developer / Source Code
       | links on the site.
       | 
       | https://github.com/QubesOS/qubes-app-linux-pdf-converter
       | 
       | Their source code seems to take the most obvious path... flatten
       | it to an image printout then possibly do more?
       | https://github.com/QubesOS/qubes-app-linux-pdf-converter/blo...
       | https://github.com/QubesOS/qubes-app-linux-pdf-converter/blo...
       | 
       | Though at a quick skim I can't see any OCR steps.
        
         | mannykannot wrote:
         | I was wondering that myself, but one of the downsides mentioned
         | is that you lose text search, which seems to suggest that OCR
         | is not being used.
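
The "flatten it to an image" route discussed above only helps if the
trusted side accepts nothing more complex than raw pixels. Below is a
minimal Python sketch of that receiving-side check. The wire format (an
8-byte big-endian width/height header followed by raw RGB bytes), the
size limit, and the function name are all assumptions made for
illustration; this is not the actual format used by the Qubes converter.

```python
import struct
from typing import Tuple

# Hypothetical header: two unsigned 32-bit big-endian integers.
HEADER = struct.Struct(">II")

# Arbitrary limit for this sketch, to bound memory use per page.
MAX_DIMENSION = 10_000

BYTES_PER_PIXEL = 3  # raw RGB, no alpha, no compression


def parse_flattened_page(blob: bytes) -> Tuple[int, int, bytes]:
    """Validate one page received from the untrusted side.

    Returns (width, height, pixels) or raises ValueError. Nothing in the
    payload is interpreted beyond the two integers in the header.
    """
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    width, height = HEADER.unpack_from(blob)
    if not (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION):
        raise ValueError("implausible page dimensions")
    pixels = blob[HEADER.size:]
    if len(pixels) != width * height * BYTES_PER_PIXEL:
        raise ValueError("pixel data does not match declared dimensions")
    return width, height, pixels
```

Because the trusted side only ever sees a pixel grid, any text layer is
gone, which is consistent with the lost text search mentioned above
unless OCR is run afterwards.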
        
       | nickpsecurity wrote:
       | This is a good approach. It's an old design pattern in high-
       | assurance systems, where a gateway converts things into a usable,
       | safer form. Another concept, often called LANGSEC, is generating
       | parsers from simple grammars that are hopefully bulletproof.
       | These ideas can be combined.
       | 
       | Two more things can happen.
       | 
       | The increasing volume of memory-safe utilities means they can be
       | used on one or both sides of this. That might prevent the exploit
       | entirely. Even on a memory-safe CPU, isolation can still help in
       | case of hardware failures (esp. bitflips).
       | 
       | It can also be used to boost performance in non-Qubes systems
       | where a secure (or OSS) processor is in use. They're often slower
       | than commodity CPUs. So, one can use the disposable VMs on
       | commodity CPUs to filter data (block most attacks), transform
       | it, and send it over a simple wire protocol. Commodity VMs might
       | also present it back to the user in dressed-up form.
       | 
       | Outside of security, a long time ago, they were doing similar
       | things to decrease latency and boost bandwidth on Beowulf
       | clusters. A team made Fast (or Active?) Messages to eliminate
       | TCP/IP as a bottleneck. So, sometimes a security technique can
       | also be a performance booster.
        
       | dang wrote:
       | Related:
       | 
       |  _Converting untrusted PDFs into trusted ones: The Qubes Way
       | (2013)_ - https://news.ycombinator.com/item?id=10538888 - Nov
       | 2015 (5 comments)
        
       | lysace wrote:
       | I wonder how Google handles this. Thousands of their software
       | people will need to read PDFs from all over the web from work
       | machines.
        
         | bangaladore wrote:
         | Are PDF parsers really so bad nowadays (this article is over 10
         | years old) that opening a PDF opens you up to vulnerabilities?
         | 
         | The author made this seem like such a fundamental issue. Is
         | that because PDFs natively have support for, say, executing code
         | (I doubt it) or accessing the filesystem (I doubt it), etc.?
        
           | machinestops wrote:
           | PDFs support JavaScript. Here's Adobe's guide on how to add
           | JS to your PDFs:
           | https://helpx.adobe.com/uk/acrobat/using/applying-actions-
           | sc...
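
As a rough illustration of the point above, here is a Python sketch that
checks whether a PDF's raw bytes mention the `/JavaScript` action type
or the `/JS` key. It is only a heuristic, and the function name is made
up for this example: it undoes `#XX` hex escapes in names, but it will
miss scripts stored inside compressed object streams and it says
nothing about parser bugs, so a negative result does not mean the file
is safe.

```python
import re

# PDF names may hex-escape any character as #XX (e.g. /J#61vaScript),
# so such escapes are decoded before matching.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# /JavaScript is the action type; /JS is the key that holds the script.
# The lookahead avoids matching longer names such as /JSfoo.
_SCRIPT_NAMES = re.compile(rb"/(JavaScript|JS)(?![A-Za-z0-9])")


def mentions_javascript(pdf_bytes: bytes) -> bool:
    """Heuristically report whether the bytes reference PDF JavaScript."""
    decoded = _HEX_ESCAPE.sub(
        lambda match: bytes([int(match.group(1), 16)]), pdf_bytes
    )
    return _SCRIPT_NAMES.search(decoded) is not None
```

A check like this is better suited to triage than to defence, which is
part of why the thread's converters re-render the document instead of
trying to detect bad content.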
        
       ___________________________________________________________________
       (page generated 2024-12-12 23:00 UTC)