[HN Gopher] Converting untrusted PDFs into trusted ones: The Qub...
___________________________________________________________________
Converting untrusted PDFs into trusted ones: The Qubes Way (2013)
Author : transpute
Score : 57 points
Date : 2024-12-12 18:33 UTC (4 hours ago)
(HTM) web link (blog.invisiblethings.org)
(TXT) w3m dump (blog.invisiblethings.org)
| aspenmayer wrote:
| Related:
|
| https://dangerzone.rocks/
|
| https://github.com/freedomofpress/dangerzone
|
| > Take potentially dangerous PDFs, office documents, or images
| and convert them to safe PDFs.
|
| From the About page:
|
| > Dangerzone was inspired by TrustedPDF but it works in non-Qubes
| operating systems, which is important, because most of the
| journalists I know use Macs and probably won't be jumping to
| Qubes for some time.
|
| > It uses gVisor sandboxes running in Linux containers to open
| dangerous documents, instead of virtual machines. And it also
| adds some features that TrustedPDF doesn't have: it works with
| any office documents, not just PDFs; it uses optical character
| recognition (OCR) to make the safe PDF have a searchable text
| layer; and it compresses the final safe PDF.
|
| Previously (announcement and details of gVisor sandboxing etc):
|
| _Safe Ride into the Dangerzone: Reducing Attack Surface with
| GVisor_
|
| https://news.ycombinator.com/item?id=41630076
| prophesi wrote:
| I appreciate this! It would add another attack vector, but I
| could see the utility in SaaS'ifying this for servers to
| convert user-uploaded content on-the-fly.
| mjevans wrote:
| This looks better than q(ubes)-pdf, but still not ideal.
|
|   Seems that PyMuPDF is used with a fixed (single) pathname,
|   "/tmp/input_file"?
| https://github.com/freedomofpress/dangerzone/blob/main/dange...
|
| Everything else is tossed through LibreOffice.
|
|   Meanwhile, what I'd prefer for PDFs is an allow-listed set of
|   'safe' PDF operations (layout image, layout text) applied to
|   sanitized inputs (no underflows, overflows, corruption, etc.),
|   with any code evaluated once at snapshot time and its result
|   flattened out to a safe element. Image OCR could be run atop
|   that.
|
|   Similarly, it'd be nice if a filter like that existed for the
|   other document types, but as an individual contributor I don't
|   have the manpower to keep up with that goal, and would take the
|   same low-hanging-fruit route: worse but secure output.
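
The allow-list idea above can be sketched as a filter over already-parsed content-stream operations. This is a minimal illustration, not how Dangerzone or the Qubes converter works: it assumes some upstream parser has already produced `(operands, operator)` pairs, and the names `ALLOWED_OPERATORS`, `operand_is_sane`, and `filter_operations`, as well as the numeric limits, are hypothetical. The operator names themselves (`q`, `Q`, `cm`, `BT`, `ET`, `Tf`, `Td`, `Tj`, `Do`) are standard PDF content-stream operators.

```python
from typing import Iterable, List, Tuple

# One parsed content-stream operation: (operands, operator).
Operation = Tuple[Tuple[object, ...], str]

# Hypothetical allow-list: graphics-state save/restore, transform,
# text objects, text showing, and XObject (image) painting.
ALLOWED_OPERATORS = frozenset(
    {"q", "Q", "cm", "BT", "ET", "Tf", "Td", "Tj", "Do"}
)

MAX_OPERANDS = 6        # "cm" takes six numbers; nothing allowed takes more
MAX_ABS_NUMBER = 1e6    # assumed bound on coordinates and sizes
MAX_STRING_LENGTH = 4096


def operand_is_sane(value: object) -> bool:
    """Accept only bounded numbers and bounded strings."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        # value == value is False for NaN; the bound rejects infinities.
        return value == value and abs(value) <= MAX_ABS_NUMBER
    if isinstance(value, str):
        return len(value) <= MAX_STRING_LENGTH
    return False


def filter_operations(ops: Iterable[Operation]) -> List[Operation]:
    """Keep only allow-listed operations whose operands pass the checks."""
    kept: List[Operation] = []
    for operands, operator in ops:
        if operator not in ALLOWED_OPERATORS:
            continue
        if len(operands) > MAX_OPERANDS:
            continue
        if not all(operand_is_sane(v) for v in operands):
            continue
        kept.append((tuple(operands), operator))
    return kept


if __name__ == "__main__":
    page = [
        ((), "BT"),
        (("F1", 12), "Tf"),
        (("Hello",), "Tj"),
        ((), "ET"),
        (("Sh0",), "sh"),  # shading operator, not on the allow-list
    ]
    for operands, operator in filter_operations(page):
        print(operator, operands)
```

Silently dropping operations can leave `q`/`Q` or `BT`/`ET` pairs unbalanced, so a stricter variant would reject the whole page on the first violation. As the reply below points out, this also does nothing for the image decoders that `Do` ends up invoking.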
| kccqzy wrote:
| > allow-listed set of 'safe' PDF operations (layout image
|
|     That's the problem right there. PDF supports many image
|     formats, including ones that are useful but that you may
|     never have heard of, like JBIG2 for scanned documents. And
|     the parsers for those image formats need to be secure as
|     well. One very famous exploit targets JBIG2 (among other
|     things):
|     https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...
| lima wrote:
| In terms of threat model, what is the problem with MuPDF in
| gVisor (a very tight sandbox)? Obviously, a memory-safe
| language would be ideal, but there's nothing fundamentally
| wrong with the approach.
| mjevans wrote:
| It looks like the qpdf-converter source, along with everything
| else, is now on GitHub according to the Developer / Source Code
| links on the site.
|
| https://github.com/QubesOS/qubes-app-linux-pdf-converter
|
| Their source code seems to take the most obvious path... flatten
| it to an image printout then possibly do more?
| https://github.com/QubesOS/qubes-app-linux-pdf-converter/blo...
| https://github.com/QubesOS/qubes-app-linux-pdf-converter/blo...
|
| Though at a quick skim I can't see any OCR steps.
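
The "flatten it to an image" path only helps if the trusted side accepts nothing but a trivially checkable pixel format. Here is a minimal sketch of that receiving-side check, assuming a hypothetical wire format of one `WIDTH HEIGHT` header line followed by raw 8-bit RGB bytes. The function name `parse_rgb_page` and the size limits are illustrative assumptions and are not taken from the Qubes converter source.

```python
from typing import Tuple

MAX_WIDTH = 10_000   # assumed upper bounds; a real tool would tune these
MAX_HEIGHT = 10_000
BYTES_PER_PIXEL = 3  # raw RGB, 8 bits per channel


def parse_rgb_page(header: bytes, pixels: bytes) -> Tuple[int, int, bytes]:
    """Validate a 'WIDTH HEIGHT' header and raw RGB payload from the
    untrusted side. Raises ValueError on anything unexpected."""
    parts = header.strip().split(b" ")
    # Exactly two short, all-digit fields; no signs, no other characters.
    if len(parts) != 2 or not all(p.isdigit() and len(p) <= 5 for p in parts):
        raise ValueError("malformed header")
    width, height = int(parts[0]), int(parts[1])
    if not (0 < width <= MAX_WIDTH and 0 < height <= MAX_HEIGHT):
        raise ValueError("dimensions out of range")
    # The payload must be exactly as long as the header promises.
    if len(pixels) != width * height * BYTES_PER_PIXEL:
        raise ValueError("pixel data length does not match dimensions")
    return width, height, pixels


if __name__ == "__main__":
    w, h, data = parse_rgb_page(b"2 1\n", bytes(6))
    print(w, h, len(data))
```

Because the only thing parsed on the trusted side is two integers and a length, there is very little room for a malicious document to influence it; the cost is that text, links, and search are lost unless OCR is run afterwards.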
| mannykannot wrote:
| I was wondering that myself, but one of the downsides mentioned
| is that you lose text search, which seems to suggest that OCR
| is not being used.
| nickpsecurity wrote:
| This is a good approach. It's an old design pattern in
| high-assurance systems, where a gateway converts things into a
| usable, safer form. Another concept, often called LANGSEC, is
| generating parsers from simple grammars that are hopefully
| bulletproof. These ideas can be combined.
|
| Two more things can happen.
|
| The increasing volume of memory-safe utilities means they can be
| used on one or both sides of this. That might prevent the exploit
| entirely. Even on a memory-safe CPU, isolation can still help in
| case of hardware failures (esp. bitflips).
|
| It can also be used to boost performance in non-Qubes systems
| where a secure (or OSS) processor is in use. They're often slower
| than commodity CPUs. So, one can use disposable VMs on commodity
| CPUs to filter data (block most attacks), transform it, and send
| it over a simple wire protocol. Commodity VMs might also present
| it back to the user in dressed-up form.
|
| Outside of security, a long time ago, they were doing similar
| things to decrease latency and boost bandwidth on Beowulf
| clusters. A team made Fast (or Active?) Messages to eliminate
| TCP/IP as a bottleneck. So, sometimes a security technique can
| also be a performance booster.
| dang wrote:
| Related:
|
| _Converting untrusted PDFs into trusted ones: The Qubes Way
| (2013)_ - https://news.ycombinator.com/item?id=10538888 - Nov
| 2015 (5 comments)
| lysace wrote:
| I wonder how Google handles this. Thousands of their software
| people will need to read PDFs from all over the web from work
| machines.
| bangaladore wrote:
|   Are PDF parsers really so bad nowadays (this article is over 10
|   years old) that opening a PDF opens you up to vulnerabilities?
|
|   The author made this seem like such a fundamental issue. Is
|   that because PDFs natively support, say, executing code (I
|   doubt it) or accessing the filesystem (I doubt it), etc.?
| machinestops wrote:
| PDFs support JavaScript. Here's Adobe's guide on how to add
| JS to your PDFs:
|     https://helpx.adobe.com/uk/acrobat/using/applying-actions-sc...
___________________________________________________________________
(page generated 2024-12-12 23:00 UTC)