[HN Gopher] PDF/A-3, PDF for Long-Term Preservation, Use of ISO ...
       ___________________________________________________________________
        
       PDF/A-3, PDF for Long-Term Preservation, Use of ISO 32000-1...
       (2020)
        
       Author : gabrielsroka
       Score  : 40 points
       Date   : 2023-03-21 19:00 UTC (4 hours ago)
        
 (HTM) web link (www.loc.gov)
 (TXT) w3m dump (www.loc.gov)
        
       | jimjimjim wrote:
       | Background info, if useful:
       | 
       | PDF/A is a specification that limits what features of PDF are
       | allowed. The purpose is to not allow features that may be
       | problematic for archiving.
       | 
       | Initially PDF/A was really strict and prohibited things like
       | transparency (since it affected reproducibility when printing),
       | embedded files, etc.
       | 
       | Then people requested less restricted versions to allow more
       | archiving use cases.
       | 
       | But even the newer, less restrictive versions are better defined
       | and more verifiable than the main PDF specification.
        
         | chasil wrote:
         | The big restriction is that the classic PostScript typefaces
         | are not available (no Times, Helvetica, or Zapf Dingbats), and
         | the PDF file must bundle any fonts it uses.
         | 
         | The pdfsizeopt package will make any PDF smaller, and I think
         | it deletes letters/characters from the included font that are
         | not used.
         | 
         | https://github.com/pts/pdfsizeopt
        
       | brookst wrote:
       | Preserving PDFs for future generations is like preserving
       | radioactive waste for them. It's inevitable they'll end up with
       | lots, and they won't thank us for it, but we should at least try
       | to contain the mess.
        
         | cm2187 wrote:
         | I love the idea of the martians having invaded earth, wiped out
         | humanity, but when they opened their first pdf file, got all
         | their files crypto-locked. A modern version of War of the
         | Worlds.
        
       | tannhaeuser wrote:
       | I'm not really getting it. Aren't RFCs written in a
       | straightforward wiki syntax? Then why would they be preserved
       | as PDF? And how is XML the source format, or even considered
       | useful as the canonical or authoring format, when the existence
       | of thousands of RFCs in plain text/light wiki syntax clearly
       | says otherwise?
        
         | gabrielsroka wrote:
         | I think the FAQ I linked to below addresses some of these
         | questions: https://www.rfc-editor.org/rse/format-faq/
        
       | shfiuewgieug wrote:
       | [flagged]
        
       | shfiuewgieug wrote:
       | [flagged]
        
       | gabrielsroka wrote:
       | See also https://www.rfc-editor.org/rse/format-faq/
        
       | makkesk8 wrote:
       | If only there were an open-source, easy-to-use PDF library
       | with PDF/A support :/
        
         | gabrielsroka wrote:
         | It looks like the IETF has some tools for this. A quick search
         | revealed https://github.com/ietf-tools/ietf-at
        
         | zokier wrote:
         | The mentioned RFC PDFs are generated with WeasyPrint, which
         | apparently gained PDF/A support last year.
         | https://www.courtbouillon.org/blog/00028-weasyprint-56
        
       | ggm wrote:
       | Embedding the input formatting directives is neat!
        
       | zokier wrote:
       | Reading the descriptions of A-3 and A-4, to me it sounds like
       | PDF/A jumped the shark and for archival purposes the old A-2
       | might still be the best variant.
       | 
       | In general, embedding files in a PDF is a kinda neat capability,
       | like the example of having a (CSV) dataset embedded in a report
       | or something like that. But at the same time I get the feeling
       | that it's an indication of a general shortcoming of our file
       | handling that it makes sense to use PDF as a container format.
       | ZIP files and such are pretty crude formats for higher-level
       | file bundles, and the UX falls short too.
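
For comparison with the embedded-CSV idea above, here is a minimal sketch of the plain ZIP alternative: a report and its dataset bundled with a small manifest. The `bundle` helper, the `manifest.json` name, and the manifest layout are all assumptions made for illustration, not an established convention.

```python
import io
import json
import zipfile


def bundle(files, description):
    """Pack files (a dict of archive name -> bytes) into a ZIP.

    A hypothetical manifest.json is added so a reader can tell what
    the bundle contains without guessing from the file names.
    """
    manifest = {"description": description, "files": sorted(files)}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Nothing in the ZIP format itself says which member is the primary document, which is roughly the crudeness being described; the manifest here is a convention the reader has to know about.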
        
       | tpmx wrote:
       | [flagged]
        
         | giantrobot wrote:
         | The PDF/A versions are subsets of PDF specs that are
         | specifically aimed at archiving. They forbid features like
         | encryption and font linking which would affect access years or
         | decades from now.
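
As a rough illustration of the encryption restriction: a PDF declares encryption through an `/Encrypt` entry in its trailer (or cross-reference stream) dictionary, so a crude byte scan can flag likely offenders. This is a heuristic sketch, not a PDF/A validator; the function name is made up for the example, and it can give false positives if the same bytes happen to appear elsewhere in the file.

```python
import re

# Match the name /Encrypt only when it is followed by whitespace or "<",
# so longer names such as /EncryptMetadata do not match on their own.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?=[\s<])")


def looks_encrypted(pdf_bytes):
    """Heuristic: True if the raw bytes contain an /Encrypt entry."""
    return _ENCRYPT_KEY.search(pdf_bytes) is not None
```

A real check would parse the trailer rather than scan the whole file, but the scan is enough to show where the declaration lives.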
        
         | maxerickson wrote:
         | Oh wah.
         | 
         | Anyway, they do actually have a "what for" at the link:
         | https://www.loc.gov/preservation/digital/formats/fdd/fdd0003...
        
         | cookiengineer wrote:
         | > very portable C implementations
         | 
         | Did you mean "very portable exploitable implementations"?
         | 
         | Sorry, but claiming PDF is stable is absurd, to say the least.
         | Mobile devices, smartphones, and gaming consoles were usually
         | exploited through PDF parsers before pdf.js got embedded in
         | web browsers.
         | 
         | Windows' biggest attack surface is still Outlook and PDF files.
         | 
         | So I'd argue that PDF has too large an attack surface, which
         | must be reduced for archiving purposes, without side effects.
        
           | [deleted]
        
         | [deleted]
        
       | gabrielsroka wrote:
       | Full title is "PDF/A-3, PDF for Long-term Preservation, Use of
       | ISO 32000-1, With Embedded Files"
       | 
       | > the new publishing framework, known as "V3", for RFCs from the
       | IETF (Internet Engineering Task Force). V3 uses an XML document
       | as the master format from which plain text, HTML, and PDF
       | versions are derived. The PDF is a PDF/A-3u document with the XML
       | master embedded. The first RFC published in the new format was
       | RFC 8650 [0], published in November 2019. For more background on
       | this choice, see RFC 7995: PDF Format for RFCs (December 2016)
       | [1] and additional Useful References below.
       | 
       | [NOTE, I changed the links below slightly to point to the actual
       | new HTML format]
       | 
       | [0] https://www.rfc-editor.org/rfc/rfc8650.html
       | 
       | [1] https://www.rfc-editor.org/rfc/rfc7995.html
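
The "3u" in PDF/A-3u is part 3, conformance level U. A file declares this in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below reads that declaration from raw bytes; the function name is invented for the example, it only reports what the file claims, and it will miss metadata that is compressed or otherwise not stored as plain text.

```python
import re

# The PDF/A identification properties may appear either as XML
# attributes (pdfaid:part="3") or as elements (<pdfaid:part>3</...>).
_PART = re.compile(rb"pdfaid:part(?:=[\"']|>)\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:=[\"']|>)\s*([A-Za-z])")


def declared_pdfa_level(pdf_bytes):
    """Return a string such as '3u' if PDF/A is declared, else None.

    This reads the declaration only; it does not validate the file.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None
    conf = _CONF.search(pdf_bytes)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return part.group(1).decode("ascii") + level
```

Checking that a file actually meets the level it declares needs a full validator; this only shows where the claim is recorded.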
        
       ___________________________________________________________________
       (page generated 2023-03-21 23:01 UTC)