hngopher.com

       [HN Gopher] Show HN: Generate pdf with gitbook or mdbook url
       ___________________________________________________________________
        
       Show HN: Generate pdf with gitbook or mdbook url
        
       Author : lufeng
       Score  : 62 points
       Date   : 2023-11-11 14:16 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | chmod775 wrote:
       | Sometimes I feel we're never going to be rid of PDFs. Everything
       | can be converted into one, but you can't reliably convert _from_
       | it.
       | 
       | They're going to be here in 200 years.
        
         | srvmshr wrote:
         | They have become a reliable standard in office & legal
         | processes around the world thanks to fixed layouts & content
         | immutability (in a sort of layman's view. I know Acrobat Pro
         | & PDF editing exist, but I'd argue that in the majority of
         | cases it's not as trivial as modifying a text file or markup
         | sources with the intention to change or forge).
         | 
         | Correct me if wrong but Word/LibreOffice layouts could change
         | depending on the machine & version number - but with PDF you
         | get what you intended to show. I think that has always been the
         | winning proposition for PDF
        
           | eviks wrote:
           | Word offers content immutability with its read-only mode, not
           | sure how much layout change there is, though how much pixel
           | perfection do you need in legal processes that are mostly
           | pure text?
        
             | david_draco wrote:
             | Suppose you have a 300-page legal document and other sources
             | reference passages, e.g., a paragraph on page 234. It would
             | be unreliable if, over the years or depending on the viewer,
             | that paragraph moved to another page.
        
               | eviks wrote:
               | that's why legal documents number the paragraphs,
               | otherwise it's unreliable over the days you edit the
               | document, no need to wait years for the app to change
               | layout
        
             | gilcot wrote:
             | Does read-only mode not suffer from some global settings?
             | Also, what happens when the reading instance doesn't have
             | all the used fonts for example?
        
               | eviks wrote:
               | What global setting?
               | 
               | Nothing happens, embedded fonts feature isn't exclusive
               | to PDFs
        
             | cinntaile wrote:
             | Sometimes vital images disappear and the layout gets messed
             | up. I haven't had that happen to me with a pdf.
        
               | kibwen wrote:
               | Though keep in mind that a given PDF might not embed the
               | font that it uses (the PDF creator might not even have
               | the legal right to embed whatever font they're using), so
               | opening a PDF on other machines can cause it to be
               | messed up if those machines don't have the correct fonts
               | installed.
               | 
               | If you want literally pixel-perfect layout, you need to
               | use a raster image format like PNG.
        
             | midasuni wrote:
             | Which version of Word? PDF has many free implementations for
             | both reading and writing.
        
             | srvmshr wrote:
             | Ever observed how legal documents, product documentation,
             | books etc. are typeset, with exact placement of comment
             | boxes, US code references, legal disclaimers, barcodes, and
             | precise footnotes or margin notes? Many of those are also
             | meant to be machine readable when printed out. Exactness is
             | a very pressing need.
             | 
             | I would recommend taking a good look again. It might tell
             | you why, in some situations, typesetting in PDF is preferred
             | over a format where text could reflow.
             | 
             | As for immutability in Word, it seems optional & not
             | something by design. You can edit any *.docx & save it back
             | with 'Save As'. A read-only mode doesn't make immutability
             | a principal feature.
        
           | quirino wrote:
           | https://en.m.wikipedia.org/wiki/PDF/A
           | 
           | PDF/A is made especially for archival and long term
           | preservation
        
             | ogurechny wrote:
             | It's just a useless label on the cover. PDF/A is nothing
             | but a subset of PDF, without proprietary extensibility,
             | limited to what is considered to "work everywhere" when
             | dealing with common printed matter. It adds nothing to the
             | non-existent error handling rules or parsing strategies.
             | There are 5 different ways for an object to be found
             | undefined/nil, but the specification is silent on whether
             | there's any difference in meaning or handling based on the
             | level at which it happens. Therefore libraries and tools do what
             | they find most suitable, and anything generated by the
             | numerous easy-to-use sites is potentially not quite the
             | same as originally uploaded.
             | 
             | PDF resembles the state of HTML years after HTML4: it
             | barely says what should happen in the best case.
        
         | eviks wrote:
         | indeed, such a sad state of affairs when one of the most
         | popular digital document formats isn't really a proper digital
         | document, but a hack to resemble the bad old paper days
        
         | bob1029 wrote:
         | I look at PDF like a virtual sheet of paper. Once you "print"
         | to it, abandon any hope of getting things back out.
         | 
         | This sounds like a bad thing, but we need _some_ kind of way to
         | say "this is the final presentation no matter what".
        
           | maxerickson wrote:
           | The conversion tools in Acrobat recover most content pretty
           | well. It's not a one to one conversion, but it is plenty
           | usable for putting in new documents or whatever.
        
           | jowea wrote:
           | Which is bothersome when you disagree that there's no need
           | to extract or modify the data.
        
         | powersnail wrote:
         | I feel that there's a lot of value in how a finished PDF
         | document is visually inflexible. This is how the document
         | looks, and this is how the document will look in the
         | next-generation PDF viewer on a completely different computing
         | platform or a different type of device. If it works on my
         | machine, it works on yours, too. (This is ignoring dynamic
         | PDFs with JavaScript in them.)
         | 
         | It doesn't evolve with the electronic device, which means you
         | might need to zoom and pan, but it also means that it probably
         | won't be completely bungled.
         | 
         | I've bought an EPUB which only works on iPad. If the screen is
         | sized differently, or the reader uses a different font, etc.,
         | the text is all messed up. That simply wouldn't happen if the
         | book were distributed as a PDF.
        
       | zdw wrote:
       | Or you can just run the target in Sphinx, and get better
       | organized output with a real table of contents and indexes. It
       | supports a huge number of output formats:
       | https://www.sphinx-doc.org/en/master/usage/builders/index.ht...
       | 
       | If you must use markdown, there's always https://mystmd.org which
       | integrates directly into Sphinx, modulo minor bits of weirdness
       | due to markdown being a mishmash of extensions.
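A minimal sketch of the Sphinx-plus-Markdown setup suggested above, written as a `conf.py` fragment. It assumes the Sphinx integration is provided by the MyST-Parser package (extension name `myst_parser`); the project name is a placeholder, not something from the thread.

```python
# conf.py: minimal Sphinx configuration sketch for Markdown sources.
# Assumption: `pip install sphinx myst-parser` has been run beforehand.

# Placeholder project name (hypothetical).
project = "Example Book"

# myst_parser is the Sphinx extension that parses MyST-flavoured Markdown.
extensions = ["myst_parser"]

# Accept both reStructuredText and Markdown files as sources.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```

With this in place, `sphinx-build -b html docs docs/_build/html` should produce HTML, and `sphinx-build -M latexpdf docs docs/_build` should produce a PDF if a LaTeX toolchain is installed; the `docs` directory name is also an assumption.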
        
         | civilitty wrote:
         | I suspect this is for creating PDFs for ChatGPT, not for
         | generating project documentation from source. That's the
         | use case that immediately came to mind, since a lot of Rust
         | content (for example) is in mdbook format.
        
           | dragonwriter wrote:
           | But gitbook and mdbook use markdown, which ChatGPT supports,
           | too.
        
       | vagab0nd wrote:
       | Just recently I was tasked with converting some _huge_ HTML
       | pages (with lots of small entries) into a PDF file. The
       | requirements were a "fully automated solution" and "pdf must
       | look the same as the page when viewed in a browser". Probably
       | takes less than five minutes, right? I thought the same.
       | 
       | Wrong.
       | 
       | Chrome/Chromium crashes due to a hard-coded memory limit in V8.
       | 
       | Firefox has no command line option for printing pdfs.
       | 
       | No other libraries render the pdfs correctly because they are not
       | full-fledged web engines.
       | 
       | So what were my options?
       | 
       | 1. Read/understand Chromium source, recompile to lift the memory
       | limit.
       | 
       | 2. Read/understand Firefox source, recompile to add a command
       | line option.
       | 
       | 3. Use some UI testing framework to automate pdf printing in
       | Firefox.
       | 
       | Eventually I went with a fourth option: split the HTML files
       | into smaller chunks, convert each one and recombine the
       | results. Of course the problem is how do you know where to
       | split the HTML so that it's at the boundary of a page? The
       | solution is to binary search for the number of entries per
       | chunk at which the number of generated PDF pages changes. What
       | a pain.
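The chunking approach described above might look roughly like the following sketch. It is not the commenter's code. It assumes that the page count never decreases as entries are added and that an entry is never split across two pages. `count_pages` is a hypothetical callback that would render a list of entries to PDF and return the page count; `fake_count_pages` is a stand-in so the example runs without a browser.

```python
import math
from typing import Callable, List, Sequence

PageCounter = Callable[[Sequence[str]], int]


def largest_fitting_prefix(entries: Sequence[str], start: int,
                           max_pages: int, count_pages: PageCounter) -> int:
    """Binary search for the largest k such that entries[start:start + k]
    renders to at most max_pages pages. Always returns at least 1."""
    lo, hi = 1, len(entries) - start
    best = 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if count_pages(entries[start:start + mid]) <= max_pages:
            best = mid      # mid entries still fit, try more
            lo = mid + 1
        else:
            hi = mid - 1    # one page too many, try fewer
    return best


def split_into_chunks(entries: Sequence[str], max_pages: int,
                      count_pages: PageCounter) -> List[List[str]]:
    """Greedily cut entries into chunks that each end on a page boundary."""
    chunks: List[List[str]] = []
    start = 0
    while start < len(entries):
        k = largest_fitting_prefix(entries, start, max_pages, count_pages)
        chunks.append(list(entries[start:start + k]))
        start += k
    return chunks


def fake_count_pages(chunk: Sequence[str], per_page: int = 3) -> int:
    """Stand-in renderer (hypothetical): pretends 3 entries fill one page."""
    return max(1, math.ceil(len(chunk) / per_page))


entries = [f"entry {i}" for i in range(10)]
chunks = split_into_chunks(entries, max_pages=2, count_pages=fake_count_pages)
print([len(c) for c in chunks])
```

Each probe in the binary search costs one real PDF render, so the search needs about log2(n) renders per chunk rather than one render per candidate split point.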
        
         | httgp wrote:
         | Have you considered Playwright? It offers very straightforward
         | APIs to do exactly this.
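A minimal sketch of the Playwright suggestion, assuming the Python bindings (`pip install playwright`, then `playwright install chromium`). The function names `pdf_options` and `render_pdf` are made up for this example. Playwright's `page.pdf()` drives headless Chromium, so very large pages may still run into the memory limit mentioned in the parent comment.

```python
def pdf_options(out_path: str) -> dict:
    """Keyword arguments passed to Playwright's page.pdf()."""
    return {"path": out_path, "format": "A4", "print_background": True}


def render_pdf(url: str, out_path: str) -> None:
    """Load url in headless Chromium and print it to a PDF file."""
    # Imported lazily so the module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        # Use screen styles instead of print styles so the PDF looks
        # closer to what the browser shows.
        page.emulate_media(media="screen")
        page.pdf(**pdf_options(out_path))
        browser.close()
```

Calling `render_pdf("https://example.com", "out.pdf")` would write the PDF to `out.pdf`; the URL here is only a placeholder.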
        
         | constantly wrote:
         | Run the browser's native print to pdf and save the result.
        
       | thangalin wrote:
       | I developed KeenWrite[0] with similar ideas to mdbook: typeset
       | Markdown documents into PDF. Technically, this happens in three
       | stages. First, the Markdown is converted to XHTML. Second, the
       | XHTML is converted to TeX commands. Third, the ConTeXt
       | typesetting system produces a PDF file. Both the GUI and CLI can
       | export to PDF.[1] (This means that XHTML also can be converted to
       | PDF.)
       | 
       | Like mdbook, the themes are isolated. Instead of CSS, KeenWrite
       | themes are written in ConTeXt. There are several example starter
       | themes.[2] A "thesis" theme would be a nice addition, but there's
       | a problem.
       | 
       | Markdown lacks a standard for cross-references and citations. An
       | open KeenWrite issue animates a possible UX solution.[3] The
       | topic of references/citations has been discussed on CommonMark[4]
       | without much movement. Parsing cross-references and citations
       | would likely benefit all flexmark-java[5] integrations. KeenWrite
       | uses flexmark-java, but I'm otherwise unaffiliated. If anyone is
       | interested in helping, reach out (see profile).
       | 
       | [0]: https://keenwrite.com/
       | 
       | [1]:
       | https://gitlab.com/DaveJarvis/KeenWrite/-/blob/main/docs/cmd...
       | 
       | [2]: https://gitlab.com/DaveJarvis/keenwrite-themes/
       | 
       | [3]: https://gitlab.com/DaveJarvis/KeenWrite/-/issues/145
       | 
       | [4]: https://talk.commonmark.org/t/cross-references-and-citations...
       | 
       | [5]: https://github.com/vsch/flexmark-java
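The three-stage pipeline described in the comment above (Markdown to XHTML, XHTML to TeX commands, then ConTeXt to PDF) can be illustrated with a toy sketch. This is not KeenWrite's implementation: it handles only `# ` headings and plain paragraphs, the function names are invented for the example, and it does not escape TeX special characters.

```python
import html
import re
import xml.etree.ElementTree as ET


def markdown_to_xhtml(markdown: str) -> str:
    """Stage 1 (toy): turn '# ' headings and paragraphs into XHTML."""
    parts = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        text = " ".join(line.strip() for line in block.splitlines())
        if text.startswith("# "):
            parts.append(f"<h1>{html.escape(text[2:])}</h1>")
        else:
            parts.append(f"<p>{html.escape(text)}</p>")
    return "<body>" + "".join(parts) + "</body>"


def xhtml_to_context(xhtml: str) -> str:
    """Stage 2 (toy): map XHTML elements to ConTeXt commands."""
    lines = ["\\starttext"]
    for node in ET.fromstring(xhtml):
        text = "".join(node.itertext())
        if node.tag == "h1":
            lines.append(f"\\section{{{text}}}")
        else:
            lines.append(text)
            lines.append("")  # a blank line ends a paragraph in TeX
    lines.append("\\stoptext")
    return "\n".join(lines)


tex = xhtml_to_context(markdown_to_xhtml("# Title\n\nHello\nworld"))
print(tex)
# Stage 3 would save this to a .tex file and run the ConTeXt command-line
# tool on it (typically `context file.tex`) to typeset the PDF.
```

Going through XHTML as an intermediate form means the second stage only has to understand a well-formed XML tree, whatever Markdown dialect produced it.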
        
       ___________________________________________________________________
       (page generated 2023-11-11 23:00 UTC)