[HN Gopher] Show HN: Generate pdf with gitbook or mdbook url
___________________________________________________________________
Show HN: Generate pdf with gitbook or mdbook url
Author : lufeng
Score : 62 points
Date : 2023-11-11 14:16 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| chmod775 wrote:
| Sometimes I feel we're never going to be rid of PDFs. Everything
| can be converted into one, but you can't reliably convert _from_
| it.
|
| They're going to be here in 200 years.
| srvmshr wrote:
| They have become a reliable standard in office & legal processes
| around the world thanks to fixed layouts & content
| immutability (in a layman's view, at least. I know Acrobat Pro
| and PDF editing exist, but I'd argue that in the majority of
| cases it's not as trivial as modifying a text file or markup
| source with the intention to change or forge).
|
| Correct me if wrong but Word/LibreOffice layouts could change
| depending on the machine & version number - but with PDF you
| get what you intended to show. I think that has always been the
| winning proposition for PDF
| eviks wrote:
| Word offers content immutability with its read-only mode, not
| sure how much layout change there is, though how much pixel
| perfection do you need in legal processes that are mostly
| pure text?
| david_draco wrote:
| If you have a 300-page legal document, and other sources
| reference passages, e.g., a paragraph on page 234. It would
| be unreliable if over the years or depending on the viewer
| it moves to another page.
| eviks wrote:
| that's why legal documents number the paragraphs,
| otherwise it's unreliable over the days you edit the
| document, no need to wait years for the app to change
| layout
| gilcot wrote:
| Does read-only mode not suffer from some global settings?
| Also, what happens when the reading instance doesn't have
| all the used fonts for example?
| eviks wrote:
| What global setting?
|
| Nothing happens, embedded fonts feature isn't exclusive
| to PDFs
| cinntaile wrote:
| Sometimes vital images disappear and the layout gets messed
| up. I haven't had that happen to me with a pdf.
| kibwen wrote:
| Though keep in mind that a given PDF might not embed the
| font that it uses (the PDF creator might not even have
| the legal right to embed whatever font they're using), so
| opening a PDF on other machines can cause them to be
| messed up if those machines don't have the correct fonts
| installed.
|
| If you want literally pixel-perfect layout, you need to
| use a raster image format like PNG.
| midasuni wrote:
| Which version of Word? PDF has many free implementations for
| both reading and writing.
| srvmshr wrote:
| Ever observed how legal documents, product documentation,
| books etc., are typeset - with exact placement of comment
| boxes, US code reference/legal disclaimers/ barcode,
| precise footnotes or margin notes. Many of those are meant
| to be also machine readable when printed out. Exactness is
| a very pressing need.
|
| I would recommend taking a good look again. It might show
| you why, in some situations, typesetting in PDF is preferred
| over a format where text could reflow.
|
| About immutability in Word: it seems optional & not
| something by design. You can edit any *.docx & 'Save As'
| again. Read-only mode doesn't make immutability a
| principal feature of the format.
| quirino wrote:
| https://en.m.wikipedia.org/wiki/PDF/A
|
| PDF/A is made especially for archival and long term
| preservation
| ogurechny wrote:
| It's just a useless label on the cover. PDF/A is nothing
| but a subset of PDF, without proprietary extensibility,
| limited to what is considered to "work everywhere" when
| dealing with common printed matter. It adds nothing to the
| non-existent error handling rules or parsing strategies. There
| are 5 different ways for an object to be found
| undefined/nil, but the specification is silent on whether
| there's any difference in meaning or handling based on the
| level it happens. Therefore libraries and tools do what
| they find most suitable, and anything generated by the
| numerous easy-to-use sites is potentially not quite the
| same as originally uploaded.
|
| PDF resembles the state of HTML years after HTML4, it
| barely says what should happen in the best case.
| eviks wrote:
| indeed, such a sad state of affairs when one of the most
| popular digital document formats isn't really a proper digital
| document, but a hack to resemble the bad old paper days
| bob1029 wrote:
| I look at PDF like a virtual sheet of paper. Once you "print"
| to it, abandon any hope of getting things back out.
|
| This sounds like a bad thing, but we need _some_ kind of way to
| say "this is the final presentation no matter what".
| maxerickson wrote:
| The conversion tools in Acrobat recover most content pretty
| well. It's not a one to one conversion, but it is plenty
| usable for putting in new documents or whatever.
| jowea wrote:
| Which is bothersome when you disagree that there's no need to
| extract or modify the data.
| powersnail wrote:
| I feel that there's a lot of value in how a finished PDF
| document is visually inflexible. This is how the document
| looks, and this will be how the document looks in the next
| generation PDF viewer on a completely different computing
| platform on a different type of device. If it works on my
| machine, it works on yours, too. (This is ignoring dynamic
| PDFs with JavaScript in them.)
|
| It doesn't evolve with the electronic device, which means you
| might need to zoom and pan, but it also means that it probably
| won't be completely bungled.
|
| I've bought an EPUB which only works on iPad. If the screen is
| sized differently, or uses a different font, etc., the texts
| are all messed up. It simply wouldn't happen if the book was
| distributed via PDF.
| zdw wrote:
| Or you can just run the target in Sphinx, and get better
| organized output with real table of contents and indexes - it
| supports a huge number of output formats:
| https://www.sphinx-doc.org/en/master/usage/builders/index.ht...
|
| If you must use markdown, there's always https://mystmd.org which
| integrates directly into Sphinx, modulo minor bits of weirdness
| due to markdown being a mishmash of extensions.
| civilitty wrote:
| I suspect this is for creating PDFs for ChatGPT, not for
| generating project documentation from source. That's the
| usecase that immediately came to mind, since a lot of Rust
| content (for example) is in mdbook format.
| dragonwriter wrote:
| But gitbook and mdbook use markdown, which ChatGPT supports,
| too.
| vagab0nd wrote:
| Just recently I was tasked with converting some _huge_ HTML pages
| (with lots of small entries) into a PDF file. The requirements
| were a "fully automated solution" and "pdf must look the same as
| the page when viewed in a browser". Probably takes less than five
| minutes, right? I thought the same.
|
| Wrong.
|
| Chrome/Chromium crashes due to a hard-coded memory limit in V8.
|
| Firefox has no command line option for printing pdfs.
|
| No other libraries render the pdfs correctly because they are not
| full-fledged web engines.
|
| So what were my options?
|
| 1. Read/understand Chromium source, recompile to lift the memory
| limit.
|
| 2. Read/understand Firefox source, recompile to add a command
| line option.
|
| 3. Use some UI testing framework to automate pdf printing in
| Firefox.
|
| Eventually I went with option 4: split the HTML files into smaller
| chunks, convert each one, and re-combine the results. Of course the
| problem is how you know where to split the HTML so that the cut
| falls on a page boundary. The solution is to binary-search for the
| number of entries to put into each chunk, looking for the point
| where the number of generated PDF pages changes. What a pain.
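|
| A minimal sketch of that chunk-size search, not the commenter's
| actual code. It assumes a hypothetical `count_pages` callable that
| renders a slice of entries to PDF and returns the page count, and
| that the page count never decreases as entries are added. The names
| `largest_fitting_chunk`, `split_into_chunks`, and `max_pages` are
| made up for illustration.

```python
from typing import Callable, List, Sequence


def largest_fitting_chunk(
    entries: Sequence[str],
    start: int,
    max_pages: int,
    count_pages: Callable[[Sequence[str]], int],
) -> int:
    """Return the largest n such that entries[start:start + n] renders
    to at most max_pages pages.

    Assumes count_pages is non-decreasing in n. Always returns at
    least 1 so that the caller makes progress even when a single
    entry is longer than max_pages.
    """
    lo, hi = 1, len(entries) - start
    best = 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if count_pages(entries[start:start + mid]) <= max_pages:
            best = mid      # mid entries still fit; try more
            lo = mid + 1
        else:
            hi = mid - 1    # one entry too many spilled onto a new page
    return best


def split_into_chunks(
    entries: Sequence[str],
    max_pages: int,
    count_pages: Callable[[Sequence[str]], int],
) -> List[List[str]]:
    """Greedily cut entries into chunks that each fit in max_pages."""
    chunks: List[List[str]] = []
    start = 0
    while start < len(entries):
        n = largest_fitting_chunk(entries, start, max_pages, count_pages)
        chunks.append(list(entries[start:start + n]))
        start += n
    return chunks


if __name__ == "__main__":
    # Stand-in for a real renderer: pretend three entries fill one page.
    fake_count = lambda chunk: -(-len(chunk) // 3)
    demo = [f"entry {i}" for i in range(10)]
    print([len(c) for c in split_into_chunks(demo, 2, fake_count)])
```

| Each probe costs one real render, so the search needs roughly
| log2(n) conversions per chunk. The last page of a chunk can still
| end with a gap smaller than one entry, so the recombined PDF may
| not paginate exactly like a single-pass print.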
| httgp wrote:
| Have you considered Playwright? It offers very straightforward
| APIs to do exactly this.
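|
| A minimal sketch of what that could look like with Playwright's
| Python sync API, assuming the `playwright` package and its Chromium
| build are installed. `pdf_options` and `render_pdf` are hypothetical
| helper names. `page.pdf()` runs on headless Chromium, so nothing in
| this thread shows that it avoids the memory limit mentioned above.

```python
from pathlib import Path


def pdf_options(out_path: str) -> dict:
    """Keyword arguments passed to Playwright's page.pdf()."""
    return {
        "path": str(Path(out_path)),
        "format": "A4",
        "print_background": True,  # keep CSS backgrounds in the PDF
    }


def render_pdf(url: str, out_path: str) -> None:
    """Open url in headless Chromium and print it to out_path."""
    # Imported lazily so this module loads without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # page.pdf() is Chromium-only
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")
            page.pdf(**pdf_options(out_path))
        finally:
            browser.close()
```

| Calling `render_pdf("file:///path/to/page.html", "out.pdf")` would
| write the PDF next to the script; the path here is a placeholder.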
| constantly wrote:
| Run the browser's native print to pdf and save the result.
| thangalin wrote:
| I developed KeenWrite[0] with similar ideas to mdbook: typeset
| Markdown documents into PDF. Technically, this happens in three
| stages. First, the Markdown is converted to XHTML. Second, the
| XHTML is converted to TeX commands. Third, the ConTeXt
| typesetting system produces a PDF file. Both the GUI and CLI can
| export to PDF.[1] (This means that XHTML also can be converted to
| PDF.)
|
| Like mdbook, the themes are isolated. Instead of CSS, KeenWrite
| themes are written in ConTeXt. There are several example starter
| themes.[2] A "thesis" theme would be a nice addition, but there's
| a problem.
|
| Markdown lacks a standard for cross-references and citations. An
| open KeenWrite issue animates a possible UX solution.[3] The
| topic of references/citations has been discussed on CommonMark[4]
| without much movement. Parsing cross-references and citations
| would likely benefit all flexmark-java[5] integrations. KeenWrite
| uses flexmark-java, but I'm otherwise unaffiliated. If anyone is
| interested in helping, reach out (see profile).
|
| [0]: https://keenwrite.com/
|
| [1]:
| https://gitlab.com/DaveJarvis/KeenWrite/-/blob/main/docs/cmd...
|
| [2]: https://gitlab.com/DaveJarvis/keenwrite-themes/
|
| [3]: https://gitlab.com/DaveJarvis/KeenWrite/-/issues/145
|
| [4]:
| https://talk.commonmark.org/t/cross-references-and-citations...
|
| [5]: https://github.com/vsch/flexmark-java
___________________________________________________________________
(page generated 2023-11-11 23:00 UTC)