hngopher.com

       [HN Gopher] QuestPDF: Modern .NET library for PDF document gener...
       ___________________________________________________________________
        
       QuestPDF: Modern .NET library for PDF document generation
        
       Author : nateb2022
       Score  : 112 points
       Date   : 2023-01-18 17:27 UTC (5 hours ago)
        
        
       | space_ghost wrote:
       | Oh man, .NET needed this. Did some PDF work for a .NET project
       | last year and found the ecosystem to be somewhat light on PDF
       | support. There's a few commercial options, but they're pricey.
        
         | ejb999 wrote:
         | I've used this one with great success (and it's free):
         | https://github.com/tuespetre/TuesPechkin
         | 
         | It's basically a wrapper for wkhtmltopdf, but I develop an app
         | that has probably generated a million +/- invoices/statements
         | over the past 5 years with it, and it's been rock solid for me.
         | Was a bit of a bear to get it working the first time (not a ton
         | of documentation that I could find at the time), but once
         | working, was easy to add/change new documents/layouts.
         | 
         | As it uses wkhtmltopdf under the covers, it is an HTML->PDF tool,
         | but I prefer that, at least for my use case.
         | 
         | Not sure there is a dotnet-core version, so that might be a
         | problem for some.
        
           | hbcondo714 wrote:
           | I've been using Rotativa[1] for URL to PDF generation which
           | is also a wrapper for wkhtmltopdf. They have a dotnet-core[2]
           | version and also a SaaS[3] but it's worth mentioning that
           | Azure PaaS supports wkhtmltopdf[4] so I just self-host.
           | 
           | Looking at QuestPDF's API docs, it doesn't look like they
           | support URL / HTML to PDF generation. I think this would be a
           | great addition especially given the age and issues with
           | Rotativa and TuesPechkin on their public repos.
           | 
           | [1] https://github.com/webgio/Rotativa
           | 
           | [2] https://github.com/webgio/Rotativa.AspNetCore
           | 
           | [3] https://rotativa.io/
           | 
           | [4] https://github.com/projectkudu/kudu/wiki/Azure-Web-App-
           | sandb...
        
         | pathartl wrote:
         | We've been on iTextSharp 5 for a decade for this reason.
        
       | msk-lywenn wrote:
       | What does this do that PDFsharp doesn't?
        
       | wackget wrote:
       | >> You are 250 lines of C# code away from creating a fully
       | functional PDF invoice implementation.
       | 
       | As a web developer this hurt to read. This is a task which is
       | just crying out for a markup language and a stylesheet, not
       | hundreds of lines of declarative C# code.
       | 
       | Even the "complex example" in their documentation looks like the
       | most basic of web pages.
        
         | aidos wrote:
         | Not familiar with .Net but I'd imagine this would probably be
         | fairly easy to build on top of this library (and I agree, xml
         | is often a much better way to generate reports).
         | 
         | I've done something similar but in Python and generating Excel
         | documents. I use jinja for templating to create the xml and
         | then parse that and convert to commands that drive the library
         | that creates the final document.
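
A minimal sketch of the pipeline described above (template to XML, parse the XML, emit commands for a document library). It uses plain f-strings from the Python standard library in place of jinja, and the `<sheet>`/`<row>`/`<cell>` tags and the `add_sheet`/`write_cell` command names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render_xml(title, rows):
    """Template step: turn data into a small XML description of a sheet."""
    row_xml = "".join(
        "<row>" + "".join(f"<cell>{escape(str(v))}</cell>" for v in row) + "</row>"
        for row in rows
    )
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<sheet name={quoteattr(title)}>{row_xml}</sheet>"


def to_commands(xml_text):
    """Parse step: convert the XML into a flat list of (hypothetical) commands."""
    root = ET.fromstring(xml_text)
    commands = [("add_sheet", root.get("name"))]
    for r, row in enumerate(root.findall("row")):
        for c, cell in enumerate(row.findall("cell")):
            commands.append(("write_cell", r, c, cell.text or ""))
    return commands


if __name__ == "__main__":
    xml_text = render_xml("Q1 & Q2", [["widgets", 3], ["<gadgets>", 5]])
    for command in to_commands(xml_text):
        print(command)
```

A real backend would replay the command list against whatever library writes the final file; keeping the commands as plain tuples makes that last step easy to swap out.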
        
         | naasking wrote:
         | Will those basic web pages be less than 250 lines for the
         | equivalent look? I'm skeptical.
        
         | jaywalk wrote:
         | There are PDF generators that work just like that. As a web
         | developer who uses C# on the backend, QuestPDF is exactly what
         | I want.
        
         | bob1029 wrote:
         | We do a _lot_ of dynamic report gen PDFs and this is something
         | we'd prefer.
         | 
         | Right now, we basically emulate this technique w/ HTML->PDF. We
         | build chunks of report HTML with various string interpolation
         | methods and then compose those to obtain our final HTML output.
         | 
         | Raw, declarative HTML is nice if you don't have an undefined #
         | of things to describe with it. When you are looping and
         | projecting domain types into a report, things get a lot
         | trickier.
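
A minimal sketch of the "build chunks of HTML and compose them" approach described above, assuming a simple invoice-like report. The function names and the item fields (`name`, `qty`, `price`) are hypothetical; the resulting string would then be handed to whichever HTML->PDF tool is in use.

```python
from html import escape


def render_row(item):
    """Render one line item as a table row, escaping user-supplied text."""
    return (
        f"<tr><td>{escape(item['name'])}</td>"
        f"<td>{item['qty']}</td>"
        f"<td>{item['price']:.2f}</td></tr>"
    )


def render_table(items):
    """Loop over a variable number of items and join the row chunks."""
    rows = "".join(render_row(i) for i in items)
    return (
        "<table><thead><tr><th>Item</th><th>Qty</th><th>Price</th></tr></thead>"
        f"<tbody>{rows}</tbody></table>"
    )


def render_report(title, items):
    """Compose the chunks into the final HTML document."""
    total = sum(i["qty"] * i["price"] for i in items)
    return (
        f"<html><body><h1>{escape(title)}</h1>"
        f"{render_table(items)}"
        f"<p>Total: {total:.2f}</p></body></html>"
    )


if __name__ == "__main__":
    demo = [
        {"name": "Widget", "qty": 2, "price": 1.5},
        {"name": "Nuts & bolts", "qty": 10, "price": 0.25},
    ]
    print(render_report("Invoice", demo))
```

Each chunk is an ordinary function, so the looping and projecting of domain types happens in code while the markup stays in small, readable pieces.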
        
           | amithegde wrote:
           | I used https://github.com/Antaris/RazorEngine to generate all
           | sorts of complex HTML, email body etc. back in the day. Since
           | it follows razor syntax, loops etc. work well
        
             | bob1029 wrote:
             | We actually used this exact library at one point, but it
             | fell out of favor for some reason I cannot recall.
        
         | pathartl wrote:
         | Coming from the web dev space into backend on a project that
         | heavily relies on PDF generation, I would say that something
         | like a PDF often cannot be expressed with just markup and a
         | stylesheet. There's a large difference in something like the
         | web (it must be expressed with some fluidity of layout)
         | compared to a very static document like a PDF. Page breaks,
         | readability, print supply, watermarks, paging, etc. all have to
         | be considered.
        
           | aidos wrote:
           | I feel like all of that can be done in markup.
        
             | kgwxd wrote:
             | I've worked with PDF markup tools built on libraries like
             | this for 20 years, both third-party and in-house custom. It
             | usually takes 10 minutes to find out the markup doesn't
             | support what is required for the task. With third-party
             | tools, you have to find a hack or drop it altogether.
             | In-house, you can maybe add something in, but you'll have to
             | do it fast, and if you can't break it down into a
             | general-purpose feature (which you probably can't, because
             | the fundamental philosophy of your "easy" markup language
             | wasn't designed with anything like this in mind), you'll
             | just have to uglify the markup language even more or, again,
             | drop it altogether.
             | 
             | Code is the only sane way.
        
               | aidos wrote:
               | Well sure, as ever, it depends on your use case.
               | 
               | PDF is an insanely complex spec (I've spent more time
               | reading it than most because I need to know bits of it
               | for my job and I just generally find it fascinating). But
               | a lot of devs just need to put some content on the screen
               | to match a template they were given. In my experience, a
               | complete enough markup language allows you to bang out
               | and maintain those templates better than code.
               | 
               | I know it doesn't suit every need, but it's just a way of
               | representing the data so it's closer to the final output
               | than imperative code is. Definitely take your point
               | though about the limitations becoming dealbreakers.
        
             | layer8 wrote:
             | It can, but it'll become something complex and Turing-
             | complete like LaTeX.
        
         | password4321 wrote:
         | https://docs.aspose.com/pdf/net/working-with-xml/
         | 
         | Starting at $3600 for use on a web server.
        
         | wvenable wrote:
         | But then you need a template language to generate the markup
         | from the data.
        
           | aidos wrote:
           | That's a well-trodden path in most languages. A cursory
           | search surfaced this library that looks like it would
           | probably do the job:
           | 
           | https://github.com/scriban/scriban
        
             | wvenable wrote:
             | A programming language referencing a template language
             | library for processing a markup language to generate
             | another markup language (PDF) sounds just about right.
        
               | aidos wrote:
               | Nitpick but it's a stretch to classify PDF as a markup
               | language. They're a graph of nodes that can encapsulate
               | myriad different types of data including things that are
               | probably even Turing-complete like fonts. Even the
               | graphics streams inside PDFs aren't markup.
               | 
               | We build abstractions for a reason. I think we can all
               | agree that templating markup for layouts has been a
               | reasonable success story of the web generation.
        
         | styx31 wrote:
         | Webpages and PDFs (paged documents) are fundamentally different:
         | you won't be able to easily support headers and footers, page
         | breaks, and orphans on a webpage. You can create basic invoices
         | on webpages, but anything more complex (and by that I mean any
         | serious Word document) will require you to twist HTML. Try to
         | get column headers to repeat on each printed page of an HTML
         | document.
        
           | aidos wrote:
           | The markup doesn't need to be html - and would be better not
           | to be. The point is more that templating languages are great
           | for formatting data as markup and markup is great for driving
           | layout. With this library as a backend you can make something
           | super usable.
        
           | lazyeye wrote:
           | None of these things are difficult at all with html. Plus you
           | have the benefit of having the document viewable in a web
           | browser too. You use the exact same html layout for both with
           | specific css (heights, widths mainly) for each.
        
           | SigmundA wrote:
           | I believe browsers have been repeating table headers on
           | printed output for some time.
           | 
           | Paged media CSS is designed for this, although most browsers
           | don't fully support it. PrinceXML is the go-to for full paged
           | media support.
           | 
           | IMO they are not fundamentally different: they are both
           | document formats. PDF just has a fixed paged rendering layout
           | baked in, while HTML can flow and adjust to the rendering
           | target. The main issue is the lack of full print CSS support
           | in HTML rendering engines.
           | 
           | https://www.w3.org/TR/css-page-3/
           | 
           | https://www.princexml.com
        
             | styx31 wrote:
             | You are right about the thead repeatable header.
             | 
             | Still, to switch back to the previous point, it seems it's
             | more a divergence between using markup or code to design a
             | document. Both have valid usage and benefits depending on
             | your case.
             | 
             | In my case and my apps, I often need to handle complex
             | conditions that fit better, imo, in procedural code (complex
             | invoices and agreements). In other cases (reports), I
             | prefer to use a markup language.
        
               | SigmundA wrote:
               | There are a lot of procedural tools for generating HTML,
               | lots, if modern browsers fully supported print CSS then
               | you could use them for complex PDF generation, or direct
               | printing, either client side or on the server headless.
               | 
               | If your app is a web app this is a no-brainer: the user's
               | browser could simply do the print or PDF conversion as
               | needed.
               | 
               | I do see a use for more direct libraries in native apps,
               | although if every native client had a browser control
               | with full print CSS support even then it might not be
               | such an issue.
        
               | Scarbutt wrote:
               | _If your app is a web app this is a no brainer, the users
               | browser could simply do the print or PDF conversion as
               | needed._
               | 
               | That's arguable. IME, most would prefer to just get the PDF
               | file with one click (also a better UX) rather than deal
               | with additional browser dialogs. Not everyone knows how to
               | print to PDF, or even knows it exists.
               | 
               | Or do you mean browsers expose print-to-pdf functionality
               | as an API?
        
               | SigmundA wrote:
               | Hitting print in the browser or calling Window.print() if
               | you want to force the dialog.
               | 
               | If you serve a PDF you still need to hit print or use a
               | dialog to save. You can use a headless browser server
               | side to serve that if needed.
               | 
               | I do think browsers could use better print APIs, but you're
               | not getting around that with server-side PDFs unless the
               | server prints directly to on-site printers or something.
        
       | petilon wrote:
       | The hard part of PDF generation is support for complex scripts
       | (Arabic, Indian languages etc.), including embedding a font
       | subset. On Windows this is usually accomplished using the
       | Uniscribe library (which is not available on Linux). QuestPDF
       | appears to be using HarfBuzz for this purpose. If that works well
       | then this is a winner!
        
         | xoac wrote:
         | HarfBuzz is the gold standard.
        
       | ripberge wrote:
       | Does this allow you to use existing PDFs as "templates"? We do
       | that a lot with PDFs. It allows end users to design in Adobe
       | Acrobat and upload to our product. We can then inject dynamic
       | data into placeholders at runtime. We do this for text and
       | images.
        
         | phonon wrote:
         | Do you mean Acroforms?
        
         | nateb2022 wrote:
         | Not currently; however, there's an open issue regarding this
         | very topic: https://github.com/QuestPDF/QuestPDF/issues/283
        
           | phpdave11 wrote:
           | It shouldn't be too difficult to add support for this. I
           | authored a Go library which adds support for importing PDFs
           | into a new PDF generator (either gofpdf or gopdf). It is
           | around 2,500 lines of code:
           | https://github.com/phpdave11/gofpdi
        
       | renaudl_ wrote:
       | Why not just use an API like doppio.sh?
        
       | mwcampbell wrote:
       | I don't see any support for tagged PDF output. That's important
       | for accessibility, particularly for screen reader users.
        
         | nateb2022 wrote:
         | Good point! There's an open issue regarding that, and it seems
         | to be due to the fact that under the hood, QuestPDF uses Skia
         | which itself lacks support for tagged PDFs:
         | https://github.com/QuestPDF/QuestPDF/issues/193
        
           | qwertox wrote:
           | This could be a SkiaSharp limitation. This thread made me
           | interested in Skia and I started looking around their site
           | and did a quick search for "tagged PDF" on their Milestone
           | Release Notes.
           | 
           | If they mean the same thing by "tagged PDF" as what is being
           | discussed in this thread, that page lists "Add new APIs to
           | add attributes to document structure node when creating a
           | tagged PDF.", in a milestone that could be as old as
           | 2020 [0]
           | 
           | [0] https://skia.org/docs/user/release/release_notes/#milesto
           | ne-...
        
         | edragoev wrote:
         | The .NET version of PDFjet supports tagged PDFs:
         | 
         | https://github.com/edragoev1/pdfjet
         | 
         | so do the Java, Swift and Go versions.
         | 
         | While the docs are somewhat sparse, Example_45 shows how to
         | create a PDF/UA-compliant PDF.
        
       | yupis wrote:
       | Smart people of HN why can't I directly edit a PDF text file and
       | change some letters?
        
         | KMag wrote:
         | A PDF file is a program for a virtual machine that draws
         | characters. For instance, I believe fonts in PDF work like
         | PostScript fonts, where (for left-to-right languages) each
         | glyph in the font is actually a bytecode function that starts
         | with the brush in the lower-left corner of where the glyph is
         | to be drawn, draws the glyph, and leaves the brush at the
         | lower-left corner of where the next glyph is to be drawn. I
         | think it's somewhat similar to turtle graphics, if you're
         | familiar with Logo programming or G-code if you've ever hand-
         | coded a CNC mill. (PostScript is text instead of bytecode. PDF
         | is an odd mix of a binary and text format, which helps explain
         | why it has had so many parsing security vulnerabilities over
         | the years.)
         | 
         | For common cases, it may be possible to basically decompile the
         | PDF, modify the text, re-flow it, and re-compile it to
         | bytecode. However, it's very complicated to do in the general
         | case. (Note that in HTML, the browser determines how to best
         | lay out the text, but with PDF, the PDF generator makes the
         | layout decisions.)
         | 
         | Also, many PDF renderers will "compress" fonts by lazily
         | building up an embedded font as glyphs are used in the
         | document. These typically will assign "a" to the first glyph
         | used, "b" to the second, etc., so if you decompile "This is some
         | text", you'll see "abcd cd defg hgih". Some PDF generators will
         | helpfully annotate the generated text with "backing text"
         | metadata to help screen readers/copying-to-clipboard, but it's
         | far from universal. So, you might need a database of hashes of
         | all of the bytecode functions in a large number of fonts and/or
         | some image-to-text software in order to reliably decompile the
         | PDF.
         | 
         | If you're unable to copy text out of a PDF or you get gibberish
         | when you copy text from the PDF, it's likely because the PDF
         | lacks this "backing text" metadata (and in the gibberish case,
         | likely a compressed embedded font). Some scanners will
         | helpfully perform OCR to add this backing text metadata to the
         | generated PDF.
         | 
         | Source: I did a small amount of work related to PDF analysis in
         | Google's web search indexing pipeline over a decade ago. Most
         | of my work was related to figuring out how JavaScript altered
         | web page text, but I did learn just enough about PDF to be
         | dangerous. At the time, Yahoo was Google's biggest competitor,
         | and tons of their indexed PDFs had preview text that was this
         | compressed font "abcd cd de..." garbage. Yahoo obviously
         | naively decompiled the PDF and just trusted that "a" in the
         | embedded font was a bytecode function that drew the glyph "a".
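
A toy illustration of the embedded-font remapping described above, assuming codes are handed out as "a", "b", "c", ... in order of first use and that spaces are left alone (as in the comment's example). The function names are hypothetical, and real PDF font subsetting is far more involved; the point is only that the stored codes are meaningless without the mapping.

```python
def subset_encode(text):
    """Assign codes 'a', 'b', ... to glyphs in order of first use.

    Toy model only: it assumes fewer than 27 distinct glyphs and
    leaves spaces untouched so the output stays readable.
    """
    glyph_to_code = {}
    out = []
    for ch in text:
        if ch == " ":
            out.append(" ")
            continue
        if ch not in glyph_to_code:
            glyph_to_code[ch] = chr(ord("a") + len(glyph_to_code))
        out.append(glyph_to_code[ch])
    return "".join(out), glyph_to_code


def decode_with_map(encoded, glyph_to_code):
    """Recover the text when the mapping (the 'backing text' role) is known."""
    code_to_glyph = {code: glyph for glyph, code in glyph_to_code.items()}
    return "".join(code_to_glyph.get(c, c) for c in encoded)


if __name__ == "__main__":
    encoded, mapping = subset_encode("This is some text")
    print(encoded)                            # what a naive extractor sees
    print(decode_with_map(encoded, mapping))  # what a reader expects
```

Without the mapping, an extractor that trusts the codes reports the encoded string as the document text, which is the kind of garbage preview described in the comment.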
        
         | bazoom42 wrote:
         | You can, using a tool like Adobe Acrobat. But a PDF is a fixed
         | layout, where each line of text is a positioned box. So editing
         | text will not cause reflow across lines.
        
       | Scarbutt wrote:
       | Not knowing anything about PDF generation (but will need to
       | soon), what can these libraries do that you can't do with
       | something like a puppeteer web service and create PDFs with
       | HTML/CSS?
        
         | bigtex wrote:
         | Tables over multiple pages are a major problem. It just doesn't
         | work with the popular htmlpdf tool that everyone uses to power
         | their tools. That is the use case I am interested in.
        
         | ficklepickle wrote:
         | I put puppeteer into a serverless function and it worked well
         | enough for low tens of thousands of PDFs a day. It's not fast,
         | nor efficient, but it was reliable and surprisingly cheap. It
         | was a definite improvement over the existing solution which was
         | a terrible proprietary black box that was occasionally
         | returning the wrong invoice, but that is not saying much. It
         | was an easy drop-in replacement because we were already
         | generating invoices in HTML, so we just sent them to the new
         | PDF service instead.
         | 
         | Something like this is likely much more efficient than
         | launching a whole browser for each PDF.
        
         | pathartl wrote:
         | Using HTML/CSS for PDFs really just isn't a good idea in my
         | experience. It makes layout extremely cumbersome. If you just
         | need to spit some data out onto a page, sure it works I guess.
         | However, doing more complex page layout with an actual design
         | element often introduces scenarios where a markup language just
         | can't work.
        
         | px1999 wrote:
         | Scale/performance. The interface is also straightforward to
         | use. Puppeteer or any nonembedded process is just unnecessary
         | hassle/overhead in a lot of cases.
        
           | bayesian_horse wrote:
           | It's an overhead but not a big one, at least for web
           | applications, especially if they run as containers anyway.
           | And then it really scales like crazy. Yes, this pdf generator
           | may be faster at what it does, but a headless browser with
           | paged media polyfill can do a lot more than this and uses
           | html+css which are widely used standards.
        
             | px1999 wrote:
             | Sure, but as others have said, how do you get column
             | headers appearing on each page, put metadata into your
             | documents, make elements properly selectable, etc.?
             | 
             | "Just run it as a container" is a bit of an industry cop-
             | out for making stuff unnecessarily complex.
        
       | mattferderer wrote:
       | This looks awesome! I hope the .NET Foundation supports this ASAP
       | - https://old.dotnetfoundation.org/projects/
       | 
       | I don't know if it's due to "partnerships" but I never could
       | understand why Microsoft didn't do better at supporting .NET Word
       | & PDF tooling since .NET Core came out. The older versions I know
       | at least had support for Word docs. Creating documents is a huge
       | foundation of their company.
        
         | password4321 wrote:
         | Careful consideration should be taken before joining the .NET
         | Foundation.
         | 
         |  _How the .NET Foundation kerfuffle became a brouhaha_
         | 
         | 2021-10-08
         | https://news.ycombinator.com/item?id=28794352#28795511
         | 
         | > _the project had now been silently moved to GitHub Enterprise
         | (likely in the short window @dnfadmin had owner access). The
         | author states that projects in GitHub Enterprise can be
         | entirely controlled by the owner of the account (the .NET
         | Foundation). This transfer happened silently._
        
         | matchagaucho wrote:
         | The Adobe vs Microsoft competitive relationship is indeed a
         | puzzle.
         | 
         | For some features, like eSignature, there's more co-opetition
         | and partnering.
        
         | dustymcp wrote:
         | I think a lot of people have moved on and are using services or
         | Puppeteer to generate their PDFs. I know we did, since we
         | couldn't find a library that worked properly for our use case.
        
           | paranoidrobot wrote:
           | The tl;dr of using Puppeteer for this is "We run Chrome in a
           | headless mode, load your page, and then print to PDF with
           | it".
           | 
           | It makes me nervous having Chrome running on the server, even
           | inside a container without root. Doubly so if the user is
           | able to control any portion of the page being run by Chrome.
        
       | lazyeye wrote:
       | Why you would use anything other than an HTML -> PDF rendering
       | engine is beyond me.
        
         | magnat wrote:
         | For proper page numbering, consistent word wrapping,
         | pixel-perfect font rendering and document outline support.
        
           | lazyeye wrote:
           | You can't do page numbering in HTML? Seriously? I haven't had
           | an issue with any of these things rendering HTML to PDF.
        
         | kodt wrote:
         | Filling a fillable PDF programmatically?
        
         | password4321 wrote:
         | Is there an open source pure-.NET library that implements this?
        
         | mattferderer wrote:
         | Because styling semi-complicated PDFs with CSS is a layer of
         | hell right above e-mails & old browsers. I say this as someone
         | who enjoys CSS & in the .NET world has used this method over
         | things like Crystal Reports (even before they dropped their
         | .NET support).
        
           | lazyeye wrote:
           | I haven't found this at all. I've been rendering very complex
           | HTML to PDF (complex SVG charts, headers, footers etc.) and
           | it's been fine. Just a matter of getting the element
           | heights/widths correct. Once you've got the basic page
           | template done it's not much effort at all to tweak as required.
        
           | renaudl_ wrote:
           | Have you heard of Paged.js?
        
         | trynewideas wrote:
         | There's a good matrix of feature support at
         | https://print-css.rocks/lessons for all the things HTML-to-PDF
         | engines can (and can't) do.
         | 
         | The CSS3 Paged Media spec was born broken on some fundamental
         | things like counter resets, then effectively abandoned in 2013,
         | so some complex print-specific requirements like fully
         | customizable page numbering just don't happen without
         | additional tooling. Accessible tagged PDFs are still a
         | struggle, and I think only Weasyprint readily supports them
         | among free or open-source options (and only since around
         | September).
        
       | hacknewslogin wrote:
       | Very cool, I'm looking to learn HTML, CSS, C and someday forth.
       | Does anyone know if there's anything like this for those
       | languages?
        
         | yodon wrote:
         | See Poe's Law [0]
         | 
         | [0]https://en.wikipedia.org/wiki/Poe%27s_law
        
         | cinntaile wrote:
         | I think you're in the wrong thread?
        
           | hacknewslogin wrote:
           | This is for making PDFs using C# code, right? And you can
           | preview it while you work? I was wondering if that is
           | available for other programming languages.
        
             | xupybd wrote:
             | https://en.m.wikipedia.org/wiki/JasperReports
             | 
             | https://en.m.wikipedia.org/wiki/Crystal_Reports
             | 
             | https://pypi.org/project/pdf-reports/
        
         | zzo38computer wrote:
         | Since you mention Forth, I might mention that PostScript is
         | another stack-based programming language (different than Forth
         | although there are some similarities), which can be used to
         | make PDF output. Additional PostScript codes could be made
         | which you can load into your file in order to add additional
         | procedures, etc for doing formatting that you will not need to
         | write by yourself.
        
         | imafish wrote:
         | Is you Bot?
        
           | hacknewslogin wrote:
           | No, this is patrick!
        
         | KRAKRISMOTT wrote:
         | > _and someday forth_
         | 
         | PostScript is eternal
        
         | px1999 wrote:
         | Oh, I get it! you're saying that .net isn't a good language.
         | haha great one.
        
       | xupybd wrote:
       | I still find jasper reports the best in pdf generation. Jasper
       | studio gives you okay design tools. Much better than hand coding.
       | Jasper server means integration is as simple as a rest interface.
       | The community edition seems to do everything I need.
        
       | gibsonf1 wrote:
       | We've been really happy with https://pdfbox.apache.org/ in
       | production, although not .net of course.
        
         | Scarbutt wrote:
         | you have to add table support yourself.
        
       | mrwizrd wrote:
       | This looks great. I am glad to still see good work being done in
       | this space.
       | 
       | I had used https://gotenberg.dev/ on AWS in the past. Many of the
       | options available at the time weren't usable in Azure outside of
       | a VM due to needing to make use of GDI interfaces that were
       | disabled for security reasons. Interested to see how it compares
       | to that and the other options being floated at the time like
       | Puppeteer*
        
         | jiggawatts wrote:
         | Containers can use the full Windows Server base image, which
         | includes the GDI+ libraries.
         | 
         | I used this as a trick for making Crystal Reports work.
        
       ___________________________________________________________________
       (page generated 2023-01-18 23:00 UTC)