[HN Gopher] QuestPDF: Modern .NET library for PDF document gener...
___________________________________________________________________
QuestPDF: Modern .NET library for PDF document generation
Author : nateb2022
Score : 112 points
Date : 2023-01-18 17:27 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| space_ghost wrote:
| Oh man, .NET needed this. Did some PDF work for a .NET project
| last year and found the ecosystem to be somewhat light on PDF
| support. There's a few commercial options, but they're pricey.
| ejb999 wrote:
| I've used this one with great success (and it's free):
| https://github.com/tuespetre/TuesPechkin
|
| It's basically a wrapper for wkhtmltopdf, but I develop an app
| that has probably generated a million +/- invoices/statements
| over the past 5 years with it, and it's been rock solid for me.
| Was a bit of a bear to get it working the first time (not a ton
| of documentation that I could find at the time), but once
| working, was easy to add/change new documents/layouts.
|
| As it uses wkhtmltopdf under the covers, it is an HTML->PDF tool,
| but I prefer that, at least for my use case.
|
| Not sure there is a dotnet-core version, so that might be a
| problem for some.
| hbcondo714 wrote:
| I've been using Rotativa[1] for URL to PDF generation which
| is also a wrapper for wkhtmltopdf. They have a dotnet-core[2]
| version and also a SaaS[3] but it's worth mentioning that
| Azure PaaS supports wkhtmltopdf[4] so I just self-host.
|
| Looking at QuestPDF's API docs, it doesn't look like they
| support URL / HTML to PDF generation. I think this would be a
| great addition especially given the age and issues with
| Rotativa and TuesPechkin on their public repos.
|
| [1] https://github.com/webgio/Rotativa
|
| [2] https://github.com/webgio/Rotativa.AspNetCore
|
| [3] https://rotativa.io/
|
| [4] https://github.com/projectkudu/kudu/wiki/Azure-Web-App-
| sandb...
| pathartl wrote:
| We've been on iTextSharp 5 for a decade for this reason.
| msk-lywenn wrote:
| What does this do that PDFsharp doesn't?
| wackget wrote:
| >> You are 250 lines of C# code away from creating a fully
| functional PDF invoice implementation.
|
| As a web developer this hurt to read. This is a task which is
| just crying out for a markup language and a stylesheet, not
| hundreds of lines of declarative C# code.
|
| Even the "complex example" in their documentation looks like the
| most basic of web pages.
| aidos wrote:
| Not familiar with .Net but I'd imagine this would probably be
| fairly easy to build on top of this library (and I agree, xml
| is often a much better way to generate reports).
|
| I've done something similar but in Python and generating Excel
| documents. I use jinja for templating to create the xml and
| then parse that and convert to commands that drive the library
| that creates the final document.
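|
| The template -> XML -> commands pipeline described above can be
| sketched roughly as follows. This is a minimal illustration, not
| the commenter's actual code: string.Template stands in for Jinja
| to keep it stdlib-only, and the <sheet>/<cell> element names and
| the add_sheet/write command tuples are hypothetical.

```python
from string import Template
import xml.etree.ElementTree as ET

# Stand-in for a Jinja template; string.Template keeps the sketch stdlib-only.
# The element and attribute names are made up for this example.
SHEET_TEMPLATE = Template(
    "<sheet name='$name'>"
    "<cell ref='A1'>$header</cell>"
    "<cell ref='A2'>$value</cell>"
    "</sheet>"
)


def template_to_commands(xml_text: str) -> list[tuple[str, str, str]]:
    """Parse the rendered XML and turn each element into a command tuple.

    A real backend would replay these commands against a spreadsheet or
    PDF library; here they are just returned as data.
    """
    root = ET.fromstring(xml_text)
    commands = [("add_sheet", root.get("name"), "")]
    for cell in root.iter("cell"):
        commands.append(("write", cell.get("ref"), cell.text or ""))
    return commands


# Values are assumed to be XML-safe; a real template engine would escape them.
xml_text = SHEET_TEMPLATE.substitute(name="Report", header="Total", value="42")
commands = template_to_commands(xml_text)
```

| The markup stays close to the shape of the final document, while
| the command list is the only part that depends on the backend.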
| naasking wrote:
| Will those basic web pages be less than 250 lines for the
| equivalent look? I'm skeptical.
| jaywalk wrote:
| There are PDF generators that work just like that. As a web
| developer who uses C# on the backend, QuestPDF is exactly what
| I want.
| bob1029 wrote:
| We do a _lot_ of dynamic report gen PDFs and this is something
| we'd prefer.
|
| Right now, we basically emulate this technique w/ HTML->PDF. We
| build chunks of report HTML with various string interpolation
| methods and then compose those to obtain our final HTML output.
|
| Raw, declarative HTML is nice if you don't have an undefined #
| of things to describe with it. When you are looping and
| projecting domain types into a report, things get a lot
| trickier.
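|
| The "build chunks with string interpolation, then compose them"
| approach described above might look something like this. It is a
| sketch under assumptions: the LineItem type and the function names
| are hypothetical, and the real system is presumably .NET, not
| Python.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class LineItem:
    # Hypothetical domain type; not taken from the thread.
    description: str
    quantity: int
    unit_price: float


def render_row(item: LineItem) -> str:
    """Project one domain object into one HTML chunk (a table row)."""
    total = item.quantity * item.unit_price
    return (
        "<tr>"
        f"<td>{escape(item.description)}</td>"
        f"<td>{item.quantity}</td>"
        f"<td>{total:.2f}</td>"
        "</tr>"
    )


def render_report(title: str, items: list[LineItem]) -> str:
    """Compose the per-row chunks into the final HTML document."""
    rows = "".join(render_row(item) for item in items)
    return (
        f"<html><body><h1>{escape(title)}</h1>"
        f"<table><tbody>{rows}</tbody></table></body></html>"
    )


report_html = render_report(
    "Invoice & Statement",
    [LineItem("Widget <A>", 2, 9.5), LineItem("Gadget", 1, 20.0)],
)
```

| The loop over an unknown number of items lives in ordinary code,
| which is the part that static, declarative HTML cannot express on
| its own.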
| amithegde wrote:
| I used https://github.com/Antaris/RazorEngine to generate all
| sorts of complex HTML, email body etc. back in the day. Since
| it follows razor syntax, loops etc. work well
| bob1029 wrote:
| We actually used this exact library at one point, but it
| fell out of favor for some reason I cannot recall.
| pathartl wrote:
| Coming from the web dev space into backend on a project that
| heavily relies on PDF generation, I would say that something
| like a PDF often cannot be expressed with just markup and a
| stylesheet. There's a large difference in something like the
| web (it must be expressed with some fluidity of layout)
| compared to a very static document like a PDF. Page breaks,
| readability, print supply, watermarks, paging, etc. all have to
| be considered.
| aidos wrote:
| I feel like all of that can be done in markup.
| kgwxd wrote:
| I've worked with PDF markup tools built on libraries like
| this for 20 years, both third-party and in-house custom. It
| usually takes 10 minutes to find out the markup doesn't
| support what is required for the task. Third-party you have
| to find a hack or drop it altogether. In-house you can
| maybe add something in, but you'll have to do it fast, and
| if you can't break it down into a general-purpose feature
| (which you probably can't because the fundamental
| philosophy of your "easy" markup language wasn't designed
| with anything like this in mind), you'll just have to
| uglify the markup language even more or, again, drop it
| altogether.
|
| Code is the only sane way.
| aidos wrote:
| Well sure, as ever, it depends on your usecase.
|
| PDF is an insanely complex spec (I've spent more time
| reading it than most because I need to know bits of it
| for my job and I just generally find it fascinating). But
| a lot of devs just need to put some content on the screen
| to match a template they were given. In my experience, a
| complete enough markup language allows you to bang out
| and maintain those templates better than code.
|
| I know it doesn't suit every need, but it's just a way of
| representing the data so it's closer to the final output
| than imperative code is. Definitely take your point
| though about the limitations becoming dealbreakers.
| layer8 wrote:
| It can, but it'll become something complex and Turing-
| complete like LaTeX.
| password4321 wrote:
| https://docs.aspose.com/pdf/net/working-with-xml/
|
| Starting at $3600 for use on a web server.
| wvenable wrote:
| But then you need a template language to generate the markup
| from the data.
| aidos wrote:
| That's a well trodden path in most languages. A cursory
| search surfaced this library that looks like it would
| probably do the job:
|
| https://github.com/scriban/scriban
| wvenable wrote:
| A programming language referencing a template language
| library for processing a markup language to generate
| another markup language (PDF) sounds just about right.
| aidos wrote:
| Nitpick but it's a stretch to classify PDF as a markup
| language. A PDF is a graph of nodes that can encapsulate
| myriad different types of data including things that are
| probably even Turing complete like fonts. Even the
| graphics streams inside PDFs aren't markup.
|
| We build abstractions for a reason. I think we can all
| agree that templating markup for layouts has been a
| reasonable success story of the web generation.
| styx31 wrote:
| Webpages and PDFs (paged documents) are fundamentally different:
| you won't be able to easily support headers and footers, page
| breaks and orphans on a webpage. You can create basic invoices
| on webpages, but anything more complex (and by that I mean any
| serious word document) will require you to twist HTML. Try to
| get column headers to repeat on each printed page of an HTML
| page.
| aidos wrote:
| The markup doesn't need to be html - and would be better not
| to be. The point is more that templating languages are great
| for formatting data as markup and markup is great for driving
| layout. With this library as a backend you can make something
| super usable.
| lazyeye wrote:
| None of these things are difficult at all with html. Plus you
| have the benefit of having the document viewable in a web
| browser too. You use the exact same html layout for both with
| specific css (heights, widths mainly) for each.
| SigmundA wrote:
| I believe browsers have been repeating table headers on
| printed output for some time.
|
| Paged media CSS is designed for this, although most browsers
| don't fully support it; PrinceXML is the go-to for full paged
| media support.
|
| IMO they are not fundamentally different, they are both
| document formats, PDF just has a fixed paged rendering layout
| baked in while HTML can flow and adjust to rendering target.
| The main issue is lack of full print CSS support in HTML
| rendering engines.
|
| https://www.w3.org/TR/css-page-3/
|
| https://www.princexml.com
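|
| As a rough illustration of the print-oriented HTML discussed
| above, the sketch below builds a table with a <thead> (which
| browsers can repeat on each printed page) plus an @page rule from
| CSS paged media. The function name is hypothetical, and how much
| of the CSS is honored depends on the rendering engine.

```python
from html import escape


def build_print_html(headers: list[str], rows: list[list[str]]) -> str:
    """Build an HTML table whose <thead> can repeat on each printed page."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    css = (
        # Paged-media rule: sheet size and margins for print/PDF output.
        "@page { size: A4; margin: 2cm; } "
        # Already the default display value for <thead>; stated explicitly.
        "thead { display: table-header-group; } "
        # Ask the engine not to split a row across pages (support varies).
        "tr { break-inside: avoid; }"
    )
    return (
        f"<html><head><style>{css}</style></head><body>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        "</body></html>"
    )


html_doc = build_print_html(
    ["Item", "Amount"],
    [["Widget", "19.00"], ["Gadget", "20.00"]],
)
```

| The same markup can be viewed on screen or handed to a print
| engine; only the stylesheet is print-specific.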
| styx31 wrote:
| You are right about the thead repeatable header.
|
| Still, to switch back to the previous point, it seems it's
| more a divergence between using markup or code to design a
| document. Both have valid usage and benefits depending on
| your case.
|
| In my case and my apps, I often need to handle complex
| conditions that fit better, imo, in procedural code (complex
| invoices and agreements). In other cases (reports), I
| prefer to use a markup language.
| SigmundA wrote:
| There are lots of procedural tools for generating HTML.
| If modern browsers fully supported print CSS, then
| you could use them for complex PDF generation, or direct
| printing, either client side or on the server headless.
|
| If your app is a web app this is a no-brainer: the user's
| browser could simply do the print or PDF conversion as
| needed.
|
| I do see a use for more direct libraries in native apps,
| although if every native client had a browser control
| with full print CSS support even then it might not be
| such an issue.
| Scarbutt wrote:
| _If your app is a web app this is a no brainer, the users
| browser could simply do the print or PDF conversion as
| needed._
|
| That's arguable. IME (and it's also better UX), most would
| prefer to just get the PDF file with just one click than
| to deal with additional browser dialogs. Not everyone
| knows how to do print-to-pdf or even knows it exists.
|
| Or do you mean browsers expose print-to-pdf functionality
| as an API?
| SigmundA wrote:
| Hitting print in the browser or calling window.print() if
| you want to force the dialog.
|
| If you serve a PDF you still need to hit print or use
| dialog to save, you can use a headless browser server
| side to serve that if needed.
|
| I do think browsers could use better print APIs, but you're
| not getting around that with server-side PDFs unless the
| server prints directly to on-site printers or something.
| petilon wrote:
| The hard part of PDF generation is support for complex script
| (Arabic, Indian languages etc.), including embedding a font
| subset. On Windows this is usually accomplished using the
| Uniscribe library (which is not available on Linux). QuestPDF
| appears to be using HarfBuzz for this purpose. If that works well
| then this is a winner!
| xoac wrote:
| HarfBuzz is the gold standard.
| ripberge wrote:
| Does this allow you to use existing PDFs as "templates"? We do
| that a lot with PDFs. It allows end users to design in Adobe
| Acrobat and upload to our product. We can then inject dynamic
| data into placeholders at runtime. We do this for text and
| images.
| phonon wrote:
| Do you mean Acroforms?
| nateb2022 wrote:
| Not currently; however, there's an open issue regarding this
| very topic: https://github.com/QuestPDF/QuestPDF/issues/283
| phpdave11 wrote:
| It shouldn't be too difficult to add support for this. I
| authored a Go library which adds support for importing PDFs
| into a new PDF generator (either gofpdf or gopdf). It is
| around 2,500 lines of code:
| https://github.com/phpdave11/gofpdi
| renaudl_ wrote:
| Why aren't you just using an API like doppio.sh?
| mwcampbell wrote:
| I don't see any support for tagged PDF output. That's important
| for accessibility, particularly for screen reader users.
| nateb2022 wrote:
| Good point! There's an open issue regarding that, and it seems
| to be due to the fact that under the hood, QuestPDF uses Skia
| which itself lacks support for tagged PDFs:
| https://github.com/QuestPDF/QuestPDF/issues/193
| qwertox wrote:
| This could be a SkiaSharp limitation. This thread made me
| interested in Skia and I started looking around their site
| and did a quick search for "tagged PDF" on their Milestone
| Release Notes.
|
| If they mean the same thing by tagged PDF as what is
| being discussed in this thread, that page lists "Add new
| APIs to add attributes to document structure node when
| creating a tagged PDF.", which could be from a milestone as
| old as 2020 [0]
|
| [0] https://skia.org/docs/user/release/release_notes/#milesto
| ne-...
| edragoev wrote:
| The .NET version of PDFjet supports tagged PDFs:
|
| https://github.com/edragoev1/pdfjet
|
| so do the Java, Swift and Go versions.
|
| While the docs are somewhat sparse, Example_45 shows how to
| create PDF/UA compliant PDF.
| yupis wrote:
| Smart people of HN why can't I directly edit a PDF text file and
| change some letters?
| KMag wrote:
| A PDF file is a program for a virtual machine that draws
| characters. For instance, I believe fonts in PDF work like
| PostScript fonts, where (for left-to-right languages) each
| glyph in the font is actually a bytecode function that starts
| with the brush in the lower-left corner of where the glyph is
| to be drawn, draws the glyph, and leaves the brush at the
| lower-left corner of where the next glyph is to be drawn. I
| think it's somewhat similar to turtle graphics, if you're
| familiar with Logo programming or G-code if you've ever hand-
| coded a CNC mill. (PostScript is text instead of bytecode. PDF
| is an odd mix of a binary and text format, which helps explain
| why it has had so many parsing security vulnerabilities over
| the years.)
|
| For common cases, it may be possible to basically decompile the
| PDF, modify the text, and re-flow the text, and re-compile to
| bytecode. However, it's very complicated to do in the general
| case. (Note that in HTML, the browser determines how to best
| layout the text, but with PDF, the PDF generator makes the
| layout decisions.)
|
| Also, many PDF renderers will "compress" fonts by lazily
| building up an embedded font as glyphs are used in the
| document. These typically will assign "a" to the first glyph
| used, "b" to the second, etc., so if you decompile "This is some
| text", you'll see "abcd cd defg hgih". Some PDF generators will
| helpfully annotate the generated text with "backing text"
| metadata to help screen readers/copying-to-clipboard, but it's
| far from universal. So, you might need a database of hashes of
| all of the bytecode functions in a large number of fonts and/or
| some image-to-text software in order to reliably decompile the
| PDF.
|
| If you're unable to copy text out of a PDF or you get gibberish
| when you copy text from the PDF, it's likely because the PDF
| lacks this "backing text" metadata (and in the gibberish case,
| likely a compressed embedded font). Some scanners will
| helpfully perform OCR to add this backing text metadata to the
| generated PDF.
|
| Source: I did a small amount of work related to PDF analysis in
| Google's web search indexing pipeline over a decade ago. Most
| of my work was related to figuring out how JavaScript altered
| web page text, but I did learn just enough about PDF to be
| dangerous. At the time, Yahoo was Google's biggest competitor,
| and tons of their indexed PDFs had preview text that was this
| compressed font "abcd cd de..." garbage. Yahoo obviously
| naively decompiled the PDF and just trusted that "a" in the
| embedded font was a bytecode function that drew the glyph "a".
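|
| The "abcd cd defg hgih" example above can be reproduced with a
| small sketch. It only models the idea of assigning codes in order
| of first use; it is not how any particular PDF generator works,
| and the function names are hypothetical. The code -> glyph map
| plays the role of the "backing text" metadata (in PDF terms,
| roughly what a ToUnicode CMap provides).

```python
def subset_encode(text: str) -> tuple[str, dict[str, str]]:
    """Assign codes 'a', 'b', ... to glyphs in order of first use.

    Spaces are left alone, matching the example in the comment above.
    Returns the re-encoded text and the code -> original glyph map.
    """
    glyph_to_code: dict[str, str] = {}
    encoded = []
    for ch in text:
        if ch == " ":
            encoded.append(ch)
            continue
        if ch not in glyph_to_code:
            glyph_to_code[ch] = chr(ord("a") + len(glyph_to_code))
        encoded.append(glyph_to_code[ch])
    code_to_glyph = {code: glyph for glyph, code in glyph_to_code.items()}
    return "".join(encoded), code_to_glyph


def decode_with_backing_map(encoded: str, code_to_glyph: dict[str, str]) -> str:
    """Recover the text when a code -> glyph map is available."""
    return "".join(code_to_glyph.get(ch, ch) for ch in encoded)


encoded, mapping = subset_encode("This is some text")
print(encoded)  # abcd cd defg hgih
print(decode_with_backing_map(encoded, mapping))  # This is some text
```

| Without the map, a naive extractor sees only the first string,
| which is the garbage preview text described above.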
| bazoom42 wrote:
| You can, using a tool like Adobe Acrobat. But a PDF is a fixed
| layout, where each line of text is a positioned box. So editing
| text will not cause reflow across lines.
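|
| A toy model of the "each line is a positioned box" point: the
| PositionedLine type and edit_line function below are hypothetical
| and ignore everything else in a real PDF, but they show why an
| edit to one line does not reflow the next.

```python
from dataclasses import dataclass


@dataclass
class PositionedLine:
    # Hypothetical model: one line of text pinned to fixed page coordinates.
    x: float
    y: float
    text: str


def edit_line(lines: list[PositionedLine], index: int, new_text: str) -> None:
    """Swap the text of one line; no other line moves or re-wraps."""
    lines[index].text = new_text


page = [
    PositionedLine(72.0, 700.0, "Payment is due within"),
    PositionedLine(72.0, 686.0, "30 days of the invoice date."),
]
before = [(line.x, line.y) for line in page]
edit_line(page, 0, "Payment is due, without exception, within")
after = [(line.x, line.y) for line in page]
```

| The longer first line simply overruns its box; nothing pushes
| words onto the second line, because no layout step runs at all.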
| Scarbutt wrote:
| Not knowing anything about PDF generation (but will need to
| soon), what can these libraries do that you can't do with
| something like a puppeteer web service and create PDFs with
| HTML/CSS?
| bigtex wrote:
| Tables over multiple pages are a major problem. They just don't
| work with the popular htmlpdf tool that everyone uses to power
| their tools. That is the use case I am interested in.
| ficklepickle wrote:
| I put puppeteer into a serverless function and it worked well
| enough for low tens of thousands of PDFs a day. It's not fast,
| nor efficient, but it was reliable and surprisingly cheap. It
| was a definite improvement over the existing solution which was
| a terrible proprietary black box that was occasionally
| returning the wrong invoice, but that is not saying much. It
| was an easy drop-in replacement because we were already
| generating invoices in HTML, so we just sent them to the new
| PDF service instead.
|
| Something like this is likely much more efficient than
| launching a whole browser for each PDF.
| pathartl wrote:
| Using HTML/CSS for PDFs really just isn't a good idea in my
| experience. It makes layout extremely cumbersome. If you just
| need to spit some data out onto a page, sure it works I guess.
| However, doing more complex page layout with an actual design
| element often introduces scenarios where a markup language just
| can't work.
| px1999 wrote:
| Scale/performance. The interface is also straightforward to
| use. Puppeteer or any nonembedded process is just unnecessary
| hassle/overhead in a lot of cases.
| bayesian_horse wrote:
| It's an overhead but not a big one, at least for web
| applications, especially if they run as containers anyway.
| And then it really scales like crazy. Yes, this pdf generator
| may be faster at what it does, but a headless browser with
| paged media polyfill can do a lot more than this and uses
| html+css which are widely used standards.
| px1999 wrote:
| Sure, but as others have said, how do you get column
| headers appearing on each page, put metadata into your
| documents, make elements properly selectable etc etc
|
| "Just run it as a container" is a bit of an industry cop-
| out for making stuff unnecessarily complex.
| mattferderer wrote:
| This looks awesome! I hope the .NET Foundation supports this ASAP
| - https://old.dotnetfoundation.org/projects/
|
| I don't know if it's due to "partnerships" but I never could
| understand why Microsoft didn't do better at supporting .NET Word
| & PDF tooling since .NET Core came out. The older versions I know
| at least had support for Word docs. Creating documents is a huge
| foundation of their company.
| password4321 wrote:
| Careful consideration should be taken before joining the .NET
| Foundation.
|
| _How the .NET Foundation kerfuffle became a brouhaha_
|
| 2021-10-08
| https://news.ycombinator.com/item?id=28794352#28795511
|
| > _the project had now been silently moved to GitHub Enterprise
| (likely in the short window @dnfadmin had owner access). The
| author states that projects in GitHub Enterprise can be
| entirely controlled by the owner of the account (the .NET
| Foundation). This transfer happened silently._
| matchagaucho wrote:
| The Adobe vs Microsoft competitive relationship is indeed a
| puzzle.
|
| Some features, like eSignature, there's more co-opetition and
| partnering.
| dustymcp wrote:
| I think a lot of people have moved on and are using services or
| puppeteer to generate their PDFs. I know we did, since we
| couldn't find a library that worked properly for our use case.
| paranoidrobot wrote:
| the tl;dr of using Puppeteer for this is "We run Chrome in a
| headless mode, load your page, and then print to PDF with
| it".
|
| It makes me nervous having Chrome running on the server, even
| inside a container without root. Doubly so if the user is
| able to control any portion of the page being run by Chrome.
| lazyeye wrote:
| Why you would use anything other than an HTML -> PDF rendering
| engine is beyond me.
| magnat wrote:
| For a proper page numbering, consistent word wrapping, pixel-
| perfect font rendering and document outline support.
| lazyeye wrote:
| You can't do page numbering in html? Seriously? I haven't had
| an issue with any of these things rendering html to pdf.
| kodt wrote:
| Filling a fillable PDF programmatically?
| password4321 wrote:
| Is there an open source pure-.NET library that implements this?
| mattferderer wrote:
| Because styling semi-complicated PDFs with CSS is a layer of
| hell right above e-mails & old browsers. I say this as someone
| who enjoys CSS & in the .NET world has used this method over
| things like Crystal Reports (even before they dropped their
| .NET support).
| lazyeye wrote:
| I haven't found this at all. I've been rendering very complex
| html to pdf (complex svg charts, headers, footers etc) and
| it's been fine. Just a matter of getting the element
| heights/widths correct. Once you've got the basic page
| template done it's not much effort at all to tweak as required.
| renaudl_ wrote:
| Have you heard of Paged.js ?
| trynewideas wrote:
| There's a good matrix of feature support at
| https://print-css.rocks/lessons for all the things HTML-to-PDF
| engines can (and can't) do.
|
| The CSS3 Paged Media spec was born broken on some fundamental
| things like counter resets, then effectively abandoned in 2013,
| so some complex print-specific requirements like fully
| customizable page numbering just don't happen without
| additional tooling. Accessible tagged PDFs are still a
| struggle, and I think only Weasyprint readily supports them
| among free or open-source options (and only since around
| September).
| hacknewslogin wrote:
| Very cool, I'm looking to learn HTML, CSS, C and someday forth.
| Does anyone know if there's anything like this for those
| languages?
| yodon wrote:
| See Poe's Law [0]
|
| [0]https://en.wikipedia.org/wiki/Poe%27s_law
| cinntaile wrote:
| I think you're in the wrong thread?
| hacknewslogin wrote:
| This is for making PDFs using C# code, right? And you can
| preview it while you work? I was wondering if that is
| available for other programming languages.
| xupybd wrote:
| https://en.m.wikipedia.org/wiki/JasperReports
|
| https://en.m.wikipedia.org/wiki/Crystal_Reports
|
| https://pypi.org/project/pdf-reports/
| zzo38computer wrote:
| Since you mention Forth, I might mention that PostScript is
| another stack-based programming language (different than Forth
| although there are some similarities), which can be used to
| make PDF output. Additional PostScript codes could be made
| which you can load into your file in order to add additional
| procedures, etc for doing formatting that you will not need to
| write by yourself.
| imafish wrote:
| Is you Bot?
| hacknewslogin wrote:
| No, this is patrick!
| KRAKRISMOTT wrote:
| > _and someday forth_
|
| PostScript is eternal
| px1999 wrote:
| Oh, I get it! You're saying that .net isn't a good language.
| haha great one.
| xupybd wrote:
| I still find jasper reports the best in pdf generation. Jasper
| studio gives you okay design tools. Much better than hand coding.
| Jasper server means integration is as simple as a rest interface.
| The community edition seems to do everything I need.
| gibsonf1 wrote:
| We've been really happy with https://pdfbox.apache.org/ in
| production, although not .net of course.
| Scarbutt wrote:
| you have to add table support yourself.
| mrwizrd wrote:
| This looks great. I am glad to still see good work being done in
| this space.
|
| I had used https://gotenberg.dev/ on AWS in the past. Many of the
| options available at the time weren't usable in Azure outside of
| a VM due to needing to make use of GDI interfaces that were
| disabled for security reasons. Interested to see how it compares
| to that and the other options being floated at the time like
| Puppeteer*
| jiggawatts wrote:
| Containers can use the full Windows Server base image, which
| includes the GDI+ libraries.
|
| I used this as a trick for making Crystal Reports work.
___________________________________________________________________
(page generated 2023-01-18 23:00 UTC)