[HN Gopher] Launch HN: Onedoc (YC W24) - A better way to create ...
       ___________________________________________________________________
        
       Launch HN: Onedoc (YC W24) - A better way to create PDFs
        
       Hey HN, we're the co-founders of Onedoc
       (https://www.onedoclabs.com/), and the original contributors to
       the open-source library react-print-pdf
       (https://github.com/OnedocLabs/react-print-pdf), which lets
       developers design and generate PDF documents automatically. Here's
       a demo video: https://www.youtube.com/watch?v=MgfCyOyckQU&t=3s

       Billions of PDFs are generated daily: invoices, contracts,
       receipts, reports, you name it. Developer time gets wasted
       producing these basic documents because there are no good-enough
       tools to design and generate PDFs.

       We previously worked at giant firms, where documents (especially
       PDFs) were central to most workflows. We got asked to generate
       automated trade confirmations for our customer's counterparties.
       We could not find any tool other than outdated libraries offering
       poor control over layout and the generation process. In the end,
       we just created our own, basically bringing web technologies to
       PDFs. That was the genesis of Onedoc.

       PDF creation has two phases: design (specifying content and
       layout) and generation (producing the actual PDF file). Onedoc
       lets you do both simply and automatically.

       _Design_: we have an open-source library called "react-print-pdf"
       (https://github.com/OnedocLabs/react-print-pdf) that allows you to
       design a document the same way you would design a website. It
       supports Tailwind CSS components and Chakra UI components, and we
       recently also built LaTeX and Markdown components. The latter let
       you write text in Markdown style, and include formulas using LaTeX
       syntax, directly within a React component.

       _Generation_: we have an API
       (https://docs.onedoclabs.com/api-reference/introduction) and a
       Node.js SDK (https://docs.onedoclabs.com/quickstart/nodejs) that
       render your designs into PDFs.

       The choice of renderer significantly affects the accuracy of the
       resulting PDF. For example, exporting a webpage into PDF will
       often result in a layout that differs from the original webpage.
       We ensure that what you designed is what you get, and therefore
       you have 100% control over the entire layout of your document
       including margin, style, etc. We can do that because we built the
       react-print-pdf library to match the HTML/CSS to PDF rendering
       tool we have.

       Once you have generated your document, you can either store it on
       your local system or, if you want, use our platform
       (https://app.onedoclabs.com/) to host your document online. If you
       use us, you'll also get analytics on your documents.

       Our main product is an API, but you can try it on our website
       directly (https://www.onedoclabs.com/) using our playground
       without any installation or sign-up. Our pricing is usage-based:
       per document generated. The pricing is degressive: the more
       documents you generate, the less you pay per document. If you
       don't want to pay for PDF generation, you can still generate as
       many documents as you want, but with a watermark on the margin.

       It's been fun to see what our users are building with our
       open-source library (components, templates, etc.) and our API. We
       have a website (https://react-print.onedoclabs.com/) dedicated to
       the open-source library where we post the templates submitted by
       the community. Some early power users built simple web apps
       (CV/Resume generator, NDA and Invoice generator). We are excited
       to show our product to the HN community and look forward to your
       feedback!
        
       Author : AugusteLef
       Score  : 167 points
       Date   : 2024-03-11 14:52 UTC (8 hours ago)
        
       | Brajeshwar wrote:
       | Maybe this is just me, but this looks extremely costly! It will
       | cost $2,500 to generate 50,000 PDFs. Are edits/corrections an
       | additional cost?
        
         | Titou325 wrote:
         | This is a good point, and we are still trying to figure out how
         | to price things fairly. Depending on the type of PDF, whether
         | it is a simple receipt or a large multi-page report, the
         | associated costs are very different on our side. At this time,
         | we rely on other proprietary software that we are aiming to
         | replace but that incurs high costs on our side as well.
         | 
         | Edits and corrections on generated PDFs are not provided, as
         | the PDFs are signed as-is; however, you can attach the metadata
         | to the PDF and rerender with the modifications.
        
           | passion__desire wrote:
           | Edits would be limited to certain pages but may spill over
           | (e.g. tables) so the whole PDF need not be generated. Only
           | edited pages can be inserted back to previously generated
           | PDFs. Could be an optimization to reduce cost.
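
The page-level idea above can be sketched as follows. This is a minimal illustration, not Onedoc's implementation: it assumes the content of every page is already known for both the old and the new version (which in practice means layout has been re-run), and the helper names `page_digest` and `pages_to_rerender` are hypothetical.

```python
import hashlib
from typing import List


def page_digest(page_content: str) -> str:
    """Stable fingerprint of one page's content."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def pages_to_rerender(old_pages: List[str], new_pages: List[str]) -> List[int]:
    """Return 0-based indices of pages in new_pages that differ from old_pages.

    Pages that exist only in new_pages (for example because a table
    spilled over) are reported as changed too.
    """
    old_digests = [page_digest(page) for page in old_pages]
    changed = []
    for index, page in enumerate(new_pages):
        if index >= len(old_digests) or page_digest(page) != old_digests[index]:
            changed.append(index)
    return changed


# Hypothetical document: page 2 was edited and a fourth page was added.
changed_pages = pages_to_rerender(["a", "b", "c"], ["a", "B", "c", "d"])
print(changed_pages)
```

A spill-over edit shows up as a run of consecutive changed pages, so the saving depends on how early in the document the edit lands.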
        
           | mediaman wrote:
           | As a point of reference on pricing, convertAPI charges $0.05
           | per document conversion at their most expensive tier, and
           | with any level of fixed commitment ($80 - $300 per month) it
           | goes down to $0.016-0.006 per document.
           | 
           | Their PDF conversion is pretty good (I use it for PPT/Word ->
           | PDF conversion), though your product is obviously different
           | and has different/better capabilities for programmatic PDF
           | creation. Still, a reference point.
           | 
           | Pricing page: https://www.convertapi.com/prices
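
For a rough side-by-side, the figures quoted so far in this thread can be put through the same arithmetic. This is only a sketch: the numbers are the ones commenters cited, the function names are made up for illustration, and the convertAPI totals apply the quoted per-document rates without modelling the monthly commitment separately.

```python
# Figures quoted in this thread (USD).
ONEDOC_TOTAL_USD = 2500.0               # Brajeshwar: $2,500 for 50,000 PDFs
ONEDOC_VOLUME = 50_000
CONVERTAPI_TOP_TIER = 0.05              # mediaman: most expensive tier
CONVERTAPI_COMMITTED = (0.006, 0.016)   # mediaman: with a fixed commitment


def per_document(total_usd: float, documents: int) -> float:
    """Average price per document for a given total and volume."""
    return total_usd / documents


def total_cost(price_per_document: float, documents: int) -> float:
    """Total price for a volume at a flat per-document rate."""
    return price_per_document * documents


onedoc_rate = per_document(ONEDOC_TOTAL_USD, ONEDOC_VOLUME)
top_tier_total = total_cost(CONVERTAPI_TOP_TIER, ONEDOC_VOLUME)
low, high = (total_cost(rate, ONEDOC_VOLUME) for rate in CONVERTAPI_COMMITTED)

print(f"Onedoc: ${onedoc_rate:.2f} per document")
print(f"convertAPI (top tier): ${top_tier_total:,.0f} for {ONEDOC_VOLUME:,} documents")
print(f"convertAPI (committed rates): ${low:,.0f} to ${high:,.0f} for {ONEDOC_VOLUME:,} documents")
```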
        
         | snadal wrote:
         | I second this. Maybe I'm missing something in the value
         | proposition, but we already generate PDFs from .docx/.html
         | templates using open source libraries and Docker microservices.
         | 
         | Do not misunderstand. A Stripe for generating PDFs can be
         | great, but for a small team, $0.50/PDF is way more than I can
         | afford (after all, you can create a small number of PDFs
         | without too much fuss). Maybe you are oriented towards large
         | companies?
        
           | AugusteLef wrote:
           | Indeed, and as you mentioned, open-source libraries are
           | always an option. It's worth noting that our open-source
           | library assists in document design, allowing freedom in
           | renderer choice. While the open-source library is aimed at
           | individuals, our API targets businesses of any size. Our
           | pricing can be as low as $0.05 per PDF for high-volume or
           | annual commitments. Additionally, we offer cloud hosting for
           | your documents for up to 90 days, and our pricing includes
           | analytics.
        
         | adnans wrote:
         | We use https://www.api2pdf.com/pricing/ and it's priced per
         | bandwidth and usage ($0.001 per MB of bandwidth and
         | $0.00019551 per second of computation).
         | 
         | You can choose which API to use: Headless Chrome, Wkhtmltopdf,
         | Libreoffice, etc.
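
To see what that metered model means per document, here is a small sketch using the two rates quoted above. The workload (a 2 MB PDF that takes 3 seconds to render) is a made-up example, and `metered_cost` is an illustrative name, not part of any API.

```python
PRICE_PER_MB_USD = 0.001             # bandwidth rate quoted above
PRICE_PER_SECOND_USD = 0.00019551    # computation rate quoted above


def metered_cost(megabytes: float, seconds: float) -> float:
    """Cost of one conversion billed by bandwidth plus compute time."""
    return megabytes * PRICE_PER_MB_USD + seconds * PRICE_PER_SECOND_USD


# Hypothetical workload: a 2 MB PDF that takes 3 seconds to render.
example_cost = metered_cost(megabytes=2.0, seconds=3.0)
print(f"${example_cost:.6f} per document")
```

Under those assumed numbers a document costs roughly a quarter of a cent, which is why metered pricing compares so differently to a flat per-document fee.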
        
         | jot wrote:
         | It sounds like this is as advanced as DocRaptor[1]. They have
         | what I consider to be the best PDF generation API, giving
         | complete control over the documents you need to create. The
         | pricing is similar.
         | 
         | If you'd rather do it for free weasyprint[2] is the best open
         | source alternative.
         | 
         | Another more affordable option you might want to consider is
         | Urlbox[3]. (Disclosure: I work on this)
         | 
         | Urlbox's rendering engine is based on Chrome. It's been refined
         | over the last 11 years to render pages as images or PDFs[4]
         | that look great. I was a customer for 5 years before I joined
         | the team. Everything we'd tried before Urlbox was a
         | disappointment.
         | 
         | Urlbox probably can't match the power of either Onedoc or
         | DocRaptor, but pricing starts at less than $0.01 per document
         | and drops significantly with scale. If your PDF looks great
         | when saving as PDF in Chrome it should look identically
         | brilliant with Urlbox.
         | 
         | [1]: https://docraptor.com
         | [2]: https://weasyprint.org
         | [3]: https://urlbox.com
         | [4]: https://urlbox.com/html-to-pdf
        
       | winter-day wrote:
       | Congrats! My career has also revolved around PDF generation
       | (first for federal compliance at large companies, then for
       | scrubbing data from PDFs for HIPAA compliance and generating a
       | new PDF based on the scrubbed data). I think I've seen your tool
       | around. For the first, I ended up creating a workflow that
       | generated LaTeX scripts and then converted them to PDFs; for the
       | second, a Python library. The most difficult aspect for our
       | tools was formatting: the PDFs were generally 60-100 pages, and
       | tables could show up anywhere and break the page/formatting.
       | Quite curious to see how your company will grow, good luck!
        
         | DutchHugo wrote:
         | Curious, which Python library did you use to convert to PDFs?
         | I'm currently looking into a couple of options myself.
        
           | stormfather wrote:
           | weasyprint isn't terrible
        
       | Gualdrapo wrote:
       | It seems TeX/LaTeX is a major inspiration here, though there is
       | some room for improvement in details like hyphenation,
       | expansion/protrusion and microtypography. Not sure if/how a web
       | engine can reach those points, but it still seems this has a
       | potential niche and market, so congrats.
       | 
       | Though personally I wish stuff like ConTeXt was more popular and
       | approachable - to my humble knowledge their Lua backend seems to
       | have huge potential, I am doing my invoices with ConTeXt/Lua.
        
         | Titou325 wrote:
         | It definitely is! Typesetting quality was the main reason we
         | chose not to go down the Puppeteer/headless browser route but
         | rather use a completely separate engine where typography is a
         | first-class citizen.
         | 
         | We like LaTeX, but even for advanced users laying things out
         | can be a difficult thing. Given that documents are a frontend,
         | we wanted to bring the same tools frontend developers already
         | use.
        
       | kornhucker wrote:
       | Super interesting and potentially a fit for a project I'm working
       | on right now. What are the benefits of going this route vs
       | styling your page for print (ex. tailwind print modifier) and
       | relying on the browser's print dialogue?
        
         | Titou325 wrote:
         | There are both commonalities and differences! Both approaches
         | rely on web technology to provide the layout and are flexible
         | in terms of frameworks and integrations.
         | 
         | Where things differ is that we don't actually use a browser
         | under the hood. This allows much better control over
         | typesetting and layout, and you can do it on the server. We
         | also have more control over the output PDF and the ability to
         | use more advanced features such as form fields or embedding
         | other files and metadata in the PDF.
        
       | fasteddie31003 wrote:
       | Is this just a wrapper around Puppeteer that renders a pdf? I do
       | this currently with an AWS lambda that has a chrome-aws-lambda
       | layer.
        
         | Titou325 wrote:
         | We use a dedicated HTML to PDF engine (such as PrinceXML)
         | rather than building on top of a browser. The main issue with
         | browser-backed implementations is that the PDFs are often of
         | subpar quality. Their main advantage is that you can rely on
         | the latest CSS features.
         | 
         | In the end, the decisive factor was support for the PrintCSS
         | and PagedMedia specifications, which have been completely
         | discarded by major vendors and only implemented by specific
         | engines.
        
       | Oras wrote:
       | This is definitely a huge market. Are you targeting React
       | developers only? I've successfully used html2pdf in the past, but
       | looking again at their Github, it seems there has been no update
       | in the last three years.
       | 
       | I think SOC2 is a must to start engaging with companies. Most
       | PDFs will have sensitive data, and not many companies will feel
       | comfortable sending customer data to a 3rd party platform, so you
       | need security measures and certifications.
       | 
       | Good luck!
        
         | Titou325 wrote:
         | We actually take HTML as an input to our API converter. The
         | React tooling is mostly there to lower the barrier for most
         | frontend codebases, as well as to leverage the existing
         | ecosystem of components.
         | 
         | It seems that these conversion engines are massive pieces of
         | work that require a lot of upkeep, partly because CSS is a
         | living spec but also because of the sheer number of edge cases.
         | 
         | We are already working on SOC2 as this has been a recurring
         | ask, and indeed documents almost always contain PII.
        
       | cpr wrote:
       | So are you using PrinceXML for your "completely separate engine
       | where typography is a first-class citizen"?
        
         | Titou325 wrote:
         | Yes, we use an API layer on top of PrinceXML with additional
         | polyfills to support modern features. This is a meh solution
         | but it allowed us to iterate quickly and get to work with
         | customers without building a full-blown PDF engine first.
         | However, building this engine ourselves is the key to reduced
         | latency and overall better feature support. But we need to
         | engage with our users first and see exactly where we should
         | head :)
        
           | cpr wrote:
           | Isn't PrinceXML pretty much up to date? What's missing?
        
       | matteason wrote:
       | Really interesting product. I do agree that the pricing seems
       | steep ($0.25/document on Pro on the most generous tier) but I
       | don't know enough about pricing B2B products to know if that
       | would be a blocker.
       | 
       | I agree that HTML -> PDF can be a really powerful tool. I worked
       | on the UK government's tool to generate energy efficiency labels
       | for consumer goods [0] and we ended up doing PDF generation with
       | SVG templates, using Open HTML to PDF [1] for the conversion.
       | That ended up working very well, though as you allude to there
       | can be some gotchas (e.g. unsupported CSS features) that you need
       | to work around.
       | 
       | A few questions:
       | 
       | - Do the rendered documents support PDF's various accessibility
       | features?
       | 
       | - How suitable is this for print PDF generation? For example,
       | what version of the PDF spec do you target? What's your colour
       | profile support like? Do you support the different PDF page boxes
       | (MediaBox, CropBox, BleedBox, TrimBox, ArtBox)?
       | 
       | [0] https://github.com/UKGovernmentBEIS/energy-label-service
       | 
       | [1] https://github.com/danfickle/openhtmltopdf
        
         | Titou325 wrote:
         | The pricing does go down for larger volumes, and it is
         | something we still have to narrow down to a point that makes
         | sense to companies and is also viable.
         | 
         | - We do not force PDF/* profiles on the user, but it seems
         | that for most of them PDF/UA-1 would be a sensible default. We
         | can extract most of the tags from the HTML semantics
         | themselves, which makes it much easier.
         | 
         | - We target the PDF 1.7 spec. Color profiles can be changed and
         | you can use a custom .icc profile, with the corresponding
         | embedding restrictions based on the document format. MediaBox
         | is supported through the @page size property. Bleed, trim and
         | marks can be added using vendor-specific CSS properties. We
         | don't support ArtBox yet but this is something we can look
         | into! So far none of our customers really wanted to take this
         | out to a real print shop, but we would be glad to help people
         | go down this route :)
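
As an illustration of the `@page` approach described above, here is a sketch that builds such a rule as a string. One loud assumption: the comment says bleed and marks use vendor-specific properties, which are not named in this thread, so the sketch substitutes the standard `bleed` and `marks` properties from CSS Paged Media Level 3. The actual property names a given engine expects may differ, and `page_rule` is an illustrative helper.

```python
from typing import Optional


def page_rule(size: str = "A4",
              bleed: Optional[str] = None,
              marks: Optional[str] = None) -> str:
    """Build a CSS @page rule.

    `size` sets the page dimensions (which the engine maps to the MediaBox,
    per the comment above). `bleed` and `marks` are the standard CSS Paged
    Media names, used here as stand-ins for vendor-specific properties.
    """
    declarations = [f"size: {size};"]
    if bleed is not None:
        declarations.append(f"bleed: {bleed};")
    if marks is not None:
        declarations.append(f"marks: {marks};")
    body = "\n".join(f"  {declaration}" for declaration in declarations)
    return f"@page {{\n{body}\n}}"


print(page_rule("A4", bleed="3mm", marks="crop cross"))
```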
        
           | dmazzoni wrote:
           | So are you saying that you don't output tagged PDFs now?
           | 
           | For those who don't know, if you use Chromium's print-to-pdf
           | feature you get a tagged PDF. And it's scriptable from the
           | command-line too.
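
For reference, scripting that looks roughly like the sketch below. `--headless` and `--print-to-pdf` are real Chromium flags; the binary name `chromium` is an assumption (it varies by platform and install), and `print_to_pdf_command` is just an illustrative helper that builds the argument list without running anything.

```python
import shlex
from typing import List


def print_to_pdf_command(url: str, output_path: str,
                         binary: str = "chromium") -> List[str]:
    """Build the argv for a headless Chromium print-to-PDF run.

    The default binary name is an assumption; on some systems it is
    `google-chrome`, `chrome`, or a full path.
    """
    return [binary, "--headless", f"--print-to-pdf={output_path}", url]


command = print_to_pdf_command("https://example.com", "page.pdf")
print(shlex.join(command))
```

On a machine with Chromium installed, the list can be passed to `subprocess.run(command, check=True)`.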
        
             | AugusteLef wrote:
             | As mentioned in another comment, "Onedoc generates tagged
             | PDFs as long as you add a `title` property to the API call
             | to make the PDF compliant with PDF/UA-1." Hope it helps!
        
       | Leoko wrote:
       | I had to deal a lot with PDF generation over the past few years
       | and I was very unhappy with the ecosystem that was available:
       | 
       | 1. HTML-to-PDF: The web has a great layout system that works well
       | for dynamic content. So using that seems like a good idea. BUT it
       | is not very efficient as a lot of these libraries simply spin up
       | a headless browser or deal with virtual DOMs.
       | 
       | 2. PDF Libraries (like jsPDF): They mostly just have methods like
       | ".text(x, y, string)", which is an absolute pain to work with
       | when building dynamic content or creating complex layouts.
       | 
       | This was such a pain point in various projects I worked on that I
       | built my own library that has a component system to build dynamic
       | layouts (like tables over multiple pages) and then computes that
       | down to simple jsPDF commands. Giving you the best of both
       | worlds.
       | 
       | Hope this makes somebody's life a bit easier:
       | https://github.com/DevLeoko/painless-pdf
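
The general idea of computing a flowing layout down to absolute drawing calls can be sketched as below. This is not the API of painless-pdf or jsPDF: `TextCommand` and `layout_lines` are hypothetical names, the page size and margins are assumed values in points, and the sketch only handles single lines of text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextCommand:
    page: int    # 1-based page number
    x: float     # offset from the left edge, in points
    y: float     # offset from the top edge, in points
    text: str


def layout_lines(lines: List[str],
                 page_height: float = 842.0,
                 margin: float = 40.0,
                 line_height: float = 14.0) -> List[TextCommand]:
    """Flow lines top to bottom and emit absolute-position commands.

    A new page starts whenever the next line would cross the bottom margin.
    """
    commands = []
    page = 1
    y = margin
    for line in lines:
        if y + line_height > page_height - margin:
            page += 1
            y = margin
        commands.append(TextCommand(page, margin, y, line))
        y += line_height
    return commands


# Hypothetical tiny page that fits four lines, to show the page break.
for command in layout_lines(list("abcde"), page_height=100.0,
                            margin=10.0, line_height=20.0):
    print(command)
```

Each resulting command maps onto one absolute-position text call in a low-level PDF library, so the caller never has to track coordinates by hand.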
        
         | Crowberry wrote:
         | I'm with you..
         | 
         | We ended up writing a similar wrapper around the
         | https://github.com/jung-kurt/gofpdf library. We haven't open
         | sourced it yet. But it's made it a lot easier to deal with
         | rendering a PDF, especially over page breaks etc.
        
           | Leoko wrote:
           | Yes, page breaks are probably the most significant difference
           | between the layout of a web page and a PDF document, and
           | thereby a major drawback when using HTML-to-PDF. There is
           | little to no tooling for this on the web.
           | 
           | If you want granular control over how your PDF will look with
           | content that is more than one page long, you will have a hard
           | time using HTML.
        
             | Titou325 wrote:
             | We actually provide helpers to do that in our React library
             | https://react.onedoclabs.com/components/shell#pagebreak
             | 
             | CSS actually implements the break-before property to
             | control this
             | https://developer.mozilla.org/en-US/docs/Web/CSS/break-befor...
             | which is also supported by the Print to PDF dialog in modern
             | browsers.
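
A minimal sketch of the `break-before` approach: the helper below generates plain HTML in which every section after the first is forced onto a new page. `paged_html` is an illustrative name, and this uses only the standard CSS property, not the React helper linked above.

```python
from html import escape
from typing import List


def paged_html(sections: List[str]) -> str:
    """Wrap each text in a <section>; all but the first start a new page."""
    parts = []
    for index, text in enumerate(sections):
        # `break-before: page` asks the renderer to start this box on a new page.
        style = ' style="break-before: page;"' if index > 0 else ""
        parts.append(f"<section{style}><p>{escape(text)}</p></section>")
    return "<!doctype html>\n<html><body>\n" + "\n".join(parts) + "\n</body></html>"


document = paged_html(["Cover", "Terms & conditions"])
print(document)
```

The resulting file can be fed to any HTML-to-PDF converter that honors the property, including a browser's Print to PDF dialog as noted above.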
        
             | pedro120 wrote:
             | That's what we are trying to solve at Onedoc, we want
             | developers to be able to have full control over the PDF
             | layout as they write content. react-print is built with the
             | intention of creating the illusion that React was meant for
             | PDFs.
        
       | ramon156 wrote:
       | Can we not have an alternative to PDFs? I get that they're more
       | standardized, but why would everyone let Adobe have the hammer
       | for a file type that's so important?
        
         | Titou325 wrote:
         | We quite agree on this - but getting a new alternative out will
         | require a significant critical mass before it can be of any
         | interest. While PDF has its challenges, it remains a light
         | portable format and its security features make it a good fit
         | for binding documents. The ecosystem, although it is dominated
         | by Adobe, also includes other major players and existing
         | integrations.
         | 
         | The way we look at it is that PDFs allow embedding of other
         | files and metadata. It is easy to provide a platform where we
         | can enrich PDFs to display different contents than the ones in
         | the PDF itself. If this gets interesting enough, we can then
         | phase out the PDF itself. But this is a long way ahead.
        
         | nvr219 wrote:
         | Yeah let's give XPS another go.
        
           | devsda wrote:
           | Giving credit where it's due, I can appreciate Microsoft for
           | introducing XPS as an alternative to pdf.
           | 
           | There was a time, when not every software had "export to
           | pdf". So, having a "print to pdf" meant installing (often
           | pirated) Adobe Acrobat or installing a sketchy free(ware)
           | printdriver software downloaded from sourceforge.
           | 
           | MS adding xps print driver to windows enabled sharing docs
           | consistently (within windows ecosystem) without resorting to
           | hacks.
           | 
           | I don't know why it didn't catch on. Maybe it was the
           | general mistrust of anything MS, or it arrived too late, or
           | it was something else.
        
             | AugusteLef wrote:
             | Indeed, we need to give credit to MS for what they did.
             | However, it didn't catch on, as you mentioned, maybe due to
             | timing, skepticism toward MS, or the complexity of moving
             | from Adobe to MS for PDF management. I will dig a bit into
             | it and come back later if I find anything interesting.
        
         | nradov wrote:
         | For supply chain workflows the ASC X12 Electronic Data
         | Interchange (EDI) industry standard works much better than
         | PDFs. Unfortunately, despite being around for decades, it has
         | only been adopted by forward-thinking organizations such as
         | Walmart. Most smaller companies and their vendors still haven't
         | implemented EDI.
         | 
         | https://developer.walmart.com/home/us-edi/
        
           | calvinmorrison wrote:
           | Insanity.
           | 
           | EDI is the only place where people are regularly still paying
           | for message by the kilobyte, where unsecured FTP over the
           | open internet is still a norm, and where entire cottage
           | industries exist to support AVOIDING using EDI.
           | 
           | Source: I work in EDI. it's a pain in the rump.
           | 
           | Also, EDI is really only good for things like PO's, shipping
           | notices, invoices, sales orders, etc.
        
             | oldandboring wrote:
             | > Also, EDI is really only good for things like PO's,
             | shipping notices, invoices, sales orders, etc.
             | 
             | Don't forget health insurance claims, eligibility &
             | benefits, and prior auth requests!
        
               | ochrist wrote:
               | EDI is used in a lot of situations for machine-to-machine
               | communications, but outside USA I believe EDIFACT is much
               | more used (X12 is mostly used in USA).
               | 
               | Today many EDIFACT documents have been converted to
               | ebXML: https://en.wikipedia.org/wiki/EbXML
               | 
               | Source: Worked in EDI for a few years
        
             | ochrist wrote:
             | You don't have to pay for message by the kilobyte. This is
             | only true if you use an external vendor for the conversion
             | or use a VAN for transmission:
             | https://en.wikipedia.org/wiki/Value-added_network
             | 
             | Source: Worked in EDI for a few years
        
               | calvinmorrison wrote:
               | You absolutely don't need to use a VAN, yet a LOT of
               | people do. Even when they're not using a VAN for comms,
               | they might pay a VAN to host their FTP. The whole thing
               | is backasswards.
        
         | breadwinner wrote:
         | PDF is an open format in the sense that you don't need to pay
         | Adobe a license fee for generating PDFs, or for reading and
         | rendering PDFs. The format is fully documented, although the
         | specification is controlled by Adobe.
        
         | rapatel0 wrote:
         | PDF is an incredibly (stupidly) extensible format. There are
         | tons of government forms that (sadly) bake in complex workflows
         | into PDF forms.
         | 
         | Given that the whole world has been running on PDFs for decades,
         | it makes more sense to leverage the existing infrastructure
         | and move it towards something more functional over time.
         | Introducing a new format will just lead to another format that
         | achieves 0.5% market share and then is abandoned after a few
         | years. Microsoft basically forcing people to use XPS in Windows
         | (>70% market share of computing) still wasn't able to achieve
         | meaningful usage or change.
         | 
         | I expect that PDFs will not go away for 20 years at least, but
         | who knows
        
       | airbreather wrote:
       | are you doing this with pdfmarks?
        
         | AugusteLef wrote:
         | No, we don't currently do that. However, we are considering
         | adding metadata to PDFs, and using pdfmark could be very
         | helpful!
        
       | Crowberry wrote:
       | This looks really interesting! One of the main reasons we've
       | opted to write more complex rendering code is speed. We're
       | getting around 500ms for a single document, which is (last I
       | tested) quicker than any headless Chrome setup.
       | 
       | How long does it take to render using your API? :)
        
         | pedro120 wrote:
         | Rendering time scales with the length / complexity of the
         | document. At the moment, our self-serve API renders slower than
         | a headless Chrome setup. We are working on speeding this up as
         | it is currently on the order of seconds.
        
           | Crowberry wrote:
           | Alright, thanks!
        
       | dazh wrote:
       | Glad to see people building in the PDF space, which as a format
       | is unfortunately both awful and ubiquitous. Are you planning to
       | build any support for programmatically filling out existing PDF
       | forms? That's a huge pain point our product is facing that
       | doesn't seem easy to solve.
        
         | pedro120 wrote:
         | Yes, our focus is on programmatic interactions with PDFs. Form
         | filling is on our roadmap, alongside programmatic digital
         | signatures and many more.
        
           | dazh wrote:
           | Amazing, is there anywhere I can follow along to find out
           | when form filling will be available?
        
             | pedro120 wrote:
             | Sure! Feel free to join our Discord; we post announcements
             | as soon as new features are released. You can also ask for
             | features; we prioritise these requests with enterprise
             | customers in our development roadmap.
        
         | wonger_ wrote:
         | I'm facing that same pain point of programmatic PDF filling. I
         | noodled around in the PDF format and learned it's a bit
         | difficult to deal with fonts and formatting. But I think this
         | client-side library works well enough, as a start:
         | https://pdf-lib.js.org/#fill-form
         | 
         | I've also heard of one paid API that I forgot but seemed to
         | work well, and this related service https://www.jotform.com/,
         | and I also considered porting some server-side libraries to
         | WASM. One day I'll collect all the libraries and findings in a
         | blog post.
         | 
         | Are you looking to programmatically fill any PDF form by
         | detecting the fields? Or are you filling one known PDF
         | template?
        
           | kodt wrote:
           | Years ago I needed to programmatically fill PDFs and used
           | this library to achieve it. Funny it has the same name as
           | what you linked: https://www.pdflib.com/
           | 
           | It is a paid commercial product however.
        
         | azmodeus wrote:
         | What are you looking for in programmatic pdf filling?
        
         | nip wrote:
         | For programmatic filling of PDFs, have a look at DocSpring:
         | https://docspring.com
        
       | cratermoon wrote:
       | The problem with using Tailwind is that I can't just say <h1>Some
       | Heading</h1>. As noted in the Tailwind documents "All heading
       | elements are completely unstyled by default, and have the same
       | font-size and font-weight as normal text."[1]
       | 
       | Most of the time when I'm writing HTML I want a set of default
       | styles for the most common elements. It's tedious and error-prone
       | to have to specify a class _every single time_.
       | 
       | 1 https://tailwindcss.com/docs/preflight
        
         | Titou325 wrote:
         | Makes total sense. There is no real requirement to use Tailwind
         | to create the PDFs, we just have grown accustomed to Tailwind
         | :) If you don't use the <Tailwind> tag, the browser defaults
         | are used to generate the PDF.
        
       | patrick4urcloud wrote:
       | very nice !
        
       | kvakkefly wrote:
       | Funny name! The reason I find it funny is I know some people who
       | made Doconce: https://github.com/doconce/doconce :D
        
         | esafak wrote:
         | Are they still developing it after the founder's passing?
        
       | marceldegraaf wrote:
       | We're using Gotenberg[1] to convert a rendered web page (with
       | Elixir/Phoenix, in our case) to PDF. Works like a charm and we
       | can use our existing frontend code/styling (including SVG graph
       | generators) which is a huge bonus.
       | 
       | 1: https://gotenberg.dev/
        
         | Titou325 wrote:
         | We actually experimented with Gotenberg! Ultimately it is a
         | layer on top of Chromium for conversion and we were
         | dissatisfied with the results. I am curious how you are
         | handling assets and other static media / attachments: do you
         | embed everything in a single HTML file or do you use some kind
         | of bucketing system to resolve URLs?
        
           | marceldegraaf wrote:
           | Great question! We actually just use the static assets
           | (stylesheets, images) from our public asset CDN. The
           | generated HTML points to the latest version of those assets,
           | which means we can always use all the latest styling/assets
           | in our generated PDF files.
           | 
           | To give you an idea, this is the kind of PDF files we
           | generate that way:
           | https://assets.walterliving.com/documents/walter-
           | charlotte-d...
        
       | ak217 wrote:
       | FYI: the open source state of the art in this area is Playwright
       | (the successor to Puppeteer) with Paged.js
       | (https://pagedjs.org/). I highly recommend that everyone check
       | out and donate to paged.js, it's a fantastic project with lots to
       | like. It certainly blows commercial alternatives like Prince XML
       | out of the water.
       | 
       | That forms a solid foundation that I find it hard to imagine
       | paying for. The things where you might still command a premium
       | are basically safety mechanisms/CI checks/library components that
       | ensure the PDF renders correctly in the presence of variable-
       | length content, etc. as well as maybe PDF-specific features like
       | metadata and fillable forms. Naive ways to format headers,
       | footers, tables/grids/flexboxes etc. often fail in PDFs because
       | of unexpected layout complications. So having a methodology,
       | process, and validation system for ensuring that a mission
       | critical piece of information appears on a PDF in the presence of
       | these constraints could be attractive.
        
         | Titou325 wrote:
         | We are currently experimenting with this approach. A good thing
         | about paged.js is that we would be able to provide hot-reload
         | and live preview of files without actually converting to PDF.
         | 
         | Your second point is very interesting, seems like some kind of
         | .assert('text').isVisible() API. We may want to dig into that
         | further!
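A minimal sketch of the `.assert('text').isVisible()` idea discussed above, assuming the text of each PDF page has already been extracted by some other tool. The names `find_required_text` and `assert_all_visible` and the sample invoice text are hypothetical, not part of Onedoc or Paged.js. It only checks that a string is present in a page's text, which is weaker than proving it is unclipped on the rendered page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    text: str          # the required string we looked for
    pages: list[int]   # 1-based page numbers whose text contains it


def find_required_text(page_texts: list[str], required: list[str]) -> list[Finding]:
    """Report, for each required string, the pages whose extracted text contains it."""
    findings = []
    for needle in required:
        hits = [i for i, text in enumerate(page_texts, start=1) if needle in text]
        findings.append(Finding(needle, hits))
    return findings


def assert_all_visible(page_texts: list[str], required: list[str]) -> None:
    """Raise AssertionError naming every required string found on no page."""
    missing = [f.text for f in find_required_text(page_texts, required) if not f.pages]
    if missing:
        raise AssertionError(f"missing from rendered PDF: {missing}")


# Hypothetical extracted text for a two-page invoice.
pages = ["Invoice #1042\nWidget x 3", "Total due: 120.00 EUR\nThank you"]
assert_all_visible(pages, ["Invoice #1042", "Total due"])
```

A check like this could run on every generated document rather than only in CI, since it never needs to store or compare the sensitive content itself.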
        
           | rudasn wrote:
           | Or maybe some visual diffing based on expected output, based
           | on the template/layout/theme used, since you'd want to
           | perform this check on every pdf generated in prod (that has
           | real, sensitive data) , not just in CI or testing mode, if
           | you're aiming for critical docs.
           | 
           | Cool project btw, congrats for the launch!
        
         | timvdalen wrote:
         | (How) does it handle CMYK and print PDFs? I see images of
         | printed books created by Paged.js, were these post-processed,
         | or printed using a printer that does a best-effort RGB
         | conversion?
        
           | ak217 wrote:
           | I'm not sure - we don't do color correction on our PDFs
           | because we don't have photos in them and color rendering is
           | not mission critical - but paged.js is focused on the concern
           | of layout for print media. I would imagine color rendering
           | can be solved orthogonally to what paged.js does for you, as
           | long as you specify the color data in CSS. I'm pretty sure
           | paged.js will pass it through without messing with it, so
           | you're good if the browser that Playwright/puppeteer is
           | driving supports the correct color profile when emitting the
           | PDF. I honestly don't know if browsers have sufficient
           | support for that when emitting a PDF, though.
           | 
           | Overall you're right that color correction is another area
           | where you could probably command a premium.
        
             | timvdalen wrote:
             | It's certainly an area with more depth than I anticipated
             | when I first started getting into it. Adobe is still pretty
             | much the only one that can get a PDF compliant with print
             | standards.
             | 
             | As far as I know, there's no way to currently get colors
             | adhering to print color profiles in CMYK out of browsers.
             | 
             | Indeed, if color correctness isn't mission critical, I can
             | imagine that going with Paged.js can be a nice experience!
             | 
             | (Edit: in my experience so far, it's been really really
             | hard to 'correct' colors from an existing PDF in a way that
             | gets a satisfying end result---the colors are usually
             | muted/washed out)
        
               | ak217 wrote:
               | I was curious and searched around and found this
               | presentation:
               | https://www.w3.org/Graphics/Color/Workshop/slides/Erias.pdf
               | 
               | You're right - although many of the building blocks are
               | there, it appears there is no way to specify a colorspace
               | or print profile when asking Chrome to emit a PDF (and I
               | doubt the other browsers are any better). Skia (the PDF
               | rendering engine that Chromium uses) actually supports
               | colorspace transforms, but Chromium doesn't seem to hook
               | that up to CSS or even support non-RGBA colors in its
               | rendering pipeline.
        
         | caesil wrote:
         | I think https://github.com/diegomura/react-pdf is closer to
         | what this company is doing.
         | 
         | In fact their open source library,
         | https://github.com/OnedocLabs/react-print-pdf, seems like a
         | higher-level library that sits above react-pdf. Reminds me a
         | lot of the set of react-pdf based components I built for a
         | corporate job where letting users create PDFs was a huge part
         | of the value proposition.
         | 
         | They're solving a really cool problem, actually, because
         | building out into certain difficult use cases like SVG support
         | was a huge pain.
        
       | breadwinner wrote:
       | How is this better than writing out an HTML file, then using
       | headless chrome to export to PDF, like this:
       | "C:\Program Files\Google\Chrome\Application\chrome.exe"
       | --headless --disable-gpu --print-to-pdf=C:\temp\foo.pdf
       | --no-margins --print-to-pdf-no-header C:\temp\test.mhtml
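For anyone scripting the command quoted above, here is a rough Python wrapper. The flags are copied from the comment as-is and have not been checked against current Chrome releases; the function names `chrome_pdf_command` and `render` are made up for illustration.

```python
import subprocess
from pathlib import Path


def chrome_pdf_command(chrome: str, source: str, output: str) -> list:
    """Build the argv list for the headless print-to-PDF call quoted above."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output}",
        "--no-margins",               # flag taken from the comment, unverified
        "--print-to-pdf-no-header",   # flag taken from the comment, unverified
        source,
    ]


def render(chrome: str, source: str, output: str, timeout: float = 60.0) -> Path:
    """Run Chrome and fail loudly if it exits non-zero or writes no file."""
    subprocess.run(chrome_pdf_command(chrome, source, output), check=True, timeout=timeout)
    out = Path(output)
    if not out.is_file():
        raise RuntimeError(f"Chrome exited cleanly but {output} was not written")
    return out
```

Passing the arguments as a list avoids shell quoting problems with paths such as `C:\Program Files\...`.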
        
         | Titou325 wrote:
         | This brings its own set of challenges. Headers and footers are
         | strictly limited in terms of features, you cannot add
         | footnotes, the notion of page spreads is harder to implement.
         | Then you need to combine that with having a Chrome instance at
         | hand + exposing the needed assets for URL resolution.
         | Definitely not difficult let alone impossible, but not the
         | easiest way to get started :)
        
           | breadwinner wrote:
           | The easier way costs 5 cents per page. Imagine sending an
           | invoice to your customer and the invoice itself costs 5 cents
           | per page! That's prohibitively expensive for many
           | applications. I wouldn't consider any solution that costs
           | more than 1 cent per page.
        
             | Titou325 wrote:
             | We bill per document, so the number of pages wouldn't
             | impact the pricing. A 5-page invoice would come to 1 cent
             | per page. However, it seems that each and every company has
             | different needs and the pricing may or may not make sense
             | for them. There are alternative billing options that we are
             | considering but we want to keep it easy to grasp rather
             | than go into billing kilobytes or ms of execution. We would
             | be more than happy to discuss use cases and see what can
             | work for each company :)
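The arithmetic in this exchange can be sketched as follows. The 5-cent per-document price is only inferred from the two comments above ("5 cents" and "a 5-page invoice at 1 cent per page"); it is an assumption, not a published price.

```python
def cents_per_page(price_per_document_cents: float, pages: int) -> float:
    """Effective per-page cost when billing is per document, not per page."""
    if pages < 1:
        raise ValueError("a document has at least one page")
    return price_per_document_cents / pages


ASSUMED_PRICE_CENTS = 5  # inferred from this thread, not an official figure

for pages in (1, 5, 20):
    print(f"{pages:>2} page(s): {cents_per_page(ASSUMED_PRICE_CENTS, pages)} cents per page")
```

Under that assumption, a one-page receipt sits at the full 5 cents per page, while longer documents fall below the 1-cent threshold mentioned in the parent comment.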
        
         | _puk wrote:
         | "you can already build such a system yourself quite trivially
         | by getting an FTP account, mounting it locally with curlftpfs,
         | and then using SVN or CVS on the mounted filesystem."
         | 
         | https://news.ycombinator.com/item?id=8863
        
       | ketanmaheshwari wrote:
       | I just wanted to add that if you want to convert plaintext files
       | to pdf, vim has a builtin feature to do so:
       | 
       |     vim filename.txt -c "hardcopy > filename.ps | q" \
       |       && ps2pdf filename.ps  # convert ps to pdf
        
       | ffpip wrote:
       | Love the demo on the homepage with the render button. Really
       | helps explain the product!
        
         | AugusteLef wrote:
         | Thanks! We try to make our product as accessible as possible
         | for anyone to use (or at least to test). It's good to hear that
         | our efforts have been worthwhile!
        
       | BrandiATMuhkuh wrote:
       | Congrats on the launch! What's the main advantage over pspdfkit?
        
         | Titou325 wrote:
         | It is similar to pspdfkit. We add an abstraction layer over the
         | HTML and assets hosting to make it easier to use without having
         | to think too hard about security and serving assets.
         | 
         | We also hope to keep the focus on the PDF generation part
         | rather than expanding super-horizontal style to provide all
         | imaginable PDF tools at the expense that none is really good.
        
       | gtirloni wrote:
       | I wonder what YC expects from such investments (considering the
       | multitude of FOSS solutions in this area).
        
         | Titou325 wrote:
         | While this may sound a bit counterintuitive (maybe?) we
         | actually pivoted to this field based on YC input and
         | discussions they have had with their previous companies. The
         | multitude of FOSS solutions in this area indicates this is a
         | real problem people are willing to spend time on, and yet there
         | is no go-to solution and every team we have talked to selected
         | different tools based on a very specific requirement.
         | 
         | This may not mean success, but it does mean that the game is
         | not over in the documents field :)
        
           | gtirloni wrote:
           | Thanks for the perspective. Indeed, this is an area with real
           | demand. I haven't evaluated YC's recent startups but I trust
           | they do know a bit about what has a better chance in the
           | market. Best of luck :)
           | 
           | ps.: As someone with very minimal PDF needs personally and at
           | work, I'd say the beautiful templates are what caught my
           | attention the most.
        
       | acoyfellow wrote:
       | Reminds me of BrewPDF.com
        
       | midenginedcoupe wrote:
       | I've also spent much longer than I'd like on this same problem.
       | Having a lightweight-enough service to convert html->pdf on the
       | fly, with good fidelity, and that can create an _accessible_ pdf
       | seems to be impossible.
       | 
       | If you can nail accessible PDFs then you'd open up a _very_ big
       | government market.
        
         | AugusteLef wrote:
         | We felt the same, and that's precisely why we built this tool!
         | The key, as you mentioned, is fidelity, especially for
         | designing complex layouts. We hope to bring something new and
         | valuable to the table. And yes, documents are central to many
         | industries including government, legal, banking etc.
        
           | dmazzoni wrote:
           | Can you directly answer whether your tool generates _tagged_
           | PDFs?
           | 
           | Of course, you can't guarantee that the resulting document is
           | 100% compliant because you can't enforce that the input is
           | valid, but are you at least outputting a complete tag tree
           | with as much semantics as possible given the input?
        
             | Titou325 wrote:
             | Yes, Onedoc generates tagged PDFs as long as you add a
             | `title` property to the API call to make the output PDF/UA-1
             | compliant.
        
       | admissionsguy wrote:
       | You cannot make this up, generating PDFs is now an enterprise
       | product.
        
         | dmazzoni wrote:
         | What do you mean "now"? It has been for years. It's a huge
         | business.
        
           | admissionsguy wrote:
           | When I was first hired 15 years ago my first task was to
           | create a PDF report. It was easy back then in PHP+fPDF. Two
           | years ago I was hired to work on a Heroku-hosted NodeJS app.
           | I was surprised to find that generating a PDF turned out to
           | be substantially more difficult task, requiring running a
           | browser emulator or connecting to an external service. And
           | now, seeing PDF generation as a premium pay-as-you-go product
           | is just too much.
        
             | AugusteLef wrote:
             | Makes sense. Actually, if you keep the layout/content very
             | simple, aren't constrained by throughput, and don't need to
             | integrate dynamic data or other similar processes, then
             | simple FOSS could indeed get the job done! That's exactly
             | why we developed the open-source library react-print-pdf
        
             | kodt wrote:
             | There have long been free HTML-PDF options around, but at
             | the same time commercial products that offered more
             | features. Some examples are opening an existing PDF and
             | modifying it, adding to it, merging PDFs, or filling PDF
             | forms.
        
         | AugusteLef wrote:
         | Editors such as Overleaf, and those offered by MS and Adobe,
         | have been around for a long time. Recently, companies like
         | Pandadoc and Docusign have started offering services around
         | PDFs (generation or other aspects of their lifecycle).
         | 
         | It might seem odd, given our long history with PDFs, but I
         | believe there's still much to be done with these documents.
         | They're everywhere--invoices, tickets, reports, etc.--yet the
         | technology for generating and managing them hasn't evolved much
         | in years. Our approach is to apply the same modern technologies
         | used for web design to document design.
        
       | anonymouse008 wrote:
       | Hmm interesting... I just went through this user experience on
       | iOS generating PDF invoices locally. I attempted the HTML > PDF
       | route, but Webkit is thorny wrt layouts (as you mentioned). I
       | did settle in with drawing everything from the ground up > which
       | with LLMs wasn't as hairy as it used to be, even got a little
       | Swift framework out of the deal.
       | 
       | Am I understanding the docs correctly that you don't have a local
       | library available (the SDKs are just calling the APIs right?)?
       | Mind going through why you chose a remote API?
        
         | Titou325 wrote:
         | You are right in the sense we do not provide a local library.
         | We considered the option, but it would have brought a lot of
         | challenges to accommodate the various runtimes and device
         | capabilities.
         | 
         | This may come at a later stage once we have built our own
         | rendering engine though
        
       | kwhinnery wrote:
       | Looks awesome, will keep this in mind - every so often you need
       | to create complex documents in code, and it's always a pain.
       | Doing it with a familiar modern programming interface would be
       | nice.
        
         | AugusteLef wrote:
         | Exactly, that's one of the main reasons we began working on
         | this. We aim to bring the modern web technologies used for
         | website design into the document world. This includes enabling
         | the use of React and, of course, Tailwind, Chakra UI, etc.
        
       | canterburry wrote:
       | May I ask, why do we still need PDFs? I know they are still
       | popular, I just don't understand why.
        
         | Titou325 wrote:
         | There are many reasons behind it, to name a few: files are
         | self-contained(*) and easily portable, can guarantee some
         | security features, the format is easily extended, and the
         | ecosystem is very large.
         | 
         | It seems that a better format should exist, but the fact that
         | PDF is the de facto standard for portable documents makes it
         | unlikely things can change overnight.
        
       | petern81 wrote:
       | This is a good problem to tackle. The hours I've sunk...
        
         | AugusteLef wrote:
         | We spent many hours designing and generating PDFs at our
         | previous venture.. terrible experience. Which is why we're now
         | focused on solving this issue!
        
       | mstijak wrote:
       | Congratulations on the launch -- it looks fantastic! My company
       | is also developing a similar product. We've chosen to create a
       | visual report designer that enables end-users (non-developers) to
       | create and tweak PDF reports, and integrate with the existing IT
       | infrastructure via the API. Our experience is that users want
       | changes in reports very often and that it's best to allow them to
       | do it on their own.
       | 
       | https://www.cx-reports.com
        
         | Titou325 wrote:
         | Really like your approach! We tried to keep things tied to code
         | as much as possible rather than dealing with complex interfaces
         | between changing inputs and outputs. Most legal and tech teams
         | we talked to pointed to the fact that CI/CD would quickly
         | become unbearable when decoupling documents and code
         | implementation. What is your approach on that?
        
           | mstijak wrote:
           | We offer comprehensive import/export functionality, ensuring
           | seamless transfer of reports between environments. Moreover,
           | workspaces allow you to segregate test and production
           | environments or create unique environments for each client,
           | allowing easy report customization. While reports are simply
           | JSON files, which could theoretically be stored on the file
           | system and checked in, doing so would hurt the flexibility
           | we're trying to achieve.
        
       | egnehots wrote:
       | The main issue is conflating templating and pdf generation.
       | 
       | Using html to pdf solutions allows you to do the templating in
       | html, where it is pretty much a solved issue.
       | 
       | And as many said, headless chrome is a robust html to pdf
       | solution, even though it feels like a hack.
       | 
       | But, yeah, there seems to be a lack of awareness about these
       | options within corporations. So, kudos to you for addressing a
       | genuine problem!
        
         | pedro120 wrote:
         | Indeed, we aim at bundling this in a way that makes it easy and
         | obvious for enterprises to build their PDFs that way.
        
         | yencabulator wrote:
         | Typst is a typesetting language that makes programmatic layout
         | and processing JSON input pretty darn simple. I make invoices
         | by having a Typst template read line items from a JSON file.
         | 
         | https://github.com/typst/typst
        
           | adfaure wrote:
           | Just spent my Sunday creating my invoice template in typst as
           | well. I enjoyed it, and I could do what I wanted quickly!
        
         | plopz wrote:
         | The problem with chrome is the performance, it is very slow and
         | uses a bunch of memory. There was a neat post here awhile ago
         | about generating pdfs faster
         | https://news.ycombinator.com/item?id=39379690
        
           | AugusteLef wrote:
           | Indeed, speed is an issue (and it's hard to tackle).
           | Additionally, when using Chrome, what you see is not always
           | what you get. The layout often doesn't match expectations,
           | especially with complex elements. It's ok for simple use
           | cases, but for professional and scalable solutions, you
           | usually need to switch to something else!!
        
             | egnehots wrote:
             | Yes, that's a hard issue for arbitrary/user-provided HTML
             | pages. But with templates under your control, the context
             | is different. Your designers do not have to trust Chrome;
             | they can preview, tweak using print media queries, and
             | provide robust templates that print to acceptable PDF.
             | 
             | The speed issue is also real, but you can scale
             | horizontally by spawning other chromium instances (with
             | gotenberg containers for ex). Not really efficient but it
             | can certainly help alleviate most of the load...
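As a rough illustration of the horizontal-scaling point, the sketch below fans render jobs out over a fixed pool of workers. `render_stub` is a placeholder standing in for a call to one renderer instance (for example one Gotenberg container); it does not produce a real PDF, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def render_stub(html: str) -> bytes:
    """Placeholder for a request to one renderer instance; not a real PDF."""
    return f"%PDF-stub rendered {len(html)} chars of HTML".encode()


def render_all(documents: list, workers: int = 4) -> list:
    """Render every document, using at most `workers` renderers at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input documents.
        return list(pool.map(render_stub, documents))


results = render_all(["<p>invoice 1</p>", "<p>invoice 2</p>"], workers=2)
print(len(results), "documents rendered")
```

Threads fit here because each job mostly waits on an external process or HTTP call; the memory cost mentioned upthread still scales with the number of renderer instances behind the pool.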
        
               | AugusteLef wrote:
               | Absolutely! It's all about finding the right balance
               | between granting users complete control over the layout
               | and restricting it to ensure specific use cases through
               | templates. As for scaling, indeed, horizontal scaling is
               | an option, but there's a limit to its practicality. It
               | would be interesting to conduct a resource/time analysis
               | over the various solutions available on the market in
               | different situations/use cases
        
       | baggy_trough wrote:
       | This is definitely a somewhat painful process. I have done it
       | with puppeteer / chromium on Debian, and it works very well after
       | the headache of figuring it out. Having to pay 50 cents per PDF
       | and deal with a 3rd party vendor would not provide value for our
       | needs.
        
       | staffors wrote:
       | I see that you support page breaks and headers and footers and
       | stuff which is very cool. Is there some form of widow/orphan
       | control when text wraps from one page to the next? How do you
       | handle things like a large table that is longer than the length
       | of a page?
        
         | staffors wrote:
         | Also, do you support different paper sizes (A4 and Letter)?
        
           | Titou325 wrote:
           | We support the size[1] property and the widows and orphans[2]
           | spec for both your needs :)
           | 
           | [1]: https://developer.mozilla.org/en-US/docs/Web/CSS/@page/size
           | [2]: https://developer.mozilla.org/en-US/docs/Web/CSS/orphans
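A small sketch of what those two CSS features look like in practice, assuming the renderer honors the standard `@page { size }`, `orphans` and `widows` properties as the reply says. The helper name `print_stylesheet` is hypothetical; how the stylesheet is attached to an Onedoc document is not shown here.

```python
def print_stylesheet(page_size: str = "A4", min_lines: int = 2) -> str:
    """Return CSS that sets the paper size and widow/orphan limits for paragraphs."""
    if min_lines < 1:
        raise ValueError("orphans and widows must be positive integers")
    return (
        f"@page {{ size: {page_size}; }}\n"
        f"p {{ orphans: {min_lines}; widows: {min_lines}; }}\n"
    )


# Letter paper, and never leave fewer than 3 lines of a paragraph alone on a page.
print(print_stylesheet("letter", 3))
```

`orphans` sets the minimum number of lines kept at the bottom of a page before a break, and `widows` the minimum carried to the top of the next page.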
        
       | jjslocum3 wrote:
       | Does Onedoc retain any visibility into, or in any way use or
       | reserve the right to use any content created using its API in any
       | way? Obviously, calling an API means sending document contents to
       | Onedoc.
        
         | AugusteLef wrote:
         | We do not. We are also working on getting SOC2 compliant as
         | soon as possible. More about security here:
         | https://docs.onedoclabs.com/ressources/security (especially how
         | we use temporary buckets). Also, you can choose whether to host
         | your generated documents on our platform or to store them on
         | your local system.
         | 
         | But indeed, calling an API means sending document contents to
         | Onedoc one way or another. We aim to provide a self-hosted
         | solution in the future to solve this issue
        
       | rahhulk7 wrote:
       | This looks interesting! Especially the Markdown and LaTeX
       | components in react-print-pdf. Could be a great way to streamline
       | technical documentation generation in codebases. Would love to
       | see some examples of those in action.
        
         | AugusteLef wrote:
         | Indeed it could be a very interesting use case. While we are
         | more "Selling Shovels" it could be interesting to explore this
         | use case and maybe build a simple demo out of it!
         | 
         | And yes, as a big fan of LaTeX myself (I used to do all my
         | research reports on overleaf), we wanted to be able to
         | integrate formulas, code and more into your document very
         | simply. Glad you like it !
        
         | ska wrote:
         | FWIW I've had some good results for technical documentation in
         | RST markdown with sphinx for generation. You can develop latex
         | header details for detailed templating for pdf output, etc.
         | while keeping the html simpler if you want.
        
           | AugusteLef wrote:
           | Thanks for the tip, I will take a look at it asap
        
       | roastedpillows wrote:
       | The pricing is a little expensive. Have you heard of
       | https://htmldocs.com/ I've been using them for a year now and
       | it's free
        
       | travelinmyblood wrote:
       | First reaction - congrats guys, this is a problem I have in my
       | own business.
       | 
       | Second reaction - the pricing is way over the top and the model
       | is unusual. In your own pitch you talk about the volume of
       | documents created every day. How does that square with per
       | document pricing?
        
         | AugusteLef wrote:
         | Thanks! We're fine-tuning our pricing model and realize we have
         | some work to do in this area hahaha! Indeed, at a certain
         | scale, per-document pricing becomes almost impossible (we're
         | talking about millions of documents generated daily). As noted
         | in another comment, costs vary significantly depending on the
         | PDF type, from simple receipts to large multi-page reports,
         | especially since we currently rely on other proprietary
         | software that incurs high costs. In the future, we aim to offer
         | more than just document generation (like e-signature,
         | analytics, hosting, editor, etc.) and hope to move away from
         | "per document" pricing for high volumes. That said, our open-
         | source library allows anyone to design a document and use their
         | preferred renderer for PDF conversion, with all the pros and
         | cons each solution provides. There are more comments about
         | pricing providing additional information; feel free to dive in
         | if you have any comments or questions
        
       | jjmaestro wrote:
       | Just out of curiosity, as I've seen a few comments also
       | mentioning PrinceXML. Is OneDoc an API, wrapper, etc, on top of
       | PrinceXML? Or is it a completely new rendering engine?
       | 
       | Thanks!
        
         | AugusteLef wrote:
         | As of today we are building our solution on top of
         | PrinceXML/DocRaptor which is considered "to be the best PDF
         | generation API, giving complete control over the documents you
         | need to create" (cf. another comment). As we started working on
         | this solution less than 2 months ago, building our own renderer
         | was not an option. But once we have validated the idea, we are
         | definitely going to work on our own renderer to have 100%
         | control over the workflow, and also to be able to offer a
         | better pricing model!
        
       ___________________________________________________________________
       (page generated 2024-03-11 23:00 UTC)