[HN Gopher] Stirling-PDF: local web application to perform vario...
       ___________________________________________________________________
        
       Stirling-PDF: local web application to perform various operations
       on PDFs
        
       Author : alexzeitler
       Score  : 155 points
       Date   : 2023-12-25 20:02 UTC (2 hours ago)
        
 (HTM) web link (github.com)
        
       | rodlette wrote:
       | Nice. I've been looking for something like this to self-host, to
       | avoid my partner uploading sensitive documents to random PDF
       | manipulation websites.
       | 
       | Any better alternatives I should be considering?
        
         | Etheryte wrote:
         | If you happen to be on macOS, the Preview app does an absurd
         | number of things to PDFs, and it does it well. To be honest I'm
         | always surprised it isn't highlighted more by Apple, it's a
         | great tool that pretty much always just works. You can split
         | files, join them, rotate, add signatures, drawings,
         | annotations, redact sections, etc. The feature list is long,
         | especially considering that by the name of the application
         | you'd think it could just preview files, not edit them.
        
         | cde-v wrote:
         | Edge is surprisingly decent for marking up PDFs.
        
         | somethingsome wrote:
         | I often use PDF Sam (basic) and usually it works quite well and
         | is offline.
         | 
         | https://pdfsam.org/
        
         | jftuga wrote:
         | A really nice, stand-alone command line tool is pdfcpu.
         | 
         | https://github.com/pdfcpu/pdfcpu
        
         | bayindirh wrote:
         | KDE's Okular. Works on Linux, Windows and macOS.
         | 
         | If you're already on macOS, Preview has you covered.
        
       | christkv wrote:
       | Can it add attachments to PDF files? Until this year I did not
       | even know that this was possible, but a government agency asked me
       | to add files as attachments to a PDF, as their website only
       | allowed uploading valid PDF files.
        
         | alexzeitler wrote:
         | Have learned about it this year as well
        
         | layer8 wrote:
         | You can use Acrobat Reader for that.
        
       | rozman50 wrote:
       | https://tools.pdf24.org/en/creator
       | 
       | This tool is not open source, but it's free. Files should remain
       | on the local PC. The developers claim that they make money only
       | from advertising on their website.
        
       | toasted-subs wrote:
       | Self hosted sites are pretty awesome. Love seeing these here.
        
       | zikohh wrote:
       | I like using https://www.pdftool.org/en
        
         | kleiba wrote:
         | What about pdftk?
        
       | laurensr wrote:
       | What seems to be missing is an OSS tool to add/remove form fields
        
       | adamnemecek wrote:
       | I have not looked into this yet but can someone recommend an
       | application for repairing pdfs? For example, I have PDFs where
       | selecting text highlights a line above or below.
        
         | me_jumper wrote:
         | Try converting it to PDF/A
        
           | adamnemecek wrote:
           | That's not it.
        
         | layer8 wrote:
         | That doesn't sound like the PDF is broken, just that it uses
         | unusual font metrics or line displacements. Tools that could
         | amend this are unlikely to exist.
         | 
         | More generally, the PDF format is too flexible to decide, in many
         | cases, what is "broken" and what is really as intended. It's a
         | bit like asking for a tool that repairs "broken" source code
         | where it's really just the business logic that is broken.
        
       | RyanShook wrote:
       | I have a PDF problem that I thought was simple but has proven
       | difficult to solve and there is no paid solution I've found...
       | 
       | I want to forward an email to an inbox, have the email body
       | converted into a PDF, and then email that attachment to someone
       | all automatically. I've tried Make, Zapier, pdf.co, pdftool, and
       | a few other tools but have had no success. Has anyone solved this
       | problem reliably?
        
         | toomuchtodo wrote:
         | https://news.ycombinator.com/item?id=38545255
         | 
         | https://www.tapdone.com/
         | 
         | perhaps? Or something similar?
        
           | RyanShook wrote:
           | Thanks for sharing!
        
         | victorbojica wrote:
         | If you are able to code or can ask someone, then you should be
         | able to do it with some email API service (Nylas, AWS SES, etc.)
         | or a headless client that gets the body of the email, converts
         | it to PDF using wkhtmltopdf, and then sends it as an attachment
         | using the same service as before.
         | 
         | Doing this with low/no-code tools might be very hard or unlikely
         | to work.
        
           | RyanShook wrote:
           | Thanks, yes I think this is the right direction. Surprised it
           | doesn't exist as a SaaS; I guess the demand isn't there.
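A minimal Python sketch of the "get the body, convert it with wkhtmltopdf" step victorbojica describes. It assumes the raw message bytes have already been fetched (from an email API, IMAP, or a mail handler) and that the wkhtmltopdf CLI is installed and on PATH; `email_body_as_html` and `html_to_pdf` are hypothetical helper names, and apart from the external binary only the standard library is used.

```python
import email
import html
import subprocess
import tempfile
from email import policy
from pathlib import Path


def email_body_as_html(raw: bytes) -> str:
    """Return the body of a raw email message as an HTML document.

    Prefers a text/html part; otherwise wraps the text/plain part in <pre>.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:  # e.g. the message consists only of attachments
        return "<html><body></body></html>"
    content = part.get_content()
    if part.get_content_subtype() == "html":
        return content
    # Escape plain text so characters such as '<' and '&' render literally.
    return ('<html><head><meta charset="utf-8"></head><body><pre>'
            + html.escape(content) + "</pre></body></html>")


def html_to_pdf(html_text: str, wkhtmltopdf: str = "wkhtmltopdf") -> bytes:
    """Convert HTML to PDF by running the external wkhtmltopdf CLI."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "body.html"
        dst = Path(tmp) / "body.pdf"
        src.write_text(html_text, encoding="utf-8")
        # Basic usage is `wkhtmltopdf <input> <output>`; check=True raises
        # CalledProcessError if the conversion exits with an error.
        subprocess.run([wkhtmltopdf, str(src), str(dst)], check=True)
        return dst.read_bytes()
```

Chained as `html_to_pdf(email_body_as_html(raw))`, this yields PDF bytes ready to attach. Inline images referenced by `cid:` URLs will not render in this sketch.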
        
         | rqtwteye wrote:
         | I did something like this 10 years ago as an internal tool for
         | a company. Back then I did it with Outlook VBA.
        
         | karl_gluck wrote:
         | Google Apps Script can do all of this. Take the email body and
         | put it into a Google doc, then export the doc as a pdf to drive
         | and attach it from there to send.
        
         | fgonzag wrote:
         | It seems quite doable, but you'd need scripting skills to set it
         | all up: read the incoming queue, pass each message to wkhtmltopdf,
         | then pipe the result to the mail command. On Windows, I believe I
         | once used a Java SMTP server (Apache James) that allowed you to
         | set custom code as an incoming email handler. After that, the
         | conversion and email sending are trivial.
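The sending half that fgonzag calls trivial can be sketched with Python's standard library in place of the mail command. This assumes an SMTP relay reachable at a placeholder host and port; `build_pdf_mail` and `send` are hypothetical names, and the PDF bytes could come from wkhtmltopdf or any other converter.

```python
import smtplib
from email.message import EmailMessage


def build_pdf_mail(sender: str, recipient: str, subject: str,
                   pdf_bytes: bytes, filename: str = "email.pdf") -> EmailMessage:
    """Build a message that carries pdf_bytes as an application/pdf attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("The forwarded email is attached as a PDF.")
    # add_attachment turns the message into multipart/mixed automatically.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=filename)
    return msg


def send(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP relay (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from delivery makes it easy to inspect or test the outgoing message without a live mail server.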
        
         | brailsafe wrote:
         | Probably depends on the purpose of the pdf and why it needs to
         | be an attachment, but I'd just skip all the steps and print the
         | email since that's more or less what pdf is for. Print it and
         | re-attach or just print at the destination.
        
           | RyanShook wrote:
           | This is what I currently do. I was just hoping to automate
           | the process.
        
       | karol wrote:
       | Why can't this be an electron app?
        
         | Froodle wrote:
         | Dev here. It totally could be; we dismissed it at first because
         | Electron is quite bulky, bundling a whole Chromium instance inside
         | the exe. Instead we kept the exe version as small as possible.
         | We have plans for a full UI version in V2. We are releasing V1
         | (SPDF is currently in beta) sometime this month, but have begun
         | work on a V2 port to a different language and framework.
        
       | 101008 wrote:
       | I still haven't found a tool for a difficult problem. I have
       | some magazines in PDF, with two-column layouts, etc. I want them
       | to be transformed into Markdown. I know, it would need to
       | automatically identify the two columns, different layouts, etc.
       | 
       | I'm not looking for something perfect; I can fix it if there are
       | some errors, but so far nothing has given a good result.
        
         | layer8 wrote:
         | This can be arbitrarily difficult to do, depending on the PDF.
         | This is generally called PDF _reflowing_. Another approach is
         | to use column-aware OCR software.
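To illustrate what column-aware reading order means, here is a deliberately naive Python sketch. It assumes the text has already been extracted as blocks with bounding boxes (for example, PyMuPDF's `page.get_text("blocks")` returns tuples whose first five items are x0, y0, x1, y1, text) and that the page has exactly two columns. Headlines spanning both columns, sidebars, and hyphenated line breaks are not handled, which is where this gets arbitrarily difficult. `two_column_markdown` is a hypothetical helper.

```python
from typing import List, Sequence


def two_column_markdown(blocks: Sequence[Sequence], page_width: float) -> str:
    """Join text blocks from a two-column page into Markdown paragraphs.

    Each block is a tuple whose first five items are (x0, y0, x1, y1, text),
    with y increasing down the page. Blocks whose horizontal center lies in
    the left half form the left column; the rest form the right column.
    """
    mid = page_width / 2
    left: List[Sequence] = []
    right: List[Sequence] = []
    for block in blocks:
        center = (block[0] + block[2]) / 2
        (left if center < mid else right).append(block)

    # Read the left column top to bottom, then the right column.
    ordered = (sorted(left, key=lambda b: (b[1], b[0]))
               + sorted(right, key=lambda b: (b[1], b[0])))

    # Collapse hard line breaks inside a block into single spaces.
    paragraphs = [" ".join(str(b[4]).split()) for b in ordered]
    return "\n\n".join(p for p in paragraphs if p)
```

Splitting at the page midpoint is the simplest possible column detector; real layouts usually need the column boundaries inferred from the gaps between blocks.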
        
         | qingcharles wrote:
         | This is a hard problem. Cut the PDF down so it's only the pages
         | of the article you want and then try feeding it through GPT Pro
         | or Claude?
        
         | jftuga wrote:
         | Have you tried this (for at least solving part of the problem)?
         | 
         | https://github.com/pdfcpu/pdfcpu
        
       | lordofgibbons wrote:
       | How easy or difficult would it be to turn this into an electron
       | app so that non-technical users can use it easily too?
        
         | layer8 wrote:
         | Better use existing applications like PDFsam [0] or PDF-XChange
         | [1].
         | 
         | [0] https://pdfsam.org/pdfsam-basic/
         | 
         | [1] https://pdf-xchange.eu/pdf-xchange-editor/
        
         | Froodle wrote:
         | Dev here. It totally could be; we dismissed it at first because
         | Electron is quite bulky, bundling a whole Chromium instance inside
         | the exe. Instead we kept the exe version as small as possible.
         | Truth is, it's not too hard to port to Electron. We have plans for
         | a full UI version in V2. We are releasing V1 (SPDF is currently
         | in beta) sometime this month, but have begun work on a V2 port
         | to a different language and framework.
        
       | ziofill wrote:
       | It says this started as a 100% ChatGPT project!
        
         | jwilk wrote:
         | What does it mean?
        
           | monospaced wrote:
           | From my understanding, they mean the code was generated by
           | instructing OpenAI's ChatGPT (as opposed to writing the code
           | themselves).
        
       | d4rkp4ttern wrote:
       | I'll join some other commenters in adding my favorite difficult
       | PDF problem that I haven't found a ready-to-use (even paid)
       | solution for: extracting key-value pairs from a filled form such
       | as this medical claims form:
       | 
       | https://imgur.com/a/EJDi7L7
       | 
       | There are two levels of difficulty: the starting file could be an
       | image (pdf or png or jpg), which is the most difficult scenario.
       | The slightly easier one is where it's a text-based pdf so no OCR
       | is needed.
       | 
       | I threw this as an image file at Google's Form Parser, but it did
       | poorly, i.e. it missed quite a few fields.
        
         | Froodle wrote:
         | Dev here for the above Stirling-PDF app. Please raise features
         | like this as a feature request issue on GitHub and we can try to
         | address it in the future!
        
         | Closi wrote:
         | Have you tried Azure AI Document Intelligence?
         | 
         | In theory it's exactly this...
        
           | brianjking wrote:
           | I second this. Either that, or have you tried GPT-4 Vision or
           | Donut?
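For the easier, text-based case, one heuristic starting point is to extract the text with its layout preserved (for example with poppler's `pdftotext -layout form.pdf out.txt`) and then look for "Label: value" cells. The sketch below assumes labels end with a colon, which many forms do not follow; `extract_pairs` is a hypothetical helper, not an existing tool. If the PDF has real fillable (AcroForm) fields, reading them directly, e.g. with pypdf's `PdfReader.get_form_text_fields()`, is more reliable than parsing text.

```python
import re
from typing import Dict

# One "Label: value" cell. Labels must start with a letter and stay short so
# that ordinary sentences containing a colon are less likely to match.
PAIR = re.compile(r"^([A-Za-z][^:]{0,59}?)\s*:\s*(\S.*?)\s*$")
# Layout-preserving extraction usually separates side-by-side cells with runs
# of spaces, so two or more whitespace characters mark a cell boundary.
CELL_GAP = re.compile(r"\s{2,}")


def extract_pairs(layout_text: str) -> Dict[str, str]:
    """Heuristically collect 'Label: value' pairs from layout-preserved text."""
    pairs: Dict[str, str] = {}
    for line in layout_text.splitlines():
        cells = [c for c in CELL_GAP.split(line.strip()) if c]
        i = 0
        while i < len(cells):
            match = PAIR.match(cells[i])
            if match:
                pairs[match.group(1)] = match.group(2)
            elif (cells[i].endswith(":") and i + 1 < len(cells)
                  and ":" not in cells[i + 1]):
                # "Label:" and its value landed in separate cells.
                pairs[cells[i][:-1].strip()] = cells[i + 1]
                i += 1
            i += 1
    return pairs
```

Labels printed above or below their boxes, checkboxes, and table-style sections would all need extra rules, which is why general-purpose form parsers still miss fields.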
        
       | 11235813213455 wrote:
       | Does it support adding / managing named form fields?
        
         | Froodle wrote:
         | Dev here. Not currently, but it's a planned feature.
        
       | tobinfricke wrote:
       | Probability density functions, presumably. Oh, partial
       | differential equations?
       | 
       | For the document files, I love PDF Studio:
       | https://www.qoppa.com/pdfstudio/
        
       | ObscureScience wrote:
       | What I have mainly been looking for in the free software
       | ecosystem is a good tool to work with PDF
       | tagging/structure/element attributes.
       | 
       | At work I really have only been able to do the work I need on
       | random PDFs with Adobe Acrobat. It seems strange that this is the
       | case as PDF is now an open standard.
        
       ___________________________________________________________________
       (page generated 2023-12-25 23:00 UTC)