[HN Gopher] Stirling-PDF: local web application to perform vario...
___________________________________________________________________
Stirling-PDF: local web application to perform various operations
on PDFs
Author : alexzeitler
Score : 155 points
Date : 2023-12-25 20:02 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| rodlette wrote:
| Nice. I've been looking for something like this to self-host, to
| avoid my partner uploading sensitive documents to random PDF
| manipulation websites.
|
| Any better alternatives I should be considering?
| Etheryte wrote:
| If you happen to be on macOS, the Preview app does an absurd
| number of things to PDFs, and it does it well. To be honest I'm
| always surprised it isn't highlighted more by Apple, it's a
| great tool that pretty much always just works. You can split
| files, join them, rotate, add signatures, drawings,
| annotations, redact sections, etc. The feature list is long,
| especially considering that by the name of the application
| you'd think it could just preview files, not edit them.
| cde-v wrote:
| Edge is surprisingly decent for marking up PDFs.
| somethingsome wrote:
| I often use PDF Sam (basic) and usually it works quite well and
| is offline.
|
| https://pdfsam.org/
| jftuga wrote:
| A really nice, stand-alone command line tool is pdfcpu.
|
| https://github.com/pdfcpu/pdfcpu
| bayindirh wrote:
| KDE's Okular. Works on Linux, Windows and macOS.
|
| If you're already on macOS, Preview has you covered.
| christkv wrote:
| Can it add attachments to pdf files? Until this year I did not
| even know that this was possible but a government agency asked me
| to add files as attachments to a pdf as their website only
| allowed uploading valid pdf files.
| alexzeitler wrote:
| Have learned about it this year as well
| layer8 wrote:
| You can use Acrobat Reader for that.
| rozman50 wrote:
| https://tools.pdf24.org/en/creator
|
| This tool is not open source, but it's free. Files should remain
| on local pc. Developers claim that they make money only by
| advertisement on their website.
| toasted-subs wrote:
| Self hosted sites are pretty awesome. Love seeing these here.
| zikohh wrote:
| I like using https://www.pdftool.org/en
| kleiba wrote:
| What about pdftk?
| laurensr wrote:
| What seems to be missing is an OSS tool to add/remove form fields
| adamnemecek wrote:
| I have not looked into this yet but can someone recommend an
| application for repairing pdfs? For example, I have PDFs where
| selecting text highlights a line above or below.
| me_jumper wrote:
| Try converting it to PDF/A
| adamnemecek wrote:
| That's not it.
| layer8 wrote:
| That doesn't sound like the PDF is broken, just that it uses
| unusual font metrics or line displacements. Tools that could
| amend this are unlikely to exist.
|
| More generally, the PDF format is too flexible to decide what
| is "broken" or really is as intended, in many cases. It's a
| bit like asking for a tool that repairs "broken" source code
| where it's really just the business logic that is broken.
| RyanShook wrote:
| I have a PDF problem that I thought was simple but has proven
| difficult to solve and there is no paid solution I've found...
|
| I want to forward an email to an inbox, have the email body
| converted into a PDF, and then email that attachment to someone
| all automatically. I've tried Make, Zapier, pdf.co, pdftool, and
| a few other tools but have had no success. Has anyone solved this
| problem reliably?
| toomuchtodo wrote:
| https://news.ycombinator.com/item?id=38545255
|
| https://www.tapdone.com/
|
| perhaps? Or something similar?
| RyanShook wrote:
| Thanks for sharing!
| victorbojica wrote:
| If you are able to code or can ask someone, then you should be
| able to do it with some email API service (Nylas, AWS SES, etc)
| or a headless client that gets the body of the email, converts
| it to PDF using wkhtmltopdf, and then sends it as an attachment
| using the same service as before.
|
| Using low/no code tools might be very hard/unlikely
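The pipeline described above (take the email body, convert it with wkhtmltopdf, send the result as an attachment) can be sketched in Python with only the standard library plus the wkhtmltopdf binary. This is a minimal sketch, not a tested service: the function names, the attachment file name, and the "PDF:" subject prefix are all hypothetical, and it assumes wkhtmltopdf is installed and on the PATH. How the raw message is fetched (IMAP, an email API webhook, etc.) is left out.

```python
import subprocess
from email import message_from_bytes, policy
from email.message import EmailMessage
from html import escape
from typing import Callable, Tuple


def html_to_pdf(html: str) -> bytes:
    """Convert HTML to PDF by piping it through the wkhtmltopdf CLI."""
    # The two "-" arguments ask wkhtmltopdf to read HTML from stdin
    # and write the PDF to stdout.
    result = subprocess.run(
        ["wkhtmltopdf", "--quiet", "-", "-"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def body_as_html(raw_email: bytes) -> Tuple[str, str]:
    """Return (subject, html) for a raw RFC 5322 message."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    body = msg.get_body(preferencelist=("html", "plain"))
    if body is None:
        return str(msg.get("Subject", "(no subject)")), "<pre></pre>"
    content = body.get_content()
    if body.get_content_subtype() != "html":
        # Plain-text bodies are escaped and wrapped so they render as-is.
        content = "<pre>" + escape(content) + "</pre>"
    return str(msg.get("Subject", "(no subject)")), content


def build_reply(
    raw_email: bytes,
    sender: str,
    recipient: str,
    convert: Callable[[str], bytes] = html_to_pdf,
) -> EmailMessage:
    """Build a new message carrying the original body as a PDF attachment."""
    subject, html = body_as_html(raw_email)
    out = EmailMessage()
    out["From"] = sender
    out["To"] = recipient
    out["Subject"] = f"PDF: {subject}"  # hypothetical subject convention
    out.set_content("The forwarded email is attached as a PDF.")
    out.add_attachment(
        convert(html),
        maintype="application",
        subtype="pdf",
        filename="email.pdf",  # hypothetical file name
    )
    return out
```

The resulting message could then be handed to `smtplib.SMTP(...).send_message(out)`; the SMTP host and credentials depend on the mail setup and are not shown here. The converter is passed in as a parameter so the message-building logic can be tried without wkhtmltopdf installed.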
| RyanShook wrote:
| Thanks, yes I think this is the right direction. Surprised it
| doesn't exist as SAAS, I guess demand isn't there.
| rqtwteye wrote:
| I did something like this 10 years ago as an internal tool for
| a company. Back then I did it with Outlook VBA.
| karl_gluck wrote:
| Google Apps Script can do all of this. Take the email body and
| put it into a Google doc, then export the doc as a pdf to drive
| and attach it from there to send.
| fgonzag wrote:
| It seems quite doable but you'd need scripting skills to set it
| all up. Read the incoming queue, pass it to wkhtmltopdf then
| pipe the result to the mail command. For windows I believe I
| once used a java smtp server (apache james) that allowed you to
| set custom code as an incoming email handler. After that the
| conversion and email sending is trivial.
| brailsafe wrote:
| Probably depends on the purpose of the pdf and why it needs to
| be an attachment, but I'd just skip all the steps and print the
| email since that's more or less what pdf is for. Print it and
| re-attach or just print at the destination.
| RyanShook wrote:
| This is what I currently do. I was just hoping to automate
| the process.
| karol wrote:
| Why can't this be an electron app?
| Froodle wrote:
| Dev here. It totally could be. We dismissed it at first because
| Electron is quite bulky, containing a whole Chromium instance
| inside the exe; instead we kept the exe version as small as
| possible. We have plans for a full UI version in V2. We are
| releasing V1 (SPDF is currently in beta) sometime this month,
| but have begun work on a V2 port to a different language and
| framework.
| 101008 wrote:
| I still couldn't find a tool for a difficult problem to solve. I
| have some magazines in PDF, with layouts in two columns, etc. I
| want them to be transformed into Markdown. I know, it should
| identify automatically the two columns, different layouts, etc.
|
| I am not after something perfect - I can fix it if there are some
| errors, but so far nothing has produced a good result.
| layer8 wrote:
| This can be arbitrarily difficult to do, depending on the PDF.
| This is generally called PDF _reflowing_. Another approach is
| to use column-aware OCR software.
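To illustrate what "column-aware" means here, the following is a minimal Python sketch of the reading-order step only. It assumes some other tool (a PDF text extractor or OCR engine, not specified in the thread) has already produced text blocks with the x and y coordinates of their top-left corner, that y grows downward, and that the page has exactly two equal-width columns. The `Block` type and function names are hypothetical. Real magazine layouts with full-width headlines, pull quotes, or three columns would need more than this.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """A text block with the coordinates of its top-left corner."""
    x0: float
    y0: float
    text: str


def reading_order(blocks: List[Block], page_width: float) -> List[Block]:
    """Order blocks left column first, then right column, top to bottom."""
    mid = page_width / 2
    # Column index comes first in the sort key, vertical position second.
    return sorted(blocks, key=lambda b: (0 if b.x0 < mid else 1, b.y0))


def to_markdown(blocks: List[Block], page_width: float) -> str:
    """Join the ordered blocks as Markdown paragraphs."""
    ordered = reading_order(blocks, page_width)
    return "\n\n".join(b.text.strip() for b in ordered)
```

A naive top-to-bottom sort would interleave the two columns line by line; sorting by column first is what keeps each column's text contiguous.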
| qingcharles wrote:
| This is a hard problem. Cut the PDF down so it's only the pages
| of the article you want and then try feeding it through GPT Pro
| or Claude?
| jftuga wrote:
| Have you tried this (for at least solving part of the problem)?
|
| https://github.com/pdfcpu/pdfcpu
| lordofgibbons wrote:
| How easy or difficult would it be to turn this into an electron
| app so that non-technical users can use it easily too?
| layer8 wrote:
| Better use existing applications like PDFsam [0] or PDF-XChange
| [1].
|
| [0] https://pdfsam.org/pdfsam-basic/
|
| [1] https://pdf-xchange.eu/pdf-xchange-editor/
| Froodle wrote:
| Dev here. It totally could be. We dismissed it at first because
| Electron is quite bulky, containing a whole Chromium instance
| inside the exe; instead we kept the exe version as small as
| possible. Truth is, it's not too hard to port to Electron. We
| have plans for a full UI version in V2. We are releasing V1
| (SPDF is currently in beta) sometime this month, but have begun
| work on a V2 port to a different language and framework.
| ziofill wrote:
| it says this started as a 100% chatGPT project!
| jwilk wrote:
| What does it mean?
| monospaced wrote:
| From my understanding they mean the code was generated by
| instructing OpenAI's ChatGPT (contrary to writing the code
| themselves).
| d4rkp4ttern wrote:
| I'll join some other commenters, to add my favorite difficult pdf
| problem that I haven't found a ready to use (even paid) solution
| for: extract key value pairs from a filled form such as this
| medical claims form:
|
| https://imgur.com/a/EJDi7L7
|
| There are two levels of difficulty: the starting file could be an
| image (pdf or png or jpg), which is the most difficult scenario.
| The slightly easier one is where it's a text-based pdf so no OCR
| is needed.
|
| I threw this as an image file at Google's form parser but it did
| poorly, i.e. it missed quite a few fields.
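For the easier, text-based case only, a very rough Python sketch of key-value extraction follows. It assumes the text has already been extracted from the PDF by some other tool and that each field appears on its own line as "Label: value"; many real forms (including ones where labels and values sit in separate boxes) will not satisfy that assumption. The function name and the 60-character label limit are hypothetical choices.

```python
import re
from typing import Dict

# A label (up to 60 characters, no colon), a colon, then a non-empty value.
FIELD = re.compile(r"^\s*([^:\n]{1,60}?)\s*:\s*(\S.*?)\s*$")


def extract_fields(text: str) -> Dict[str, str]:
    """Collect 'Label: value' lines from already-extracted form text."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields
```

Lines without a colon, or with an empty value, are skipped. The scanned-image case described above would still need OCR or a document-understanding model before anything like this could apply.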
| Froodle wrote:
| Dev here for the above Stirling-PDF app. Please raise features
| like this as a feature request in a GitHub issue and we can
| try to address it in future!
| Closi wrote:
| Have you tried Azure AI Document Intelligence?
|
| In theory it's exactly this...
| brianjking wrote:
| I second this, that or have you tried GPT-4 Vision or Donut?
| 11235813213455 wrote:
| Does it support adding / managing named form fields?
| Froodle wrote:
| Dev here. Not currently, but it's a planned feature.
| tobinfricke wrote:
| Probability density functions, presumably. Oh, partial
| differential equations?
|
| For the document files, I love PDF Studio:
| https://www.qoppa.com/pdfstudio/
| ObscureScience wrote:
| What I have mainly been looking for in the free software
| ecosystem is a good tool to work with PDF
| tagging/structure/element attributes.
|
| At work I really have only been able to do the work I need on
| random PDFs with Adobe Acrobat. It seems strange that this is the
| case as PDF is now an open standard.
___________________________________________________________________
(page generated 2023-12-25 23:00 UTC)