[HN Gopher] Paperless-ngx - Open source document management system
___________________________________________________________________
Paperless-ngx - Open source document management system
Author : thunderbong
Score : 407 points
Date : 2023-10-07 11:55 UTC (11 hours ago)
(HTM) web link (nerdyarticles.com)
(TXT) w3m dump (nerdyarticles.com)
| nkrisc wrote:
| It finally happened to me: the very thing I just started
| researching and testing out showed up simultaneously at the top
| of the front page.
|
| I've found more good information about Paperless right here in
| the comments than anywhere else so far.
| eviks wrote:
| Is there any modern solution that doesn't tie you to the clunky
| interface of a single web browser client?
|
| While the folder organization criticism in the article is on
| point (although you could also use tags that many file systems
| support, but that's not a reliable system to invest time in, or
| maybe if it's backed up by some app that can restore all the
| tagging it could be), the range of native tools for
| viewing/editing various document formats as well as your ability
| to customize your workflows is unparalleled.
| lucas_codes wrote:
| How do people usually back up their self-hosted docker services
| using postgres? I have been using docker-volume-backup [0] and
| just saving the postgres data directory, but I've found it
| requires a minute of downtime to back up properly.
|
| [0] https://github.com/offen/docker-volume-backup
| efrecon wrote:
| I have this: https://github.com/efrecon/pgbackup
| mastax wrote:
| ZFS snapshots
| darnir wrote:
| Specifically in the case of paperless-ngx, I use their export
| facility from a cron job. The export is plaintext and contains
| all the information needed to recreate the postgres db and the
| learned identifiers. In case of a disk failure (and I've had
| one with my paperless store), I just reimported the previous
| day's backup from my offline backup of paperless' export.
| asmor wrote:
| restic container with all volumes mounted to
| /backup/<volumename> (and . to /backup/self - use named
| volumes, not binds) in my composefile with scale 0 and a
| backup.sh that's essentially
|
| docker compose down && docker compose run backup && docker
| compose up -d
|
| The restore procedure is the same, you restore the composefile
| through restic on the host and then `docker compose run backup
| restic restore latest --exclude "/data/self/*" --target /`
|
| I find it's fast enough because restic is incremental, but if
| you can set this up on a filesystem with snapshots that would
| be a great option too.
|
| Restic takes a bit of fiddling around too. I mount a prepared
| ssh config, a known hosts file and a private key.
| andix wrote:
| For now I have only backed up some databases with a pg_dump one-liner
| triggered from a cron job on the docker host (via docker exec
| or docker run --rm). No idea how this scales for big databases.
| But for your regular home server <10 GB databases this should
| just work.
| syntaxing wrote:
| I used vackup [1] that's been obsoleted but still works for me.
| However, you still need to turn off the container temporarily.
|
| [1] https://github.com/BretFisher/docker-vackup
| jhot wrote:
| docker-compose --env-file .env exec postgres /usr/bin/pg_dump
| -U postgres "$db_name" | gzip -9 >
| "$BACKUP_ROOT/postgres/${NOW}.${db_name}.sql.gz"
| poorlyknit wrote:
| pg_dump [0] (or pg_dumpall, linked there) sounds like what you
| want to use. You could docker exec into the postgres container,
| then copy the dump from the volume to your backup location on
| the host.
|
| A bit more contrived than copying the volume but you don't need
| to shut down the server. There's probably some scripts out
| there for doing this in a structured way but I usually do it
| more or less manually/use a bash script.
|
| [0]: https://www.postgresql.org/docs/current/app-pgdump.html
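The pg_dump approach described in the comments above can be sketched in Python as follows. This is a minimal sketch, not anyone's actual backup script: the Compose service name, the database name, and the `<root>/postgres/` directory layout are assumptions made for illustration, and the helper names (`build_dump_command`, `backup_path`, `run_backup`) are hypothetical. It streams the dump into a gzip file, so the database container keeps running during the backup.

```python
import gzip
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_dump_command(service: str, db_name: str, db_user: str = "postgres") -> list[str]:
    """Return the argv for running pg_dump inside a Compose service."""
    # -T disables pseudo-TTY allocation so the dump arrives as clean bytes.
    return ["docker", "compose", "exec", "-T", service,
            "pg_dump", "-U", db_user, db_name]


def backup_path(root: Path, db_name: str, now: datetime) -> Path:
    """Return a timestamped destination: <root>/postgres/<stamp>.<db>.sql.gz."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return root / "postgres" / f"{stamp}.{db_name}.sql.gz"


def run_backup(root: Path, service: str, db_name: str) -> Path:
    """Stream pg_dump output into a gzip file while the database stays up."""
    target = backup_path(root, db_name, datetime.now(timezone.utc))
    target.parent.mkdir(parents=True, exist_ok=True)
    proc = subprocess.Popen(build_dump_command(service, db_name),
                            stdout=subprocess.PIPE)
    with gzip.open(target, "wb") as out:
        # Copy in chunks so large dumps never have to fit in memory.
        for chunk in iter(lambda: proc.stdout.read(65536), b""):
            out.write(chunk)
    if proc.wait() != 0:
        # Do not leave a truncated dump behind that looks like a good backup.
        target.unlink(missing_ok=True)
        raise RuntimeError("pg_dump exited with a non-zero status")
    return target
```

Calling `run_backup(Path("/srv/backups"), "postgres", "paperless")` from a cron job would be one way to use it; both the path and the names are placeholders to adapt to your own compose file.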
| kstrauser wrote:
| This is nifty, but seems to lack the one thing that keeps me
| coming back to DEVONthink: a learning classifier.
|
| With DT, say you've scanned or saved 20 docs to your inbox and
| you want to sort them to their long-term homes. DT will suggest
| folders based on how closely the new file matches the contents of
| those folders. It has the UI equivalent of "this looks like 2023
| state taxes. Is it? This looks like kid #2's school stuff. Is it?
| This looks like the older dog's veterinarian records. Is it?"
|
| That's so, so nice.
|
| Lately, as an experiment, I've been playing with organizing my
| docs with Johnny Decimal, then using the Hazel app to sort known
| docs with fixed structures (think bank statements and the like)
| into the right folders. My ScanSnap scanner's software does OCR,
| so by the time docs land in the inbox folder, they're ready for
| automated processing. It's working pretty well so far, and I may
| stick with it.
|
| But if I _were_ to go back to an app, it would be DEVONthink or
| something with most of its features. That classifier is too darn
| nice, plus its smart rules, plus its scriptability, plus multi-
| device sync, plus Markdown notes with wiki links to stored docs,
| plus a thousand other niceties.
| pydry wrote:
| I thought I wanted this originally when I first started going
| paperless but I quickly realized that as long as I OCR
| everything and throw it in a pile I can easily grep for "state
| taxes" and 2023.
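The "OCR everything and grep the pile" idea can be sketched like this. It is a minimal sketch that assumes each scanned document has a plain-text OCR sidecar file ending in `.txt`; that layout and the function names are assumptions for illustration, not something the comment specifies.

```python
from pathlib import Path


def matches_all(text: str, terms: list[str]) -> bool:
    """Return True when every term occurs in the text, ignoring case."""
    haystack = text.lower()
    return all(term.lower() in haystack for term in terms)


def find_documents(root: Path, terms: list[str]) -> list[Path]:
    """Return the OCR text files under root that contain all of the terms."""
    hits = []
    for path in sorted(root.rglob("*.txt")):
        # errors="replace" keeps one badly encoded file from aborting the search.
        text = path.read_text(encoding="utf-8", errors="replace")
        if matches_all(text, terms):
            hits.append(path)
    return hits
```

For example, `find_documents(Path("scans"), ["state taxes", "2023"])` would list every sidecar file mentioning both terms, where `scans` is a placeholder directory.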
| lolinder wrote:
| Paperless has this--when I upload a new file it will attempt to
| categorize it automatically using my existing tags. The more
| items I put in each tag the better it gets at categorizing
| them, so it definitely seems to be learning somehow, though I'm
| not sure on the details of how it works.
|
| I've never used DT, so it's possible that their system is
| substantially better in some way.
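To make the "learns from existing tags" idea concrete, here is a toy suggestion routine based on word overlap. It is only an illustration of the general technique: the thread does not describe how Paperless-ngx or DEVONthink actually classify documents, and this sketch should not be read as either tool's algorithm. All names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(tagged_docs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Build one word-count profile per tag from (text, tag) pairs."""
    profiles: dict[str, Counter] = {}
    for text, tag in tagged_docs:
        profiles.setdefault(tag, Counter()).update(tokenize(text))
    return profiles


def suggest_tag(profiles: dict[str, Counter], text: str) -> Optional[str]:
    """Suggest the tag whose profile covers the largest share of the new words."""
    words = tokenize(text)
    if not words or not profiles:
        return None

    def score(tag: str) -> float:
        profile = profiles[tag]
        return sum(1 for word in words if word in profile) / len(words)

    # Sorting the tags first makes ties resolve deterministically.
    best = max(sorted(profiles), key=score)
    return best if score(best) > 0 else None
```

The more documents each tag has seen, the richer its profile becomes, which matches the observation that suggestions improve as the tagged library grows.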
| Xerox9213 wrote:
| Paperless uses tags and will auto tag based on previous scans.
| IME it works very well (as long as you have a decently sized
| library of tagged documents) and seldom do I have to add my own
| tags. It's not perfect, though, and sometimes I have to go in
| and fix some of the tags.
|
| https://docs.paperless-ngx.com/advanced_usage/
| kstrauser wrote:
| Oh! Looks like I was wrong. Nice!
|
| I'd still miss DT's zillion other things I've used over the
| years, but that one would have been a dealbreaker.
| vr46 wrote:
| Previous conversation also here:
| https://news.ycombinator.com/item?id=37521492
| midnitewarrior wrote:
| From what I can tell, DT is only on Mac, and not open
| source. If the company goes under, good luck.
| steve1977 wrote:
| You can always export the files and you could also access
| them directly in the application's document database if
| needed.
| dgrabla wrote:
| My Paperless-ngx listening on a network share + brother ADS-2800W
| are key to staying sane. My only complaint is that it is resource
| hungry. If I allocate less than 2G RAM to the paperless VM it
| does not work as it should.
| petepete wrote:
| I have this exact setup but with the ADS-4300N. I'm new to it
| and it's still a novelty.
|
| My only complaint is I've had the odd letter get scanned upside
| down and there's no way to rotate pages in Paperless-ngx.
| jplunien wrote:
| https://apps.apple.com/app/id6464425056
|
| Just recently started working on an iOS/macOS app for it. Hope
| you like it!
| Obscurity4340 wrote:
| How would you compare this to something like DevonThink, out of
| curiosity?
| bketelsen wrote:
| Nice, looks like you're headed in a good direction with this!
| apfsx wrote:
| This is great, nice work.
| ipsi wrote:
| I've spun up a copy of this recently (within the last month) and
| it's already proving helpful.
|
| I've purchased a new-build home in Germany, and I'm currently in
| the stage between "purchased" and "ready for move-in," and if
| you've ever purchased a Neubau in Germany you know how much
| paperwork is involved - I get so many documents over email, many
| of which are scanned (to preserve the wet signature and stamps),
| and some of which I need to copy into a translator, that this is
| incredibly helpful. It checks my email, grabs PDFs, straightens
| them, OCRs them, adds a correspondent, tags them, and makes them
| available through a web UI.
|
| I also appreciate the full-text search (for all that it might
| struggle if I had tens of thousands of documents) as I've had to
| go and try to find particular documents where the name of the
| document I've received might be a synonym for what the other
| person is asking for, but the word they're asking for is at least
| used in the text.
|
| I'll also set it up to pull documents from my NAS as well, where
| the scanner writes to, as I also receive a number of documents
| via mail (that I also occasionally need to translate or
| copy/paste from).
|
| There are also some limitations that annoy me:
|
| * I really wish the email filters were more flexible - right now,
|   I have to have three filters, one for PDFs, one for JPEGs, and
|   one for PNGs, so I wish I could just set a regex for the
|   attachment name. This one annoys me enough that if I ever have
|   time I'd look at doing a PR for it (assuming the filtering is
|   done locally and not on the IMAP server).
| * I'd also like to be able to set up rules to tag documents based
|   on the email domain (e.g., house-builders get tagged as
|   "house-builder, house") without having to manage a gigantic
|   explosion of rules. In theory the ML should handle that, but...
|   I'm mistrustful of ML. We'll see in a few months if I was too
|   hasty in my judgement or not.
| * I'd like to retain slightly more information about the
|   correspondent, like both name and email address (there's no
|   consistency about who has their From line as "Name <email>" and
|   who's just "email", even within the same company), both for
|   de-duplication of correspondents and domain-based searching.
| * I wish I could share documents more easily than downloading
|   them and re-uploading them to my email client (or mounting the
|   folders and trying to find the right document, but that has its
|   own set of problems). This is one of those problems that's
|   really easy to state, but potentially quite difficult to
|   actually implement - could a web application add a PDF to the
|   clipboard in such a way that GMail, say, would understand what
|   was happening and add it as an attachment when pasted?
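The first three wishes in the list above (one regex for attachment names, domain-based tags, and keeping both name and address of the correspondent) can be sketched as follows. This is a hypothetical illustration of the requested behaviour, not how Paperless-ngx mail rules work today; the pattern, the `builder.example` domain, and all function names are assumptions.

```python
import re
from email.utils import parseaddr

# One pattern instead of three separate filters for PDF, JPEG and PNG.
ATTACHMENT_PATTERN = re.compile(r".+\.(pdf|jpe?g|png)$", re.IGNORECASE)

# Hypothetical mapping from sender domain to the tags it should receive.
DOMAIN_TAGS = {"builder.example": ["house-builder", "house"]}


def wants_attachment(filename: str) -> bool:
    """Return True for attachment names the single regex rule accepts."""
    return ATTACHMENT_PATTERN.fullmatch(filename) is not None


def correspondent(from_header: str) -> tuple[str, str, str]:
    """Split a From header into (display name, address, domain)."""
    # parseaddr handles both "Name <email>" and a bare "email".
    name, address = parseaddr(from_header)
    address = address.lower()
    domain = address.rpartition("@")[2]
    return name, address, domain


def tags_for(from_header: str) -> list[str]:
    """Look up the tags configured for the sender's domain, if any."""
    return DOMAIN_TAGS.get(correspondent(from_header)[2], [])
```

Keying correspondents on the lowercased address rather than the display name is one way to de-duplicate senders whose From lines are inconsistent.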
|
| Overall though, I'm pretty happy with it, and finding it useful
| so quickly was somewhat surprising.
| jdoss wrote:
| If you are looking to quickly set up Paperless-NGX check out my
| little side project https://github.com/jdoss/ppngx. It will set up
| everything you need to run Paperless-NGX (PostgreSQL, Redis,
| Tika, Gotenberg, PaperlessNGX, and SFTPGo) inside a Podman Pod on
| a Linux based system. You can optionally set it up to start on
| boot via systemd.
|
| I run this locally on my workstation and send PDFs many times a
| week from my Brother ADS2800w scanner via SFTP. Paperless NGX has
| reduced my home office paper piles to almost zero. It is a
| fantastic open source project and I am very thankful it exists.
| wolverine876 wrote:
| > everything you need to run Paperless-NGX (PostgreSQL, Redis,
| Tika, Gotenberg, PaperlessNGX, and SFTPGo)
|
| That is a lot of dependency. How stable is Paperless with all
| those applications making uncoordinated changes on their own
| schedules?
| darnir wrote:
| The only hard dependencies are Redis and Postgres. The
| official stance is to run them from the provided docker
| compose and the container for paperless-ngx itself is kept
| updated and working for the stable containers of redis and
| postgres.
|
| Tika and Gotenberg are additional features for scanning and
| converting MS Office documents to PDF. Not necessary and I
| don't use them in my setup at all. Same with sftpgo. I'm not
| sure of its use case. But paperless doesn't directly depend
| on it in any way.
| traverseda wrote:
| Why would you want to use this over one of the official docker
| compose setups? https://github.com/paperless-ngx/paperless-
| ngx/blob/main/doc...
|
| They will also automatically launch if you have docker running
| at boot. Is it just because you prefer redhat/IBM's docker
| equivalent stack to the much more common and cross platform
| docker install?
| jdoss wrote:
| I don't use Docker at all on any of my infra or workstations.
| That's why I made this.
| traverseda wrote:
| Alright, but you've sort of re-invented docker compose
| there, but as a shell script. These days docker compose
| even works with podman if you really prefer IBM's docker
| implementation to the original.
| efrecon wrote:
| Well... Maybe re-inventing was part of the fun or a
| learning experience. If you want, there is even this:
| https://github.com/Mitigram/docker-compose-build
| abacate wrote:
| I would want this over docker and docker-compose any day.
|
| I've been using docker compose in production for a couple of
| years now and it adds another layer on top of systemd that is
| a continuous source of headache, especially during updates.
|
| Podman gets it right: no central daemon, can automatically
| generate systemd services for a whole pod. Updates are
| seamless.
|
| This by itself is enough of a reason to me.
| growingkittens wrote:
| Paperless-NGX doesn't have document version history,
| unfortunately.
|
| Right now I am looking at OpenProDoc [1] and bitfarm-archiv [2]
| as document management possibilities.
|
| [1] http://jhierrot.github.io/openprodoc/Spec_EN.html
|
| [2] https://www.bitfarm-archiv.com/document-
| management/features....
| lobochrome wrote:
| I am just rcloning my paperless-ngx document volume to s3 deep
| glacier every night for this.
|
| It's a bit "scary" since even documents I delete in paperless-
| ngx are thus preserved forever, but it may come in handy
| someday.
| andix wrote:
| I've been looking for a suitable document management system for a
| while. There is one feature I would like to have that I haven't
| found anywhere except maybe in $$$ enterprise systems:
|
| I want to add custom metadata to documents by
| categories/tags/folders, for example like this:
| Invoice { issued: date, invoiceNumber: string, amount: number,
|           due: date }
| Contract { validFrom: date, renewsAt: date, autoRenew: boolean }
|
| When adding a tag like this, it should either automatically fetch
| this information from the document content (probably very hard)
| or give you a manual workflow to type it into a form, while
| showing the document next to it. Maybe just by selecting the text
| from the PDF.
|
| In the folder list and in the search you would be able to add
| those metadata fields as columns, sort them by value or do
| queries (tag:invoice AND invoice.amount > 1000)
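The typed metadata and the example query could look roughly like this as a data model. This is a sketch of the requested feature under assumed names (`Document`, `invoices_over`, and the snake_case field names); it does not describe any existing Paperless-ngx functionality.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    issued: date
    invoice_number: str
    amount: float
    due: date


@dataclass
class Contract:
    valid_from: date
    renews_at: date
    auto_renew: bool


@dataclass
class Document:
    title: str
    tags: set[str]
    # Maps a tag name such as "invoice" to its typed metadata record.
    metadata: dict[str, object]


def invoices_over(documents: list[Document], threshold: float) -> list[Document]:
    """Evaluate `tag:invoice AND invoice.amount > threshold`, sorted by amount."""
    hits = []
    for doc in documents:
        record = doc.metadata.get("invoice")
        if "invoice" in doc.tags and isinstance(record, Invoice) and record.amount > threshold:
            hits.append(doc)
    return sorted(hits, key=lambda doc: doc.metadata["invoice"].amount)
```

Because each tag carries a fixed schema, the same records could back sortable columns in a list view as well as queries.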
|
| Edit: this feature seems to be one of the most upvoted feature
| requests for paperless https://github.com/paperless-
| ngx/paperless-ngx/discussions/1...
| jamala1 wrote:
| Is Paperless suitable for business use, say, for a smallish sized
| company with 25 employees and 1000 customers? I think in my EU
| country such systems need to fulfill certain requirements like
| versioning/tracking of changes.
| ephimetheus wrote:
| Shameless plug: I recently released a native app for iOS that
| connects to Paperless-ngx:
|
| https://apps.apple.com/de/app/swift-paperless/id6448698521
| petergrace wrote:
| I use MayanEDMS personally, and have for the past five or so
| years. It's complex but does what it says on the tin.
|
| https://www.mayan-edms.com/
| growingkittens wrote:
| Mayan EDMS recently moved a lot of basic documentation behind a
| subscription paywall.
| saintradon wrote:
| I tinkered with this a few weeks ago. Pleasantly surprised with
| its capabilities.
| lwhi wrote:
| This is very interesting to me.
|
| I'd love it if I could also use my mobile devices to bring up
| paper docs instantly (mobile phone, tablet, kindle).
| lobochrome wrote:
| There is even a nice OSS Swift app now in the app store. It's v1,
| but it looks nice and is fast and simple.
|
| https://apps.apple.com/app/id6448698521
| ephimetheus wrote:
| I made that! Glad you like it!
| diarrhea wrote:
| Easily possible. Paperless-ngx works great on mobile as well. I
| have WireGuard on my phone and connect that way, then simply
| use a mobile browser, no app needed.
| lwhi wrote:
| Nice!
| LeSaucy wrote:
| It's not free/oss, and it's on the Apple ecosystem, but
| DEVONTHINK does a fantastic job of this, and supports storing
| all of your documents in a webdav store which you can host
| yourself. It uses ABBYY FineReader for OCR, which I have found
| to provide better results than tensorflow based ocr.
| rufugee wrote:
| I've been using DEVONThink for just this for a few years, and
| it's very good at it. However, it's macOS only and has far
| more features than I need (simple searching, tagging, and
| organization). I tried paperless a year ago and the search
| and rendering was far too slow, and many docs just gave
| obscure errors. Perhaps it's time to give it another shot.
| I'd love to have something on Linux that could handle my
| large repository of documents.
| kristofferR wrote:
| Is this in reality a German cry for help, disguised as tech talk?
|
| As one of the least digitized countries in Europe, and with the
| digitalization budget recently cut by 99%, it seems like they still
| need to use paper in their lives, and it's not gonna improve
| soon.
|
| This feels so incredibly archaic to me as a Norwegian, I would
| have to print out documents to have anything to fill paperless-
| ngx with.
| _frkl wrote:
| You can just use your digital documents directly, and augment
| it with the few paper receipts that you might (or might not)
| still have to deal with. The main selling point is really
| document management (to me, anyway), the 'branding focus' on
| physical documents is probably a little misleading.
| greenicon wrote:
| You can easily use this for digital documents as well. The only
| difference in my setup is a tag showing whether the document id
| maps to a physical document in a binder or not.
| diarrhea wrote:
| I track, using tags, whether a document is a scan or properly
| digital. The pendulum is strongly in favor of the latter: I use
| this tool a ton for natively digital documents as well.
| Invoices, contracts, tickets etc. all come in as PDFs anyway,
| luckily. I have all that knowledge at the tip of my fingers.
| Yes, some of those documents are scans and used to be physical
| paper, but that's beside the point.
| rayshan wrote:
| Genuine question: for simple needs, why use this or DevonThink
| over macOS' built-in features? macOS now does OCR (Live Text),
| has tagging, and spotlight search is fast (but sometimes presents
| too many results to be useful). I even stopped splitting PDFs
| into separate documents and organizing them into folders. I just
| search.
| acka wrote:
| Obvious answer: because, contrary to popular belief, not
| everyone uses macOS.
| phodo wrote:
| Does auto OCR work on iCloud files ? For example: I scansnap a
| huge collection of documents to a folder that is on iCloud
| (synced w desktop). It works great because it is so simple.
| However if I have, say, a PDF document, will the Mac ocr
| functionality perform the OCR if the doc is on iCloud and will
| I then be able to search for the text in that doc via spotlight
| / finder ? I tested this a few years ago and the search on
| content inside scanned PDFs did not work. I had looked at
| Paperless but decided to stay on Mac os file system.
| darkteflon wrote:
| Yeah. I had a Devonthink-based setup but after one too many
| database corruptions I threw in the towel. Now I just OCR scan
| everything into a few MacOS folders and search using Houdahspot
| (Spotlight, I found, was not suitable for fine-grained search).
| I'm very happy with the setup.
| ndsipa_pomu wrote:
| This is more designed for a self hosted server, so if you want
| multi-device web access then it's a great solution. I can
| download a PDF on my android phone and upload it to my
| paperless-ngx instance in a couple of clicks and easily edit
| the tags as necessary. It's great for travelling as you're not
| reliant on having a locally installed application on your
| chosen device with you, and of course it would still be
| available if you lost your main device and only had your phone
| on you.
| LVB wrote:
| I used to be the target audience and really enjoyed having my
| system just right, sorting and tagging everything, etc. But
| over the years I realized that I wasn't really benefiting much,
| and gave SwiftScan on my iPhone + dumping into an iCloud
| folder a try. For my needs, this has worked fine. It is rare I
| even need to refer to the scans, and the macOS OCR + automatic
| dates usually let me find the doc quickly. In the worst case I
| browse thumbnails.
| aetherspawn wrote:
| If anyone is looking for a fully-commercial version, we use
| something like this -- it is called Hubdoc and it is free with
| any Xero subscription.
|
| I really really appreciate the work that went into paperless, but
| for us the business risk of self-hosting this is far too high
| because if we lose our docs we lose our tax proof.
| xwowsersx wrote:
| I wonder if people know about Google's Stacks app? I don't know
| if it's as powerful as Paperless-Ngx, but it lets you organize
| docs pretty easily and some of it is automatic. I have "stacks"
| for insurance, id cards, receipts, medical records, etc. Whenever
| I get paper mail, I snap a photo and immediately toss it. I can
| then organize it in the Stacks app and easily be able to pull it
| up later. It's a pretty useful, easy solution IMO.
| swader999 wrote:
| Until they cancel it.
| xwowsersx wrote:
| True :(((
| yunohn wrote:
| I usually don't jump on the "Google cancels everything" train,
| but do keep in mind that Stacks is a project from their Area
| 120 incubator, which saw heavy layoffs [1]. It's not on the
| remaining list, so it may have already been cancelled
| internally and currently in the process of being shut down.
|
| [1] https://techcrunch.com/2023/01/25/google-spares-three-
| area-1...
| Eddy_Viscosity2 wrote:
| If it starts with 'google' then at best it's something you try
| out then, if you like it, try and find that functionality in an
| app made by someone else. Google will kill this app just when
| you get fully invested. All google apps are traps and foot-
| guns, especially the ones that work great.
| xwowsersx wrote:
| Probably right
| navigate8310 wrote:
| Definitely scary as it's under their incubator area120
| hoppyhoppy2 wrote:
| I can't get it on my, ahem, _Google_ Pixel device running
| Android 13:
|
| > _This app isn't available for your device because it was
| made for an older version of Android._
| dstroot wrote:
| Also:
|
| "Stack is only available on Android in the U.S. You can
| install it through the Google Play store."
| xwowsersx wrote:
| That's weird. I'm using it right now on the Pixel 7 Pro
| running Android 13.
| jeleh wrote:
| If you own a Synology NAS I recommend to have a look at synOCR:
|
| https://github.com/geimist/synOCR/wiki
|
| English translation: https://github-
| com.translate.goog/geimist/synOCR?_x_tr_sl=au...
|
| I've been using this for several years and it works great.
| JW_00000 wrote:
| What I don't really understand is, do people really have that
| many physical documents that they need to keep track of, that
| such a system is worth it? E.g. to file my taxes (in Belgium), I
| think I only ever need a few (maybe even only 1 or 2) digital
| documents. Or is this more a mentality thing? I know my parents
| have folders and folders, e.g. my father kept all expense notes
| from his work even after retirement... I throw everything away
| once it's handled.
| _frkl wrote:
| Can't speak for physical documents in general, but personally I
| really appreciate paperless-ngx for its general document
| indexing/storage. Being able to scan and ocr physical documents
| (usually using the camera on my mobile phone) is very nice, but
| I mainly use it with pdfs that paperless automatically fetches,
| ocrs (if necessary), and tags from my email inbox, or which I
| copy into a specific local folder which gets synced with
| paperless.
|
| Getting all my invoices from last year to prepare taxes is now
| just a simple query in the paperless UI, the result would be
| about 95% digital and 5% physical documents, probably. Of
| course I could do all that old-school using filesystem folders,
| but having all my documents indexed and searchable in a single
| place was definitely worth the (small) effort of setting it all
| up and keeping it running.
| kristofferR wrote:
| I don't understand what you mean by "prepare taxes".
|
| I just add all purchases/sales right when they happen in my
| accounting app and attach the invoice PDF. Then when I have
| to file taxes, I export the correct numbers.
|
| Are you doing your bookkeeping in Excel or something?
| _frkl wrote:
| This is just for my personal taxes, no accounting involved.
| I just get all the relevant stuff together once a year. Of
| course it's not 10s or 100s of documents, but still enough
| so it would take me some time to get everything together
| manually.
|
| Also it was just meant as an example, paperless is
| generally useful (to me) in situations where I need to
| access somehow related documents, like traveling and such,
| or searching my documents for some information. As I said,
| there are other systems and ways to do this, but for me
| this is the one that stuck.
| NoboruWataya wrote:
| I'm quite paranoid about throwing stuff away so for me it's at
| least partly a mentality thing. I probably save a lot more than
| I need but it gives me peace of mind to know that it's all
| there. There are some things that it is very helpful to have
| easy access to, like utility bills and bank statements (which I
| occasionally need for KYC stuff) or ID documents.
| ipsi wrote:
| Kinda - at the moment I'm receiving _a lot_ of documents,
| mostly as PDFs via E-Mail (some the original digital version,
| some scans of physical copies), but some via post as well.
|
| I've only added documents I've received this year (plus a
| couple of dozen documents going further back), and I've got
| ~250 in there, with a total of ~2.5m words (although I think
| word-count is a fuzzy concept in German).
|
| I've posted a top level comment in more detail, but yeah, it's
| helpful to me.
| kstrauser wrote:
| I guess it's partly a mentality thing for me. I've had numerous
| cases of sadness that I couldn't produce a necessary document,
| and gladness that I was able to pull up something presumed long
| lost. For me, it's easier to save everything "just in case". It
| all adds up to less than 50GB so it's not an enormous amount of
| data to store by current standards.
|
| Seriously, a couple cases of "sorry, I don't have proof to back
| up that tax deduction" or "hey, here's the receipt proving that
| our TV is still covered by warranty!" make it all worthwhile.
| dividedbyzero wrote:
| Definitely, Germany strongly believes that a document that
| hasn't been a physical piece of paper at least once can't be
| real. That makes for folders upon folders of documents and it's
| actually worse than back in the 20th century because generating
| and mailing documents has become way easier and cheaper, so
| things that would have been a one-page typewritten letter back
| then now are five ten-page ones full of automatically generated
| crap. One lengthy illness in the family alone filled hundreds
| of pages and it can be very hard to know what can be thrown
| away at which point.
| schlowmo wrote:
| > Definitely, Germany strongly believes that a document that
| hasn't been a physical piece of paper at least once can't be
| real.
|
| I'm sorry to tell you that is an oversimplification and
| especially for documenting expenses as a company/freelancer
| it's kind of worse.
|
| Last time I checked if you want to follow the tax law to the
| word you're not allowed to change the medium:
|
| If an invoice came as a paper copy (e.g. by snail mail), this
| paper copy is the original. If you scan it the digital
| version isn't.
|
| If an invoice came as a digital document (e.g. a PDF by
| email), this digital document is the original - a printed
| version of that digital document isn't.
|
| So if a tax inspector asks for "originals" it's technically
| almost impossible to provide them in the sense of the law. Whether
| a tax inspector would even care is another question.
| germanier wrote:
| It has been perfectly legal (and common) for a decade now to scan
| documents and destroy the paper original as long as you
| follow some guidelines. Keyword is "ersetzendes Scannen".
|
| And yes, they care about those rules and that you provide
| "originals" according to that definition - in particular
| that you didn't modify digital documents in any way. You
| can (and should) comply with that and there are service
| providers to help if you are too small to set that up
| yourself.
| schlowmo wrote:
| Thanks, today I learned about "ersetzendes Scannen". I
| just checked and it's been exactly a decade (2013) since it was
| allowed, which coincidentally is the year when I started
| working as a freelancer (and I have to care about such
| rules).
|
| I admit that my last paragraph was kind of hyperbole, but
| I never heard (at least from other freelancers) of a tax
| inspector who wasn't happy with either everything
| printed or everything digital. I guess they really start
| to care if they suspect something fishy.
| noAnswer wrote:
| Another search/keyword is "Revisionssicher". If your
| storage/software has that, you are good to go.
| greenicon wrote:
| Just a side note to this and the other replies: You can
| also keep the original documents and add scans to paperless
| for indexing, etc. Since I switched to paperless I keep my
| originals in binders just ordered by the paperless id, so I
| can retrieve the original when required.
| ipsi wrote:
| Yeah, I'm also in Germany (although not German) and installed
| Paperless because of this!
|
| I think more than a few of these projects are started and/or
| maintained by Germans due to the astonishing number of
| documents received - e.g., paperless-ng appears to have been
| done by a German, although neither the original Paperless nor
| Paperless NGX immediately appear to be.
| esafak wrote:
| I would be in favor of not scanning them, forgetting about
| them, then throwing them away when I eventually see them again
| and deciding I did not miss them.
| whateveracct wrote:
| It's nice to throw papers away without worrying about it. Or to
| archive instruction manuals for stuff I own - paperless is the
| first place I look (its search is nice).
| krupan wrote:
| In my experience, no, you don't need this. The few things I
| keep just go in folders named for the year under my Documents
| folder, and they are given descriptive filenames like
| paystub-2022-10-15.pdf, or companyA-w-2.pdf. In the rare cases
| where I need to go back to those (like for a loan application
| or doing taxes) it's easy enough to find them.
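The folder-per-year naming scheme described above is simple enough to capture in a few lines. This is a sketch of one possible convention; the helper name and the choice to always append the ISO date are assumptions, since the comment also shows undated names like companyA-w-2.pdf.

```python
from datetime import date
from pathlib import Path


def archive_path(documents_root: Path, label: str, when: date,
                 suffix: str = ".pdf") -> Path:
    """Build <root>/<year>/<label>-<YYYY-MM-DD><suffix> for a document."""
    # isoformat() yields YYYY-MM-DD, so names sort chronologically in a listing.
    return documents_root / str(when.year) / f"{label}-{when.isoformat()}{suffix}"
```

With this convention a plain directory listing of a year folder is already sorted by document type and then by date, which is most of what a search tool would otherwise provide for a small archive.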
| faiD9Eet wrote:
| You are right, you do not want to look up documents that old, it
| is a waste of time...
| ... unless you are a German and the state asks for your time
| sheets three years in the past because you've gotten child
| support and are requested to prove your working hours.
| ... unless you happen to have an accident and your insurance is
| fighting with another insurance who's gonna pay and they ask you
| about the incident two years later
| ... unless you end up in a contract fight with the postal
| operator, that can take a year of mailing before being settled.
|
| Some correspondences take years and only add a mailing every
| few months. You would like to have a thread-like view -- as in
| an electronic mail. That is the strength of document management
| systems.
| djbusby wrote:
| I have a small business in USA. For federal business taxes I
| need 6-7 documents. Then that process creates other documents I
| need for personal taxes, which also requires 6-8 more
| documents. So, I'm roughly 20 important documents per year for
| federal taxes. Nexus in 3 states, adds more. And save them all
| for 7 years.
|
| The other end of the spectrum in USA is filing with the
| 1040-EZ which is like a 3-4 document process.
| catlover76 wrote:
| Seriously. People in this thread are describing some setups
| that momentarily seem cool in theory, but are almost certainly
| overkill for personal use.
| whateveracct wrote:
| Luckily, running paperless-ngx on my NixOS desktop is
| trivial. And it was also trivial to make it accessible over
| an avahi name on my local network. So it was kind of a "why
| not" sort of thing.
| yunohn wrote:
| In the Netherlands, government bodies are regularly pushing
| everything they can to a digital inbox - which I vastly prefer.
| My simple, single-employer yearly income tax is all pre-
| calculated. Further, deductions for mortgage interest,
| healthcare, studies, etc are all pre-filled as much as
| possible. I think you only need to upload documents for
| complicated situations or audits?
|
| Of course, I still quickly download my year-end
| bank/salary/mortgage statements and cross-verify the tax
| department's numbers. The whole process takes at most a few
| hours.
|
| IME Germany has significantly more hard-copy requirements.
| t0mas88 wrote:
| You never need to upload the documents in the Netherlands,
| their software doesn't have such an option.
|
| But technically you're expected to keep the documents at
| least until you receive the "definitieve aanslag" and if
| you're nitpicking I think there is a 7 year term for the tax
| services to come back on your filed taxes and change things
| or demand proof.
|
| Practically that doesn't happen if you accepted their pre-filled
| numbers and they match your employer's. But if you're a
| freelancer or other non-standard case I would keep digital
| copies for a few years just to be sure.
| yunohn wrote:
| > You never need to upload the documents in the
| Netherlands, their software doesn't have such an option.
|
| Ah, interesting. I just assumed my situation never
| triggered it.
| t0mas88 wrote:
| Depends on your tax situation. For my private taxes it's maybe
| 3 or 4 documents and those from the bank etc have all gone PDF
| anyway.
|
| But when I was a freelancer I used a document scanning system
| provided by my bookkeeper. It worked similar to this open
| source thing, scan to PDF, automatic OCR and classification.
| Needed it because many invoices still arrived on paper, and
| receipts for restaurants etc I usually took a picture to
| upload.
| kristofferR wrote:
| In some countries like Germany, the government still
| communicates with its citizens by snail mail. Important
| documents are usually physical there. They are one of the least
| developed countries in Europe with digitalization, they are far
| behind.
| [deleted]
| Macha wrote:
| So here's an example where it came in useful to have back
| documents:
|
| I recently purchased a house. As part of the process, I needed
| to apply for a mortgage. The bank wanted a statement from my
| employer about my income from them, along with my last 2
| complete years' tax documents.
|
| The bank had an inquiry. My employer had said my salary + bonus
| was X, but in the first of these two years, my tax documents
| said my income from my employer that year was 2.5X. The extra
| 1.5X was due to the employer being bought out and some change
| of control terms in the RSUs causing immediate payout of what
| would normally have been paid out over 4 years. Since I kept
| the documents of the RSU terms and the payslips, I could
| provide these to the bank to clear the matter up.
|
| Notably, had I not kept my own copy of these documents, I could
| not have gone back to my employer for new copies. Due to the
| change of control, they had changed payroll vendors, and had
| eventually terminated the contract with the old vendor, so I
| could not have gotten a payslip from 1.5 years ago. Similarly,
| in the move to the new owner's HR system, the company had lost
| many of their records of agreements with employees, including
| contracts etc., so it's not clear they would still have the
| terms of the RSUs, especially since the change of control
| payout rendered this a "completed" transaction. And later
| events made it clear that they did not have, e.g. a copy of my
| employment contract.
|
| Similarly, if I ever had had a dispute over the terms of those
| contracts - if I hadn't kept a copy of the contract, and the
| company definitely hadn't kept theirs, any dispute would have
| been my word against theirs.
| iamwpj wrote:
| Companies are legally required to keep payroll records for
| multiple years (depends on where you live, though I doubt
| most places are less than 3-4). This is ok advice, but these
| systems don't just work like this. If you didn't have the
| documentation the bank would likely take your approved tax
| filings as evidence and move on with their day.
|
| In a real contract dispute your copy of a contract from your
| documents isn't notably different in the eyes of the court
| than one from your employer. They're both notarized and if
| there's a dispute between them there are established
| processes. Aside from some titles or etc., historical filing
| ownership is typically relegated to the document originators.
| viraptor wrote:
| It's not just for physical documents. I have payslips which may
| be useful in the future, but would be really hard to
| recover when I leave the company. Any invoices which come to my
| email. Any bank documents which exist in a vaguely named
| "account updates" email. And many other things which could be
| possible to find in the future, but are much better in
| paperless with appropriate tags and OCR.
|
| But yeah, then there are for example the bank account contract
| updates which come by physical mail only.
|
| > expense notes (...) I throw everything away once it's
| handled.
|
| Don't know about your location, but I need to keep the tax
| related documents for 5 years in case of an audit.
| abbbi wrote:
| I've been using paperless for some months now and I really like it. Nice to
| see the project got some new contributors and frequent releases.
| noodlesUK wrote:
| One thing that I've done that makes my paper handling process
| much easier is have my printer/scanner point to a write only
| samba share. Most HP printers support this. I wrote a short
| script that looks for new files in there (with inotify), runs OCR
| on them with OCRMyPDF and moves them to a different file share.
| It means that my non-technical family members can just stick the
| paper in the document feeder, and 20 seconds later, an OCRed copy
| ends up on the family file share. You don't get the fancy tagging
| and search that this provides, but file shares integrate natively
| into all OSs, which is a huge perk.
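|
| A minimal sketch of the kind of script described above, not the
| poster's actual code. It assumes the `ocrmypdf` command-line tool is
| installed, polls the inbox instead of using inotify (the Python
| standard library has no inotify binding), and uses hypothetical
| names (`process_new_scans`, `watch`, the inbox/outbox paths).

```python
import subprocess
import time
from pathlib import Path


def ocr_command(src: Path, dst: Path) -> list[str]:
    """Build the basic OCRmyPDF invocation: ocrmypdf INPUT OUTPUT."""
    return ["ocrmypdf", str(src), str(dst)]


def process_new_scans(inbox: Path, outbox: Path, run=subprocess.run) -> list[Path]:
    """OCR every PDF found in `inbox` into `outbox`, then remove the original.

    `run` is injectable so the logic can be exercised without OCRmyPDF.
    Assumption: files in the inbox are already fully written; a real
    script would wait for the scanner to close the file first.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    finished = []
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        run(ocr_command(src, dst), check=True)  # raises if OCR fails
        src.unlink()  # only reached when OCR succeeded
        finished.append(dst)
    return finished


def watch(inbox: Path, outbox: Path, interval: float = 5.0) -> None:
    """Poll the inbox forever; a simple stand-in for an inotify watcher."""
    while True:
        process_new_scans(inbox, outbox)
        time.sleep(interval)
```

| Calling `watch(Path("/srv/scans/inbox"), Path("/srv/scans/ocr"))`
| (both paths hypothetical) would keep the loop running. Because the
| original is deleted only after a successful OCR run, a failed scan
| stays in the inbox for a retry.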
| manuc66 wrote:
| People using HP printers with feature "Scan to Computer" are
| also using https://github.com/manuc66/node-hp-scan-to to send
| document to Paperless-ngx :
| https://www.reddit.com/r/selfhosted/comments/tethlr/hp_scan_...
| doubled112 wrote:
| I wanted to read the article, but it was incredibly twitchy on
| my iPhone.
|
| I scan into a Samba share that paperless-ngx picks up
| automatically, OCRs, tags, and deletes.
|
| A web application is pretty cross platform too, at this point.
|
| Plus I can get to them on my phones with less trouble than a
| share.
| noodlesUK wrote:
| Yeah, I was looking at the docs for this and it looks like a
| somewhat more featureful version of what I've stuck together.
|
| How does it handle when you have digital documents you want
| to store (a la google drive or similar)?
| djhworld wrote:
| I've done something similar although I had to jump through a
| few hoops to get it to work.
|
| I have a Fujitsu ScanSnap which is one of those feed-through
| scanners. I have it hooked up to a Raspberry Pi which listens
| for the button press on the scanner. You press the button, the
| paper feeds through the scanner and once it has finished the
| scan a script runs to collate everything into a PDF and drops
| the result onto a Samba share that's running on the box where
| paperless-ngx is.
|
| It's pretty neat and feels seamless. The worst part was dealing
| with SANE and finding linux drivers for my scanner.
| godsfshrmn wrote:
| Do you have any other info on how to do this? I've looked for
| this but cannot find how to do it.
| alchemist1e9 wrote:
| I don't understand the Pi and button part. I also have a
| Fujitsu ScanSnap and just configure it to save to a Samba
| share.
|
| What does listen for button press mean? and how?
| djhworld wrote:
| I'm not sure how I would do that on my model (ScanSnap
| S1300i), it connects over USB and has no
| touchscreen/control interface or network port, or wifi
| capability, you have to connect it to a computer via USB.
|
| This works fine on say, a Mac, with the official Fujitsu
| ScanSnap software, and I'm guessing _that_ supports saving
| to a samba share, but I wanted a solution that's
|
| 1. completely headless, i.e. no desktop machine required
| and experience needs to be friction free as the headless
| part means the only way to interact with the scanning
| function is to press 1 button
|
| 2. linux compatible, as I wanted to connect it to a Pi. I
| had to dig for the drivers, Fujitsu didn't have the right
| ones for my model on their website!
|
| I couldn't find any official software from Fujitsu, but I
| found the drivers eventually, so ended up coming up with
| connecting the scanner to the Pi over USB and glueing the
| bits together to drop the PDFs onto the samba share
|
| The button is located on the scanner, and I run "scanbd"
| [1] to listen for the button press, this is what
| coordinates the scan function (feeding the paper through)
| and then post-scan -> running a script to collate + create
| PDFs
|
| [1] https://wiki.archlinux.org/title/Scanner_Button_Daemon
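|
| A rough sketch of what the post-scan "collate + create PDFs" step
| could look like, not the poster's actual script. It assumes the
| scan step left one image per page (e.g. page1.tiff, page2.tiff) in
| a directory and that the `img2pdf` command-line tool is installed;
| the function names and file naming scheme are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def page_number(path: Path) -> int:
    """Extract the first number in the file name so page10 sorts after page2."""
    match = re.search(r"(\d+)", path.stem)
    return int(match.group(1)) if match else 0


def collate_command(scan_dir: Path, output_pdf: Path) -> list[str]:
    """Build an img2pdf invocation that merges all page images in page order."""
    pages = sorted(scan_dir.glob("*.tiff"), key=page_number)
    if not pages:
        raise ValueError(f"no scanned pages found in {scan_dir}")
    return ["img2pdf", *(str(p) for p in pages), "-o", str(output_pdf)]


def collate(scan_dir: Path, output_pdf: Path) -> None:
    """Run the merge; the output path could point at the Samba share."""
    subprocess.run(collate_command(scan_dir, output_pdf), check=True)
```

| Sorting by the numeric part of the name avoids the usual trap where
| a plain alphabetical sort puts page10 before page2.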
| [deleted]
| benbarbersmith wrote:
| If you have any notes on this, I've been wanting to set this
| up for ages and I'd be incredibly grateful!
| djhworld wrote:
| My solution was pretty much the same as what this guy did,
| although he had a slightly different model of scanner to
| me, but it's a very similar setup
|
| https://chrisschuld.com/2020/01/network-scanner-with-
| scansna...
| mirashii wrote:
| Paperless-ngx supports a folder on disk that you can drop
| files into and have them ingested. Throw in a samba
| container pointed at the same directory in your
| docker-compose and you've replicated the same setup.
| Osmose wrote:
| I've got this setup with a Brother ADS-1700W scanner, which
| can write directly to a network share over wifi. Paperless-
| ngx is running on my NAS which hosts the share as well.
| diarrhea wrote:
| I self host a couple things, but if I had to choose only one,
| it'd be this. So far the project strikes a great balance of
| stability (zero issues over two years now) and new features
| (ownership concept already available, allowing for multiple
| accounts in a pretty intuitive way).
|
| I've killed my instance twice now and had to restore from backup,
| which is also surprisingly pleasant to do. Their document
| exporter makes that possible. Having everything in a single JSON
| and otherwise just the raw PDFs makes a ton of sense and has me
| confident my documents are "just there" and moving to a different
| system would be feasible.
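|
| To illustrate why a "single JSON plus raw PDFs" export is easy to
| move elsewhere, here is a small sketch that lists the documents in
| an export directory. The layout it reads is an assumption and
| should be checked against a real export: a `manifest.json` holding
| a list of records with "model" and "fields" keys, where document
| records carry an "__exported_file_name__" entry.

```python
import json
from pathlib import Path


def list_exported_documents(export_dir: Path) -> list[tuple[str, str]]:
    """Return (title, exported file name) for each document record.

    Assumed manifest layout (verify against your own export): a JSON
    list of records, each with a "model" name and a "fields" dict.
    """
    manifest_path = export_dir / "manifest.json"
    records = json.loads(manifest_path.read_text(encoding="utf-8"))
    documents = []
    for record in records:
        if record.get("model") != "documents.document":
            continue  # skip tags, correspondents, users, etc.
        title = record.get("fields", {}).get("title", "")
        file_name = record.get("__exported_file_name__", "")
        documents.append((title, file_name))
    return documents
```

| Nothing here depends on the application itself: any tool that can
| read JSON and copy files could consume such an export.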
| Ylpertnodi wrote:
| >the project strikes a great balance of stability (zero issues
| over two years now)....
|
| >I've killed my instance twice now and had to restore from
| backup, which is also surprisingly pleasant to do
|
| Stable, but murderable?
| diarrhea wrote:
| Yep, it's not undying, but the murder happened at no fault of
| theirs. I'm taking credit for that one.
| AmazingTurtle wrote:
| I'm working on my own SaaS document management system that is
| easy-to-use, affordable and fully automated. Basically a black
| hole, throw a scan in or wait for emails to come in, it will
| name, tag and categorize it. It will also attempt to retrieve
| most important data such as invoice amount, customer numbers, so
| that you can easily distinguish and find the documents you're
| looking for. It comes with a chat feature so that you can ask
| things such as "what was my liability insurance number?" and
| it'll answer from the knowledge of your documents. I find this
| pretty useful, recently I was at an airport and forgot my flight
| number. I just asked what was my flight number and it retrieved
| that information from my recent documents easily. Integration
| with third party APIs and agnostic backend configuration for LLM
| and OCR is in progress. It works with Google Cloud Vision OCR and
| OpenAI at the moment.
| locustmostest wrote:
| We may want to get in touch with each other. We have an Open
| Core document management platform that runs in AWS; I'm not
| sure about your roadmap, but there may be something there
| that's of use: https://github.com/formkiq/formkiq-core
| AmazingTurtle wrote:
| Cool, I mean - that's a LOT of AWS services right there.
|
| But yeah, let's connect. Take a look at my project as well!
| https://turtledev.net/projects/refind-ai
| diarrhea wrote:
| Where can I sign up to track progress? This sounds like exactly
| the future I envisioned. I take great care manicuring my
| paperless instance such that when the day arrives, the LLM
| integration can work its magic best.
|
| That said, open source is absolutely table stakes in this, to
| me. From the documents I have in the system one could trivially
| impersonate me. Perhaps even as good as clone me. So sending
| all that off to random internet corporations, no can do.
| hiAndrewQuinn wrote:
| That's unfortunately why I think Microsoft and Google are
| going to be the first ones to actually achieve this future.
| They're the only organizations well known enough that
| enterprise might trust them with this kind of thing.
| Jedd wrote:
| https://news.ycombinator.com/item?id=37702095
| AmazingTurtle wrote:
| I keep this site updated when something changes.
|
| https://turtledev.net/projects/refind-ai
| gsich wrote:
| My main gripe is that you can't use an existing folder structure.
| denysvitali wrote:
| I've created "ODI" (Overengineered Documents Indexer) and
| presented it recently.
|
| https://clis-everywhere.k8s.best/16
|
| My approach is scanning the documents with airscan1, indexing
| them with a custom OCR Server (using the MLKit by Google on an
| Android phone which does completely offline OCR scanning) and
| indexing everything in OpenSearch. I've then created a backend +
| frontend to see the documents and do full text search with that.
|
| Everything is (going to be) open source with a permissive
| license.
| [deleted]
| nvahalik wrote:
| I love seeing more Angular projects in the wild like this.
|
| Angular is an under-appreciated, solid, no-gimmicks framework.
| Been using it for years rather than React and it seems the
| pendulum is swinging back toward "this side" now.
| frde wrote:
| Looking through the setup, this seems like an insane way to
| package an application for users to install:
| https://docs.paperless-ngx.com/setup
|
| The documentation itself is so full of implementation details
| that, as someone who is interested in the concept of this, I'm
| scared off even trying to setup and use this
|
| The project would be much more approachable if there was a simple
| native installer. My parents could also benefit from this but
| there's no way they would ever even understand how to install
| this, much less troubleshoot docker things.
| switch007 wrote:
| It doesn't look like the project goals include being
| installable by your parents
|
| It looks to sit in the self hosted space that has an admin
| manage all the sysadmin tasks. They've provided docker which is
| a pretty good step.
|
| There are desktop apps aimed at the single user/less
| experienced user, which might be more suitable
| starkparker wrote:
| You might want Recoll[1]. Similar if less powerful
| capabilities, cross-platform, open source, has Windows and
| macOS installers.
|
| Still an overly complex FOSS user interface for a tech-unsavvy
| target with lots of digging around to configure it (OCR setup,
| for instance[2]), but at least you don't need to know what
| Docker is to install it.
|
| 1: https://www.lesbonscomptes.com/recoll/
|
| 2:
| https://www.lesbonscomptes.com/recoll/usermanual/webhelp/doc...
| ndsipa_pomu wrote:
| Self-hosting services usually entails more technical knowledge
| than just installing an app and I don't think a document
| management system would necessarily work well as a native
| application. For starters, there's the backup issue and you
| wouldn't want non-technical people to store important documents
| that only live on a local drive. Remote web access is also a
| very useful feature for when travelling and that wouldn't be
| easy to setup for a local install.
|
| I've been using it for over a year and am very happy with it,
| though I intend on moving it from my home Pi docker swarm onto
| a free Oracle cloud instance to improve the performance and
| uptime (I've got my Pis auto updating and rebooting, so
| services get shunted around fairly often).
| tmerse wrote:
| _The project would be much more approachable if there was a
| simple native installer_
|
| Actually the very first example on
| https://docs.paperless-ngx.com/setup lists an interactive installer
| which asks the user some questions and eventually arrives at a
| working docker-compose setup.
|
|     $ bash -c "$(curl -L https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"
|
| If you ask me, this is already pretty user friendly. Although I
| agree that if your needs are more involved, there is some
| reading you'll have to do.
|
| I am currently in the process of migrating from mayan-edms to
| paperless-ngx and it feels pretty approachable to me if you
| know your way around docker (compose).
| preya2k wrote:
| It is designed to be a server application, so it'd be very
| difficult to offer a desktop-like app experience, that's easier
| to install.
| bettercallsalad wrote:
| Is it using local storage or cloud?
| ndsipa_pomu wrote:
| Yes.
|
| It's a self-hosted application, so it depends on your setup. I
| suppose it's arguably using local storage on the server you run
| it on which is often going to be a cloud hosted machine.
| beestripes wrote:
| Does it have annotation capabilities? Quickly adding a checkmark
| or signature would make managing documents much easier.
| ndsipa_pomu wrote:
| It looks like it does, though I've never wanted to use them. I
| just had a quick look at my instance and you can add text notes
| alongside the document and also there's some basic editing
| draw/text tools to add to the document itself.
___________________________________________________________________
(page generated 2023-10-07 23:00 UTC)