Post AzJXGDxmp0YGEcLqTo by Canageek@wandering.shop
Post #AzJX2jidxRhafYpwoK by foone@digipres.club
2025-10-17T22:30:09Z
0 likes, 0 repeats
any suggestions for software to organize a pile of PDFs? like mostly manuals and datasheets. something that can automatically OCR them and have some kind of search index. preferably something we can spin up a local server of, for sharing across our network.
Post #AzJXCmF5QnEdOIndjc by kmccoy@spacey.space
2025-10-17T22:31:16Z
0 likes, 0 repeats
@foone paperless-ngx
Post #AzJXFJFtf0AaMHO0IK by shironeko@fedi.tesaguri.club
2025-10-17T22:32:58.819182Z
0 likes, 0 repeats
@foone paperless-ngx
Post #AzJXGDxmp0YGEcLqTo by Canageek@wandering.shop
2025-10-17T22:32:05Z
0 likes, 0 repeats
@foone It can't do the OCR, but Zotero can do most of the rest of that, I believe. It's got very good search and some kind of network feature, but I don't run my own server, so I never investigated what it can do. Its entire purpose is organizing scientific papers, but those are just a type of PDF. I'm pretty sure it can't do OCR, though, so you'd have to do that another way ahead of time, sorry.
Post #AzJY2OZGhIlL1YqwSm by joriki@freeradical.zone
2025-10-17T22:41:14Z
0 likes, 0 repeats
@foone look at tools for data journalists, who have to pore over data dumps from large-scale leaks: https://journaliststoolbox.ai/scraping-tools/ Other categories there may be of interest if you don't mind AI.
Post #AzJjmTYhZZIlJslODo by viraptor@cyberplace.social
2025-10-18T00:52:50Z
0 likes, 0 repeats
@foone It's not the original purpose of this software, but paperless should do the trick: https://docs.paperless-ngx.com/ It's normally used for correspondence, but it has built-in OCR, tagging, categories and search (including full content). I've got longer contracts and manuals for home devices in it as well.
Post #AzPFN1pZhC0yPJMgKW by joriki@freeradical.zone
2025-10-20T16:40:22Z
0 likes, 0 repeats
@foone edit: fixed URL; this one is an SPJ site, while the other one might be a lookalike site to promote AI (not sure; can't be bothered to do more than correct the URL). Look at tools for data journalists, who have to pore over data dumps from large-scale leaks: https://www.journaliststoolbox.org/2023/05/21/find-scrape-and-clean-data/