[HN Gopher] Linear Book Scanner - Open-source automatic book sca...
___________________________________________________________________
Linear Book Scanner - Open-source automatic book scanner (2014)
Author : gorenb
Score : 288 points
Date : 2023-09-17 13:03 UTC (9 hours ago)
(HTM) web link (linearbookscanner.org)
(TXT) w3m dump (linearbookscanner.org)
| hinnisdael wrote:
| There's a somewhat similar commercial scanner [1] [2], with a
| V-design as well but inverted to scan from the top. Much gentler
| on the books as it's the scanner that moves, not the books
| themselves. Super happy to see someone develop an open-source
| alternative!
|
| [1] https://www.treventus.com
| [2] https://youtu.be/SdipuAuWsEs?si=dFWRtva5gO2oM91o
| corndoge wrote:
| Which is discussed in the Google talk associated with the OP
| project here:
|
| https://youtu.be/4JuoOaL11bw?si=do1qet5Kq_WErQgz&t=162
| webnrrd2k wrote:
| Um, I'm not sure this is a great alternative, as this scanner
| chops out a page for every scan. After that, I don't think it
| really matters how gentle it is...
|
| Unless you're concerned with binding the pages to form a new
| book. I think that would be possible with the leftovers.
| codetrotter wrote:
| I think you are mistaken. Neither of the machines appears to
| chop pages out.
| webnrrd2k wrote:
| You're right! My mistake. It looked like there was a sort
| of deli-slicer blade and suction to remove pages, but,
| looking again, it's not what's happening. That's what I get
| for posting pre morning coffee...
| codetrotter wrote:
| It's probably expensive even to lease/rent, or to use their
| digitization service. Why? Because they have no pricing info
| available for those two options either. Just an inquiry form to
| fill out.
| hinnisdael wrote:
| Right, no public pricing usually means rates far above what
| personal or low-budget projects can afford.
|
| Not sure how much of the design is protected and how much
| inspiration one can take for a non-commercial DIY project
| like the one presented by OP.
| pontifier wrote:
| I remember contacting them about 10 years ago... Can
| confirm it was way out of my price range.
| fuzzythinker wrote:
| A DIY of the same idea, as linked by others:
| https://www.diybookscanner.org
| Nzen wrote:
| tl;dr a diy system for scanning books. Basically, build a
| triangular prism with a special zig zag slit for a single page to
| snake through. Put a vacuum on each side to pull the paper
| through the slit. Put two optical scanners in the middle of the
| slit, to scan both sides of the sheet as it hangs down. Attach a
| motor to move a sled that pushes just the top, then bottom, of the
| book. The site features six designs. Some include videos of
| operation [0].
|
| [0] https://www.youtube.com/watch?v=84byulcC6i4 30 seconds long
|
| In the fullness of time, maybe I would make one of these, given
| that I live in an apartment and not a house with space to
| construct/store this scanner. I check etsy every couple of years
| and haven't seen someone offer a kit. I use 1dollarscan, though
| they've had to restrict their offering as Pearson et al. notice
| their existence.
| tohnjitor wrote:
| Fantastic concept! I agree with the concerns about damaged pages.
| Perhaps this is something that could be easily improved.
| billy_bitchtits wrote:
| just saw the binding off and push it through a scansnap or the
| like
| rychco wrote:
| I love the idea, although the risk of torn pages is mildly
| concerning for archival purposes or valuable books. Though if
| that were the case, I'm sure scanning by hand would be preferred
| anyway. I've often wanted a device like this for the purpose of
| digitizing my excessively large collection of books.
|
| Regarding frequency of torn pages in the FAQ:
|
| > Prototype 1 could scan the majority of books without damage,
| but may tear one or two pages in some books. Out of 50 books
| tested, 45% had one or two of their pages either torn or folded.
| This is a very early prototype and there are many areas for
| improvement in the design.
|
| In my opinion, this is mostly acceptable. Especially if a future
| revision reduces the 45% to somewhere around the ~10-20% range.
| If I had the space for a device like this, I would definitely
| consider building one.
| ChristianGeek wrote:
| If you haven't already seen this site, it's well worth a visit:
|
| https://www.diybookscanner.org/en/intro.html
| mcshicks wrote:
| I made the cardboard scanner many years ago both to scan
| whole books as well as sections of books from the library. It
| worked pretty well but nowadays in practice I usually look
| into the open library at the Internet archive first. Hope it
| sticks around.
| rychco wrote:
| Great link, thanks!
| gcr wrote:
| For a while, the Internet Archive built a book scanner that
| rests the book in a V-shaped cradle. A volunteer turns pages by
| hand and lowers/raises a pair of glass panes that gently press
| the pages for imaging by a pair of DSLR cameras on an angled
| mount. The whole assembly isn't automatic, but can be easily
| operated by hand.
|
| https://blog.archive.org/2021/02/09/meet-eliza-zhang-book-sc...
| pimlottc wrote:
| You said "for a while"; are they not using these machines
| anymore?
| GuestHNUser wrote:
| I imagine lawsuits like this[0] caused them to slow down
| unfortunately.
|
| [0] https://en.m.wikipedia.org/wiki/Authors_Guild,_Inc._v._Googl....
| userbinator wrote:
| I'm not sure if one of those machines was the cause, but I've
| seen far too many old books on archive.org which have pages
| that appear to have been torn by the scanner; thus I doubt
| they're manually doing it.
| asdefghyk wrote:
| Interestingly, the Internet Archive copied the open-source
| design of their scanner from https://diybookscanner.org/ (as
| they are allowed to by its open-source licence). The Internet
| Archive then effectively refused to release any details back to
| the community. After a lot of "pushing", the Internet Archive
| did acknowledge that their design was based on the one from
| diybookscanner.org. In my opinion, this would have been very
| disappointing for the designer of the scanner, who released
| the info as open source. At one time the Internet Archive sold
| this scanner to organizations for $10K; I think the price has
| now dropped to a few thousand.
| zoklet-enjoyer wrote:
| This looks a lot safer
|
| https://www.inforum.com/newsmd/ndsu-students-book-scanner-in...
|
| http://diybookscanner.org
| FinnKuhn wrote:
| While they are certainly safer, they aren't automatic and
| require someone to turn the pages, which this project attempts
| to solve, although it certainly should be done in a gentler
| manner so as not to damage the books.
| Syzygies wrote:
| This is a problem domain where software hasn't caught up with
| what is possible, so people do in hardware what could be done in
| software.
|
| With two or more photos or a stereo image (new iPhone?) one could
| triangulate to infer a flattened page, and produce images that
| look like they came from cut pages in a flatbed scanner. Now just
| pay someone well in Ethiopia to carefully turn pages without
| damage.
|
| As any researcher can attest, our digital libraries now hold a
| century of scanned work of questionable quality. AI could infer
| scans indistinguishable from an outline font format original on
| an 8K monitor.
|
| I once helped consult on the 1980s font wars, turning old
| formats and digital scans into Postscript and TrueType fonts.
| This was hard then, but will soon be understood as the "correct"
| way to scan text, when software catches up.
|
| For the scientific literature, we need a ChatGPT equivalent to
| reconstruct LaTeX source that can reproduce each page. (We really
| need a successor to LaTeX that isn't such an arcane language, and
| can author fixed and flowable text with equal ease.)
| aragonite wrote:
| > As any researcher can attest, our digital libraries now hold
| a century of scanned work of questionable quality
|
| I keep thinking so much collaborative potential is not being
| utilized. Imagine each (unique) Google Book is basically an
| editable wiki where people can directly correct OCR errors as
| they come across them (with an associated Talk page where they
| can give explanations, etc)
| ajuc wrote:
| Software was there in 2017 already. I've worked on a device for
| blind people that did basically this (except there's no need
| for a stereo image).
|
| Here you can see how flattening works (this was handled by the
| library, we didn't need to do any custom code):
|
| https://youtu.be/DPu0iJtK2sI?t=1542
|
| There's also a feature where it tells you to turn the page,
| detects that it has been turned, takes a photo, etc. And in the
| background it flattens, splits into pages and OCRs the photos.
| With a little practice you can scan and OCR a whole book at 1-5
| seconds per page.
|
| https://youtu.be/DPu0iJtK2sI?t=1909
|
| Then it saves the OCRed book and it can read it to you whenever
| you like.
| ancientworldnow wrote:
| Software absolutely already has page flattening capability.
| Labor is not as trivial as you make it sound.
| fmajid wrote:
| The Fujitsu ScanSnap SV600 has that, as do some Czur book
| scanners. Reliably turning pages without damaging the book is
| the tricky bit.
| ajuc wrote:
| Most OCR libraries have that feature AFAIK.
| nmca wrote:
| This comment is silly because of course you need to turn the
| pages and at volume paying people to do so is prohibitive.
|
| On the software side though, progress marches on:
| https://facebookresearch.github.io/nougat/ is downloadable and
| great.
| Hamcha wrote:
| How is setting up infrastructure to exploit third world labor a
| "software problem" exactly?
|
| I think the problem isn't that software can't do de-warping
| well, it's that by the time you set up everything for book
| scanning you might as well use a setup that doesn't need it.
| 13415 wrote:
| You've also got to wonder about the idea behind it: that it's
| supposed to be trivial to send a large number of books to
| Ethiopia and back without damaging them.
| dredmorbius wrote:
| Shipping _is_ cheap, generally.
|
| Even for relatively high-mass objects such as books. Slow
| boats are slow but exceedingly efficient.
|
| The main risk would likely be container loss off a ship.
| Possibly environmental damage if spending much time in warm
| humid climates.
| Syzygies wrote:
| The source article sends these devices to scan books in
| Ethiopia, where there are people looking for work.
| magic_hamster wrote:
| This is very much a physical problem and there aren't too many
| shortcuts you can take.
|
| I've been part of a preservation project and scanned a LOT of
| magazines. Generally, it doesn't matter much if you place it on
| a flat bed or not. Flipping the page manually takes a while
| either way.
|
| There's already a known way for scanning magazines and books
| very fast: cut the spine and feed the pages to an automatic
| scanner. This is of course not applicable to anything you'd
| like to keep around after scanning, because your copy is
| destroyed.
|
| All in all, the best way to automate scanning without
| destroying the item will have to combine a top level camera
| with a machine to turn pages. I believe this is what was going
| on in Google's massive scanning project.
|
| Maybe using x-ray could work for "scanning" some books without
| having to turn the pages. But I suppose there'll be a new set
| of problems to solve there.
| asdefghyk wrote:
| RE "There's already a known way for scanning magazines and
| books very fast: cut the spine and feed the pages to an
| automatic scanner. This is of course not applicable to
| anything you'd like to keep around after scanning, because
| your copy is destroyed." I've always thought the pages could
| be rebound. Not perfect, but a halfway solution.
| distract8901 wrote:
| I recently saw some work from the University of Kentucky on
| reading the Herculaneum scrolls. These scrolls were
| carbonized by volcanic activity (Pompeii?) and obviously
| can't be unrolled without disintegrating. They used some
| interesting CT (xray) scanning plus machine learning to
| distinguish the carbon-based ink from the mostly carbon
| substrate and retrieve legible text.
|
| Of course, that only gets you the printed text. You might
| lose notes and doodles in the margins, or other physical
| evidence. But, it's certainly promising for works that are
| too delicate to physically open and inspect
| acqq wrote:
| Printed text on _scrolls_?
| amelius wrote:
| > With two or more photos or a stereo image (new iPhone?) one
| could triangulate to infer a flattened page
|
| Especially if you project a grid over the pages using e.g. a
| scanning laser.
| tgw43279w wrote:
| Regarding your point about a successor to LaTeX:
| https://typst.app/ is turning out to be great.
| hinnisdael wrote:
| Aren't you losing information in the parts that aren't
| perfectly straight? Yes, you can stretch those to recreate the
| original layout, but that would come at the cost of resolution
| in the interpolated sections of the page. Granted, not a
| problem for most books, but probably a reason people are still
| looking for mechanical solutions to the problem.
| erikpukinskis wrote:
| I think the suggestion is that with AI you can interpolate to
| the actual letterforms, not to pixels.
|
| Within a typical volume the letter "e" will appear hundreds
| of times and be identical, so there should be lots of data to
| help resolve ambiguities in the poorer parts of images.
|
| Not to mention data that can be used across volumes.
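|
| A minimal sketch of that idea, not anything posted in the thread:
| if the same glyph shows up many times, a per-pixel majority vote
| across the noisy copies can recover a cleaner shape. The 5x5
| bitmap, the 20% pixel-flip rate and the function names are all
| made up for illustration.

```python
import random

# Hypothetical 5x5 bitmap standing in for one letterform (1 = ink).
GLYPH = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]


def noisy_copy(glyph, flip_prob, rng):
    """Return a copy of the glyph with each pixel flipped with probability flip_prob."""
    return [[px ^ (rng.random() < flip_prob) for px in row] for row in glyph]


def majority_vote(copies):
    """Combine equally sized bitmaps by taking the most common value per pixel."""
    n = len(copies)
    rows, cols = len(copies[0]), len(copies[0][0])
    return [
        [1 if 2 * sum(c[r][k] for c in copies) > n else 0 for k in range(cols)]
        for r in range(rows)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
copies = [noisy_copy(GLYPH, 0.2, rng) for _ in range(201)]
recovered = majority_vote(copies)
```

| Real scans would first need the glyph instances located and
| aligned, which this sketch skips entirely.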
| xyzzy_plugh wrote:
| If the goal is to ultimately OCR then it's moot. But yes, of
| course information is lost.
|
| That being said, modern phone cameras are going to produce
| "scans" above 300 DPI, and while 600 DPI or higher might be
| tricky, it's still possible if you take partial shots of a
| document, assuming you can focus that close.
|
| What you lose in quality you make up with convenience, I
| suppose.
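|
| A rough back-of-the-envelope check of those numbers, as a sketch.
| The 12 MP sensor (4032 x 3024 px) and the US Letter page size are
| assumptions, not figures from the thread, and lens quality and
| focus are ignored.

```python
import math


def effective_dpi(px_long, px_short, page_long_in, page_short_in):
    """DPI when the whole page just fills the frame (limited by the tighter axis)."""
    return min(px_long / page_long_in, px_short / page_short_in)


def shots_needed(px_long, px_short, page_long_in, page_short_in, target_dpi):
    """Number of partial shots to tile the page at target_dpi, ignoring overlap."""
    tiles_long = math.ceil(page_long_in * target_dpi / px_long)
    tiles_short = math.ceil(page_short_in * target_dpi / px_short)
    return tiles_long * tiles_short


# Assumed numbers: a 12 MP sensor (4032 x 3024 px) and a US Letter page.
dpi = effective_dpi(4032, 3024, 11.0, 8.5)
shots = shots_needed(4032, 3024, 11.0, 8.5, 600)
print(round(dpi), shots)  # prints: 356 4
```

| Under those assumptions a single shot lands a bit above 300 DPI,
| and 600 DPI takes about four partial shots before allowing for
| any overlap between them.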
| userbinator wrote:
| No. Absolutely NO to AI. The last thing we need is for text to
| be changed in subtle ways by the digitisation process.
|
| https://news.ycombinator.com/item?id=29223815
|
| That risk of "plausible but incorrect" has been a concern even
| before AI.
| [deleted]
| chaxor wrote:
| This looks absolutely fantastic. Not only can you digitize your
| books, but it also shreds them for you for free!
|
| Pretty sweet 2 for 1 deal.
| the_arun wrote:
| Just curious - once we scan, we have all the contents in
| digitized format. So why is unbinding a book into pages before
| scanning not a scalable model? Is this to avoid the additional
| work of unbinding?
| qingcharles wrote:
| Unless the book is super valuable then it is often easier to
| just use a guillotine to slice off the binding completely and
| feed the pages through a sheet-feed scanner.
| distract8901 wrote:
| Archivists typically don't like destroying a work when
| preserving it. Scanning a book also only gives you an optical
| image. In some cases, like medieval manuscripts, the pages may
| have been erased and written over. If we simply scan the book
| and destroy the physical copy, we've lost that evidence.
|
| But again, for cheap mass-market works, most archivists
| probably won't care about destroying one copy out of a million
| to preserve the work. It's really only a problem for very old
| and very rare works
| quijoteuniv wrote:
| This is cool, easily scan my books at home that I will use to
| train my own LLM! Super me!
| ramraj07 wrote:
| This seems like it'll shred any book that's even slightly
| damaged..
| mackwell wrote:
| It definitely would. And probably some that are not damaged
| when something gets jammed or the environmental conditions
| change and react with the paper. But if you've got a mountain
| of books to scan and you can do the bulk of them with a machine
| like this and then use a more careful approach for the special
| cases, that's a win.
| phoronixrly wrote:
| I also cringed at the look of this. I have a bunch of old books
| I would like to digitize, and most bookscanners I come across
| require you to at least spread open the book completely, which
| would, as you say, shred them... And they don't even involve
| vacuum, movement across sharp edges, and stepper motors...
|
| Oooh.. Second question in https://linearbookscanner.org/faq/...
|
| > Out of 50 books tested, 45% had one or two of their pages
| either torn or folded
|
| I would not use this even on my less valuable books.
| stavros wrote:
| Also, the phrase "out of 50 books tested, 45% had" sounds
| like you want someone to mistake that for 90%.
| [deleted]
| sandreas wrote:
| Hehe nice. There is a whole community about this topic at:
| https://diybookscanner.org/
|
| Years ago I once wrote a little tool in Java called bookbuilder,
| where you could turn the pages manually, make a photo and then
| run an automatic process on all images to build a searchable pdf.
|
| I used https://boofcv.org/, an impressive Computer Vision library
| in pure Java; it still exists and it is pretty fast, too.
|
| It was able to detect the page contour, deskew it, flatten the
| image and remove finger contours by matching the skin tone, then
| build a PDF with integrated invisible OCR Layer without any user
| interaction. I remember that I was working on line slope
| detection with some kind of watershed algorithm to improve the
| flattening part.
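|
| For illustration only, a minimal Python sketch of estimating a
| text line's slope. It uses a plain least-squares fit over
| baseline points, which is a swapped-in technique: it is not the
| watershed approach mentioned above and it does not use BoofCV.
| The sample points and the function name are hypothetical.

```python
import math


def skew_angle_degrees(points):
    """Least-squares slope of (x, y) baseline points, returned as an angle in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # atan2 equals atan(sxy / sxx) here and avoids dividing by zero.
    return math.degrees(math.atan2(sxy, sxx))


# Hypothetical baseline samples: y rises 1 px for every 20 px of x.
baseline = [(x, 100 + x / 20) for x in range(0, 400, 40)]
angle = skew_angle_degrees(baseline)
```

| Rotating the image by the negative of that angle would level the
| line; a curved page needs a per-region estimate rather than one
| global angle.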
|
| Fun project, I wonder if I have the source code lying around
| somewhere... even the download page is gone today. This was long
| before I went open source with all of my little side projects,
| because I never thought it could be interesting for someone else
| :-)
| metadat wrote:
| BoofCV is incredibly cool, thanks for sharing! It's unfortunate
| I didn't know about it before today, I would've definitely
| invested time to learn and hopefully contribute back (there's
| still a chance, but at this point it seems to cover everything
| I've ever heard of, and more).
|
| I just spent 30 minutes clicking through and inspecting every
| example.
| sandreas wrote:
| Check the android test app. Very cool stuff to see there.
| pontifier wrote:
| I've been interested in book scanning for a while. I think I
| remember seeing your software! I built one of the early
| versions that I saw on the site, and then happened to meet
| Jonathon Duerig and do some waterjet cutting for him a few
| years ago.
|
| When I saw the linear bookscanner the first time, I realized it
| was upside down. If it were suspended, and able to move itself,
| then all the issues about not knowing the mass of the book, and
| dealing with friction could be avoided. Counterweights could
| keep the force constant, and it would be moving a known mass
| when scanning any book.
| WalterBright wrote:
| Please make it available on github!
| sandreas wrote:
| Well, if I find the time I'll try to find the sourcecode, but
| I'm not sure in which state it is :-)
| asdefghyk wrote:
| RE "There is a whole community about this topic at:
| https://diybookscanner.org/": the actual book scanner forum is
| down at present. It has been down for a few weeks. We are
| working to get it back up.
___________________________________________________________________
(page generated 2023-09-17 23:00 UTC)