[HN Gopher] Linear Book Scanner - Open-source automatic book sca...
       ___________________________________________________________________
        
       Linear Book Scanner - Open-source automatic book scanner (2014)
        
       Author : gorenb
       Score  : 288 points
       Date   : 2023-09-17 13:03 UTC (9 hours ago)
        
 (HTM) web link (linearbookscanner.org)
 (TXT) w3m dump (linearbookscanner.org)
        
       | hinnisdael wrote:
       | There's a somewhat similar commercial scanner [1] [2], with a
       | V-design as well but inverted to scan from the top. Much gentler
       | on the books as it's the scanner that moves, not the books
       | themselves. Super happy to see someone develop an open-source
       | alternative!
       | 
       | [1] https://www.treventus.com
       | [2] https://youtu.be/SdipuAuWsEs?si=dFWRtva5gO2oM91o
        
         | corndoge wrote:
         | Which is discussed in the Google talk associated with the OP
         | project here:
         | 
         | https://youtu.be/4JuoOaL11bw?si=do1qet5Kq_WErQgz&t=162
        
         | webnrrd2k wrote:
         | Um, I'm not sure this is a great alternative, as this scanner
         | chops out a page for every scan. After that, I don't think it
         | really matters how gentle it is...
         | 
         | Unless you're concerned with binding the pages to form a new
         | book. I think that would be possible with the leftovers.
        
           | codetrotter wrote:
           | I think you are mistaken. Neither of the machines appears to
           | chop pages out.
        
             | webnrrd2k wrote:
             | You're right! My mistake. It looked like there was a sort
             | of deli-slicer blade and suction to remove pages, but,
             | looking again, it's not what's happening. That's what I get
             | for posting pre morning coffee...
        
         | codetrotter wrote:
         | It's probably expensive even to lease/rent, or to use their
         | digitization service. Why? Because they have no pricing info
         | available for those two options either. Just an inquiry form to
         | fill out.
        
           | hinnisdael wrote:
           | Right, no public pricing usually means rates far above what
           | personal or low-budget projects can afford.
           | 
           | Not sure how much of the design is protected and how much
           | inspiration one can take for a non-commercial DIY project
           | like the one presented by OP.
        
             | pontifier wrote:
             | I remember contacting them about 10 years ago... Can
             | confirm it was way out of my price range.
        
         | fuzzythinker wrote:
         | A DIY of the same idea, as linked by others:
         | https://www.diybookscanner.org
        
       | Nzen wrote:
       | tl;dr a diy system for scanning books. Basically, build a
       | triangular prism with a special zig zag slit for a single page to
       | snake through. Put a vacuum on each side to pull the paper
       | through the slit. Put two optical scanners in the middle of the
       | slit, to scan both sides of the sheet as it hangs down. Attach a
       | motor to move a sled that pushes just the top, then bottom, of the
       | book. The site features six designs. Some include videos of
       | operation [0].
       | 
       | [0] https://www.youtube.com/watch?v=84byulcC6i4 (30 seconds long)
       | 
       | In the fullness of time, maybe I would make one of these, given
       | that I live in an apartment and not a house with space to
       | construct/store this scanner. I check etsy every couple of years
       | and haven't seen someone offer a kit. I use 1dollarscan, though
       | they've had to restrict their offering as Pearson et al. notice
       | their existence.
        
       | tohnjitor wrote:
       | Fantastic concept! I agree with the concerns about damaged pages.
       | Perhaps this is something that could be easily improved.
        
       | billy_bitchtits wrote:
       | just saw the binding off and push it through a scansnap or the
       | like
        
       | rychco wrote:
       | I love the idea, although the risk of torn pages is mildly
       | concerning for archival purposes or valuable books. Though if
       | that were the case, I'm sure scanning by hand would be preferred
       | anyway. I've often wanted a device like this for the purpose of
       | digitizing my excessively large collection of books.
       | 
       | Regarding frequency of torn pages in the FAQ:
       | 
       | > Prototype 1 could scan the majority of books without damage,
       | but may tear one or two pages in some books. Out of 50 books
       | tested, 45% had one or two of their pages either torn or folded.
       | This is a very early prototype and there are many areas for
       | improvement in the design.
       | 
       | In my opinion, this is mostly acceptable. Especially if a future
       | revision reduces the 45% to somewhere around the ~10-20% range.
       | If I had the space for a device like this, I would definitely
       | consider building one.
        
         | ChristianGeek wrote:
         | If you haven't already seen this site, it's well worth a visit:
         | 
         | https://www.diybookscanner.org/en/intro.html
        
           | mcshicks wrote:
           | I made the cardboard scanner many years ago both to scan
           | whole books as well as sections of books from the library. It
           | worked pretty well but nowadays in practice I usually look
           | into the Open Library at the Internet Archive first. Hope it
           | sticks around.
        
           | rychco wrote:
           | Great link, thanks!
        
         | gcr wrote:
         | For a while, the Internet Archive built a book scanner that
         | rests the book in a V-shaped cradle. A volunteer turns pages by
         | hand and lowers/raises a pair of glass panes that gently press
         | the pages for imaging by a pair of DSLR cameras on an angled
         | mount. The whole assembly isn't automatic, but can be easily
         | operated by hand.
         | 
         | https://blog.archive.org/2021/02/09/meet-eliza-zhang-book-sc...
        
           | pimlottc wrote:
           | You said "for a while"; are they not using these machines
           | anymore?
        
             | GuestHNUser wrote:
             | I imagine lawsuits like this[0] caused them to slow down
             | unfortunately.
             | 
             | [0] https://en.m.wikipedia.org/wiki/Authors_Guild,_Inc._v._
             | Googl....
        
           | userbinator wrote:
           | I'm not sure if one of those machines was the cause, but I've
           | seen far too many old books on archive.org which have pages
           | that appear to have been torn by the scanner; thus I doubt
           | they're manually doing it.
        
           | asdefghyk wrote:
           | Interestingly, the Internet Archive copied the open source
           | design of their scanner from the site
           | https://diybookscanner.org/ (as they are allowed to by its
           | open source licence). The Internet Archive then effectively
           | refused to release any details back to the community. After a
           | lot of "pushing", the Internet Archive did acknowledge that
           | their design was based on one from the site
           | bookscanner.org. This would have been very disappointing, in
           | my opinion, for the designer of the scanner, who released the
           | info as open source. At one time the Internet Archive sold
           | this scanner to organizations for $10K. I think the price has
           | since dropped to a few thousand.
        
       | zoklet-enjoyer wrote:
       | This looks a lot safer
       | 
       | https://www.inforum.com/newsmd/ndsu-students-book-scanner-in...
       | 
       | http://diybookscanner.org
        
         | FinnKuhn wrote:
         | While they are certainly safer they aren't automatic and
         | require someone to turn the pages, which this project attempts
         | to solve, although it certainly should be done in a gentler
         | manner so as not to damage the books.
        
       | Syzygies wrote:
       | This is a problem domain where software hasn't caught up with
       | what is possible, so people do in hardware what could be done in
       | software.
       | 
       | With two or more photos or a stereo image (new iPhone?) one could
       | triangulate to infer a flattened page, and produce images that
       | look like they came from cut pages in a flatbed scanner. Now just
       | pay someone well in Ethiopia to carefully turn pages without
       | damage.
       | 
       | As any researcher can attest, our digital libraries now hold a
       | century of scanned work of questionable quality. AI could infer
       | scans indistinguishable from an outline font format original on
       | an 8K monitor.
       | 
       | I once helped consult on the 1980's font wars, turning old
       | formats and digital scans into Postscript and TrueType fonts.
       | This was hard then, but will soon be understood as the "correct"
       | way to scan text, when software catches up.
       | 
       | For the scientific literature, we need a ChatGPT equivalent to
       | reconstruct LaTeX source that can reproduce each page. (We really
       | need a successor to LaTeX that isn't such an arcane language, and
       | can author fixed and flowable text with equal ease.)
        
         | aragonite wrote:
         | > As any researcher can attest, our digital libraries now hold
         | a century of scanned work of questionable quality
         | 
         | I keep thinking so much collaborative potential is not being
         | utilized. Imagine each (unique) Google Book is basically an
         | editable wiki where people can directly correct OCR errors as
         | they come across them (with an associated Talk page where they
         | can give explanations, etc)
        
         | ajuc wrote:
         | Software was there in 2017 already. I've worked on a device for
         | blind people that did basically this (except there's no need
         | for a stereo image).
         | 
         | Here you can see how flattening works (this was handled by the
         | library, we didn't need to do any custom code):
         | 
         | https://youtu.be/DPu0iJtK2sI?t=1542
         | 
         | There's also a feature where it tells you to turn the page,
         | detects that it has been turned, takes a photo, etc. And in the
         | background it flattens, splits into pages and OCRs the photos.
         | With a little practice you can scan and OCR a whole book at 1-5
         | seconds per page.
         | 
         | https://youtu.be/DPu0iJtK2sI?t=1909
         | 
         | Then it saves the OCRed book and it can read it to you whenever
         | you like.
        
         | ancientworldnow wrote:
         | Software absolutely already has page flattening capability.
         | Labor is not as trivial as you make it sound.
        
           | fmajid wrote:
           | The Fujitsu ScanSnap SV600 has that, as do some Czur book
           | scanners. Reliably turning pages without damaging the book is
           | the tricky bit.
        
             | ajuc wrote:
             | Most OCR libraries have that feature AFAIK.
        
         | nmca wrote:
         | This comment is silly because of course you need to turn the
         | pages and at volume paying people to do so is prohibitive.
         | 
         | On the software side though, progress marches on:
         | https://facebookresearch.github.io/nougat/ is downloadable and
         | great.
        
         | Hamcha wrote:
         | How is setting up infrastructure to exploit third world labor a
         | "software problem" exactly?
         | 
         | I think the problem isn't that software can't do de-warping
         | well, it's that by the time you set up everything for book
         | scanning you might as well use a setup that doesn't need it.
        
           | 13415 wrote:
           | You've also got to wonder about the idea behind it, that it's
           | supposed to be trivial to send a large number of books to
           | Ethiopia and back without damaging them.
        
             | dredmorbius wrote:
             | Shipping _is_ cheap, generally.
             | 
             | Even for relatively high-mass objects such as books. Slow
             | boats are slow but exceedingly efficient.
             | 
             | The main risk would likely be container loss off a ship.
             | Possibly environmental damage if spending much time in warm
             | humid climates.
        
             | Syzygies wrote:
             | The source article sends these devices to scan books in
             | Ethiopia, where there are people looking for work.
        
         | magic_hamster wrote:
         | This is very much a physical problem and there aren't too many
         | shortcuts you can take.
         | 
         | I've been part of a preservation project and scanned a LOT of
         | magazines. Generally, it doesn't matter much if you place it on
         | a flat bed or not. Flipping the page manually takes a while
         | either way.
         | 
         | There's already a known way for scanning magazines and books
         | very fast: cut the spine and feed the pages to an automatic
         | scanner. This is of course not applicable to anything you'd
         | like to keep around after scanning, because your copy is
         | destroyed.
         | 
         | All in all, the best way to automate scanning without
         | destroying the item, will have to combine a top level camera
         | with a machine to turn pages. I believe this is what was going
         | on in Google's massive scanning project.
         | 
         | Maybe using x-ray could work for "scanning" some books without
         | having to turn the pages. But I suppose there'll be a new set
         | of problems to solve there.
        
           | asdefghyk wrote:
           | RE "....There's already a known way for scanning magazines
           | and books very fast: cut the spine and feed the pages to an
           | automatic scanner. This is of course not applicable to
           | anything you'd like to keep around after scanning, because
           | your copy is destroyed......" I've always thought the pages
           | could be rebound. Not perfect but a halfway solution.
        
           | distract8901 wrote:
           | I recently saw some work from the University of Kentucky on
           | reading the Herculaneum scrolls. These scrolls were
           | carbonized by volcanic activity (Pompeii?) and obviously
           | can't be unrolled without disintegrating. They used some
           | interesting CT (xray) scanning plus machine learning to
           | distinguish the carbon-based ink from the mostly carbon
           | substrate and retrieve legible text.
           | 
           | Of course, that only gets you the printed text. You might
           | lose notes and doodles in the margins, or other physical
           | evidence. But, it's certainly promising for works that are
           | too delicate to physically open and inspect
        
             | acqq wrote:
             | Printed text on _scrolls_?
        
         | amelius wrote:
         | > With two or more photos or a stereo image (new iPhone?) one
         | could triangulate to infer a flattened page
         | 
         | Especially if you project a grid over the pages using e.g. a
         | scanning laser.
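
The triangulate-then-flatten idea can be sketched in two steps: depth from stereo disparity (Z = f * B / d for a rectified camera pair), then unrolling the curved page by arc length. This is only an illustration under a simple model where the page curves along one axis. The function names, focal length, baseline, and sample values are hypothetical and not taken from any scanner or library mentioned in the thread.

```python
import math


def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Rectified stereo: depth Z = f * B / d (f and d in pixels, B and Z in mm)."""
    return focal_px * baseline_mm / disparity_px


def flattened_positions(xs_mm, depths_mm):
    """Map each sample across the page to its position on the unrolled page.

    The flat coordinate is the cumulative arc length along the (x, depth) profile.
    """
    flat = [0.0]
    for i in range(1, len(xs_mm)):
        dx = xs_mm[i] - xs_mm[i - 1]
        dz = depths_mm[i] - depths_mm[i - 1]
        flat.append(flat[-1] + math.hypot(dx, dz))
    return flat


# Hypothetical numbers: focal length 1000 px, baseline 100 mm.
disparities = [250.0, 248.0, 245.0, 243.0]  # px; larger disparity means closer
xs = [0.0, 10.0, 20.0, 30.0]                # mm across the page
depths = [depth_from_disparity(1000.0, 100.0, d) for d in disparities]
flat = flattened_positions(xs, depths)
print(depths[0], flat[-1] > xs[-1])
```

The unrolled width comes out larger than the apparent 30 mm because the curved section is foreshortened in the photo. A projected grid, as suggested above, would mainly make the disparity measurements easier to obtain.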
        
         | tgw43279w wrote:
         | Regarding your point about a successor to LaTeX:
         | https://typst.app/ is turning out to be great.
        
         | hinnisdael wrote:
         | Aren't you losing information in the parts that aren't
         | perfectly straight? Yes, you can stretch those to recreate the
         | original layout, but that would come at the cost of resolution
         | in the interpolated sections of the page. Granted, not a
         | problem for most books, but probably a reason people are still
         | looking for mechanical solutions to the problem.
        
           | erikpukinskis wrote:
           | I think the suggestion is that with AI you can interpolate to
           | the actual letterforms, not to pixels.
           | 
           | Within a typical volume the letter "e" will appear hundreds
           | of times and be identical, so there should be lots of data to
           | help resolve ambiguities in the poorer parts of images.
           | 
           | Not to mention data that can be used across volumes.
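
A toy version of the "many copies of the same letter" argument: if each scan of a glyph has a few wrong pixels, a per-pixel majority vote over the copies can recover the shape. This is a deliberately simple stand-in, not what any AI system in the thread does. The 3x3 bitmaps and the `majority_vote` helper are made up for illustration.

```python
def majority_vote(samples):
    """Combine equally sized binary bitmaps pixel by pixel by majority."""
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [1 if sum(s[r][c] for s in samples) * 2 > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical "true" glyph and three noisy scans, each with one flipped pixel.
clean = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
noisy = [
    [[1, 1, 0], [1, 1, 1], [0, 1, 0]],  # top-left pixel flipped
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],  # centre pixel flipped
    [[0, 1, 0], [1, 1, 1], [0, 1, 1]],  # bottom-right pixel flipped
]
restored = majority_vote(noisy)
print(restored == clean)
```

The vote only works when the copies are already aligned and the errors are independent. A systematic error shared by every copy would survive it, which is the "plausible but incorrect" risk raised elsewhere in the thread.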
        
           | xyzzy_plugh wrote:
           | If the goal is to ultimately OCR then it's moot. But yes, of
           | course information is lost.
           | 
           | That being said, modern phone cameras are going to produce
           | "scans" above 300 DPI, and while 600 DPI or higher might be
           | tricky they're still possible if you take partial shots of a
           | document, assuming you can focus that close.
           | 
           | What you lose in quality you make up with convenience, I
           | suppose.
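
A rough check of the 300 DPI figure, as a sketch: effective DPI is the pixel count along one side divided by the inches of paper that side covers. The 4032 x 3024 sensor (a common 12 MP phone resolution) and the US Letter page are assumptions for illustration, not from the comment, and the page is assumed to fill the frame with no margin or lens distortion.

```python
import math


def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in):
    """DPI when the whole page fits in one frame; the tighter axis is the limit."""
    return min(pixels_long / page_long_in, pixels_short / page_short_in)


def shots_needed(target_dpi, pixels_long, pixels_short, page_long_in, page_short_in):
    """Number of non-overlapping partial shots (tiles) needed to reach target_dpi."""
    cols = math.ceil(page_long_in * target_dpi / pixels_long)
    rows = math.ceil(page_short_in * target_dpi / pixels_short)
    return cols * rows


# Assumed values: 12 MP sensor (4032 x 3024), US Letter page (11 x 8.5 in).
dpi = effective_dpi(4032, 3024, 11.0, 8.5)
tiles_600 = shots_needed(600, 4032, 3024, 11.0, 8.5)
print(f"single shot: {dpi:.0f} DPI, tiles for 600 DPI: {tiles_600}")
```

Under these assumptions a single shot lands a little above 300 DPI, and 600 DPI needs a 2 x 2 grid of partial shots. Real stitching needs overlap between tiles, so the practical count would be higher.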
        
         | userbinator wrote:
         | No. Absolutely NO to AI. The last thing we need is for text to
         | be changed in subtle ways by the digitisation process.
         | 
         | https://news.ycombinator.com/item?id=29223815
         | 
         | That risk of "plausible but incorrect" has been a concern even
         | before AI.
        
       | [deleted]
        
       | chaxor wrote:
       | This looks absolutely fantastic. Not only can you digitize your
       | books, but it also shreds them for you for free!
       | 
       | Pretty sweet 2 for 1 deal.
        
       | the_arun wrote:
       | Just curious - Once we scan, we have all contents in digitized
       | format. So, why is unbinding a book into pages before scanning not
       | a scalable model? Is this to avoid the additional work of unbinding?
        
         | qingcharles wrote:
         | Unless the book is super valuable then it is often easier to
         | just use a guillotine to slice off the binding completely and
         | feed the pages through a sheet-feed scanner.
        
         | distract8901 wrote:
         | Archivists typically don't like destroying a work when
         | preserving it. Scanning a book also only gives you an optical
         | image. In some cases, like medieval manuscripts, the pages may
         | have been erased and written over. If we simply scan the book
         | and destroy the physical copy, we've lost that evidence.
         | 
         | But again, for cheap mass-market works, most archivists
         | probably won't care about destroying one copy out of a million
         | to preserve the work. It's really only a problem for very old
         | and very rare works
        
       | quijoteuniv wrote:
       | This is cool, easily scan my books at home that I will use to
       | train my own LLM! Super me!
        
       | ramraj07 wrote:
       | This seems like it'll shred any book that's even slightly
       | damaged..
        
         | mackwell wrote:
         | It definitely would. And probably some that are not damaged
         | when something gets jammed or the environmental conditions
         | change and react with the paper. But if you've got a mountain
         | of books to scan and you can do the bulk of them with a machine
         | like this and then use a more careful approach for the special
         | cases, that's a win.
        
         | phoronixrly wrote:
         | I also cringed at the look of this. I have a bunch of old books
         | I would like to digitize, and most bookscanners I come across
         | require you to at least spread open the book completely, which
         | would, as you say, shred them... And they don't even involve
         | vacuum, movement across sharp edges, and stepper motors...
         | 
         | Oooh.. Second question in https://linearbookscanner.org/faq/...
         | 
         | > Out of 50 books tested, 45% had one or two of their pages
         | either torn or folded
         | 
         | I would not use this even on my less valuable books.
        
           | stavros wrote:
           | Also, the phrase "out of 50 books tested, 45% had" sounds
           | like you want someone to mistake that for 90%.
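
The arithmetic behind that reading, as a quick check. Only the 50 books and the 45% come from the FAQ quote; the variable names are illustrative. Taken literally, 45% of 50 is not a whole number of books, while 45 out of 50 would be 90%.

```python
books_tested = 50
damaged_fraction = 0.45  # the "45%" quoted from the FAQ

# Literal reading: 45% of the 50 books tested.
damaged_books = books_tested * damaged_fraction

# Misreading: treating "45" as a count of books out of 50.
misread_fraction = 45 / books_tested

print(damaged_books, misread_fraction)
```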
        
       | [deleted]
        
       | sandreas wrote:
       | Hehe nice. There is a whole community about this topic at:
       | https://diybookscanner.org/
       | 
       | Years ago I once wrote a little tool in Java called bookbuilder,
       | where you could turn the pages manually, make a photo and then
       | run an automatic process on all images to build a searchable pdf.
       | 
       | I used https://boofcv.org/, an impressive Computer Vision library
       | in pure Java; it still exists and is pretty fast, too.
       | 
       | It was able to detect the page contour, deskew it, flatten the
       | image and remove finger contours by matching the skin tone, then
       | build a PDF with integrated invisible OCR Layer without any user
       | interaction. I remember that I was working on line slope
       | detection with some kind of watershed algorithm to improve the
       | flattening part.
       | 
       | Fun project, I wonder if I have the source code lying around
       | somewhere... even the download page is gone today. This was long
       | before I went open source with all of my little side projects,
       | because I never thought it could be interesting for someone else
       | :-)
        
         | metadat wrote:
         | BoofCV is incredibly cool, thanks for sharing! It's unfortunate
         | I didn't know about it before today, I would've definitely
         | invested time to learn and hopefully contribute back (there's
         | still a chance, but at this point it seems to cover everything
         | I've ever heard of, and more).
         | 
         | I just spent 30 minutes clicking through and inspecting every
         | example.
        
           | sandreas wrote:
           | Check the android test app. Very cool stuff to see there.
        
         | pontifier wrote:
         | I've been interested in book scanning for a while. I think I
         | remember seeing your software! I built one of the early
         | versions that I saw on the site, and then happened to meet
         | Jonathon Duerig and do some waterjet cutting for him a few
         | years ago.
         | 
         | When I saw the linear bookscanner the first time, I realized it
         | was upside down. If it were suspended, and able to move itself,
         | then all the issues about not knowing the mass of the book, and
         | dealing with friction could be avoided. Counterweights could
         | keep the force constant, and it would be moving a known mass
         | when scanning any book.
        
         | WalterBright wrote:
         | Please make it available on github!
        
           | sandreas wrote:
           | Well, if I find the time I'll try to find the sourcecode, but
           | I'm not sure in which state it is :-)
        
         | asdefghyk wrote:
         | RE ....There is a whole community about this topic at:
         | https://diybookscanner.org/..... The actual (book scanner)
         | forum is down at present. It has been down for a few weeks. We
         | are working to get it back up.
        
       ___________________________________________________________________
       (page generated 2023-09-17 23:00 UTC)