[HN Gopher] Building an Open Source Decentralized E-Book Search ...
___________________________________________________________________
Building an Open Source Decentralized E-Book Search Engine
Author : j2qk3b
Score : 216 points
Date : 2024-03-11 11:56 UTC (11 hours ago)
(HTM) web link (github.com)
| boredumb wrote:
| Many moons ago I wanted to do something similar for AI data sets
| and models over IPFS. I don't know the future for IPFS but I do
| hope the essence of a p2p data sharing infrastructure becomes
| more accessible to help individuals tackle some of the issues
| with large datasets with less hardware on hand.
|
| https://github.com/JakeKalstad/IPFSPytorchDataset
| https://github.com/JakeKalstad/load_ipfs_pytorch_model
| Mortiffer wrote:
| Could you detail how you populate the search index and what you
| expect the memory limits to be?
| MrThoughtful wrote:
| What on earth is this about?
|
| "I was recommended ... Liber3 ..., which uses ENS domain names
| ... running on ENS and IPFS ... they appear to be using Glitter
| ... a ... service built with Tendermint."
|
| This sounds like a signal from outer space to me. In a language
| used in a different galaxy.
|
| I tried that Liber3 thing, but whatever I do, I get "Oops!
| Something went wrong. Please refresh or try again later".
|
| What is this all about?
| WolfeReader wrote:
| The title is the de-jargonized version. It's a set of
| instructions to build an open-source ebook search engine.
| (Admittedly there is still some jargon in that description, but
| not to the level of naming specific libraries.)
|
| The bulk of the article is implementation details, helpfully
| hyperlinked.
| droopyEyelids wrote:
| The title got me really excited that they were doing full text
| search. Boy that would be an awesome project. Zlib and Google
| Books do it, but it would be great to have an open source version
| that everyone could contribute to and that provided access to full
| texts.
| raybb wrote:
| OpenLibrary does provide search access to full texts. For
| example:
| https://openlibrary.org/search/inside?q=%22institutional+thi...
|
| It is open source and they're always looking for contributors.
| I think they'd especially welcome help improving search!
|
| https://github.com/internetarchive/openlibrary/
| devops000 wrote:
| Cool! Could it be used for torrent searching? Like running web
| torrent with video streaming and a decentralized search engine.
| j2qk3b wrote:
| Yes! Try this one: https://anybt.eth.limo/
|
| I will build an open sourced version too!
| hanniabu wrote:
| Nice to find eth.limo being used in the wild
| throwawayyyyyy2 wrote:
| And then realize it has existed for almost 15 years and it's
| called libgen.rs
| spondylosaurus wrote:
| Anna's Archive is even better!
| brevitea wrote:
| IMO, the more the merrier. That's the joy of decentralization
| and P2P.
| tamimio wrote:
| It seems they are using Flask in their code, which goes to show you
| don't need to go crazy with your stack to build useful software.
| ValleZ wrote:
| Is this an actual search engine or just a front end which builds
| "select from" queries?
| carlosjobim wrote:
| There's 13 search engines in a dozen if you only want book title
| or author. What's lacking is a search index of the content of
| e-books. Something that will soon be incredibly important in the
| face of generative AI. Somebody here on HN told me it only takes
| a laptop to index the content of millions of books, while other
| people say the scope is almost impossible.
|
| Is there any project working on this?
| bt1a wrote:
| Perhaps the initial creation of the index is indeed something
| that an average laptop could accomplish, but I'd imagine that
| frequently updating the index and serving requests against it
| would be compute-intensive. I have nothing to back this up but
| speculation. Would love to learn more!
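To make the question of index creation versus updating concrete, here is a minimal sketch of a full-text inverted index. It is not taken from the linked project; the class name `InvertedIndex`, the book IDs, and the sample texts are all hypothetical. It only illustrates that adding one book touches just that book's terms, which is why incremental updates can be cheap compared to a full rebuild. A real system would also need ranking, on-disk storage, and text extraction from e-book formats.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Hypothetical toy index: maps each term to the set of book IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_book(self, book_id, text):
        # Incremental update: indexing a new book only touches its own terms.
        for term in tokenize(text):
            self.postings[term].add(book_id)

    def search(self, query):
        """Return IDs of books that contain every term in the query (AND search)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Made-up sample data, for illustration only.
index = InvertedIndex()
index.add_book("book-a", "The lighthouse keeper watched the storm roll in.")
index.add_book("book-b", "A storm of numbers: notes on statistics.")

print(sorted(index.search("storm")))             # ['book-a', 'book-b']
print(sorted(index.search("lighthouse storm")))  # ['book-a']
```

Memory use in this sketch grows with the number of distinct terms and postings, so indexing millions of books would likely need compressed, disk-backed postings rather than Python sets.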
| CWuestefeld wrote:
| I believe that Calibre, the popular and free ebook management
| tool, now supports indexing the content of all books in your
| library.
| myco_logic wrote:
| Depends on how beefy that laptop is...
|
| I've been doing some local LLM stuff at work recently, and even
| with the amazing advances in quantization lately, doing that
| kind of stuff on a ThinkPad _is feasible_, but still strongly
| inferior to just renting out a VPS with a couple 4090/H100s for
| several hours.
|
| The biggest thing with summarizing stuff is that most local LLMs
| often don't have very big context windows, so they have
| trouble with larger texts like even a short Vonnegut novel (I
| was just testing 'em with summarizing GitHub issues, and even
| with a 16k token context window they still sometimes struggle
| if there are a lot of comments).
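One common workaround for small context windows is to split a long text into overlapping chunks, summarize each chunk, and then summarize the summaries. The sketch below covers only the chunking step. It is an assumption-laden illustration, not what the commenter used: the function name `chunk_words` is hypothetical, and it counts whitespace-separated words as a rough stand-in for tokens, which real tokenizers count differently.

```python
def chunk_words(text, max_words=1000, overlap=100):
    """Split text into overlapping chunks of at most max_words words.

    Words are only a rough proxy for model tokens (an assumption of this sketch).
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Tiny example: ten "words", chunks of four, overlapping by one.
sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a little shared context between neighboring chunks, at the cost of processing some words twice.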
|
| There are probably smarter people than I who could get this
| working on a Raspberry Pi though... ;)
| dmotz wrote:
| I have a side project that aims to organize your ebook
| highlight collections with on-device semantic search. [1] Right
| now it only indexes your own content but I'd like to add a mode
| that allows you to share your collection and let others find
| relevant ideas via semantic search -- a discovery platform for
| ideas found in books. It's open source if you want a sense of
| how it works now. [2]
|
| [1] https://emdash.ai/
|
| [2] https://github.com/dmotz/emdash
| neilv wrote:
| This seems to be intended for IP piracy. Clarifying that in the
| title would help.
|
| I'm trying to encourage publishers and authors to offer
| legitimate sales of DRM-free ebooks, so would prefer we try not
| to have the term "ebook" associated with piracy.
| RamblingCTO wrote:
| Since when are ebooks piracy? I think that might only be you.
| neilv wrote:
| Title is "Building an Open Source Decentralized E-Book Search
| Engine", and screenshot seems to suggest piracy.
| t-3 wrote:
| Nothing about the title suggests piracy, and the screenshot
| doesn't show download links - hell, there aren't any actual
| Harry Potter books in a search for "Harry Potter". Even if
| it _were_ searching for files, free and legal ebooks are
| ubiquitous, no copyright infringement necessary to make it
| a worthwhile endeavor.
| WolfeReader wrote:
| Please be specific about how the screenshot advocates
| piracy.
|
| (Also, a personal preference: never use the phrase "seems
| to suggest" again; if you're going to make an accusation,
| be honest enough to actually make it.)
| sureglymop wrote:
| It's a search engine... What about it makes it specific to IP
| piracy?
|
| I actually understand your point well but I think it's even
| more important not to group in any legitimate use of technology
| with illegitimate use of it. Especially considering recent
| events (lawsuits over Yuzu and Dolphin emulators).
| citruscomputing wrote:
| It does seem to be! Isn't that cool?
___________________________________________________________________
(page generated 2024-03-11 23:00 UTC)