[HN Gopher] Distributed search engines using BitTorrent and SQLite
___________________________________________________________________
Distributed search engines using BitTorrent and SQLite
Author : tosh
Score : 109 points
Date : 2021-01-20 18:40 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| r32a_ wrote:
| I'd check out http://dazaar.com/ as well. Same ideas, but built on
| Hypercore technology and with a payments module built in.
| asymptosis wrote:
| > This currently only works on Mac OS X.
|
| That is already a sign that this project is not going to go
| anywhere. Anyone who wants to build a server or make this part of
| their seedbox is going to use Linux or one of the BSDs.
| bigdict wrote:
| macOS is POSIX certified. Is Linux?
| bawolff wrote:
| I think that is fine for a proof of concept.
|
| The bigger reason is that, unless I'm missing something, this is
| not distributed in the sense most people use the term
| "distributed" in the context of search engines, so it's not as
| interesting as everyone is making it out to be.
| lxe wrote:
| > Site users then start downloading the site torrent, but, rather
| than downloading pieces of the torrent in "rarest first" order,
| they download pieces based on the search query they performed.
|
| Interesting. How does the system know where the result of the
| query might appear in the file?
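A minimal sketch of the ordering difference the quoted README describes, under a toy model: each peer advertises a set of piece indices, and the pieces the current query needs are assumed to be known already (working them out is a separate step). `rarest_first` and `query_first` are hypothetical names; this is not sqltorrent's or any BitTorrent client's actual scheduler.

```python
from collections import Counter


def rarest_first(piece_count, peer_bitfields, have):
    """Classic ordering: missing pieces, fewest holders first."""
    # Count how many peers advertise each piece index.
    availability = Counter()
    for bitfield in peer_bitfields:
        availability.update(bitfield)
    # Candidates: pieces we lack that at least one peer can supply.
    missing = [i for i in range(piece_count)
               if i not in have and availability[i] > 0]
    # Ties are broken by index so the order is deterministic.
    return sorted(missing, key=lambda i: (availability[i], i))


def query_first(piece_count, peer_bitfields, have, query_pieces):
    """Pieces the current query needs go first, in the order requested;
    everything else falls back to rarest-first."""
    ordered = rarest_first(piece_count, peer_bitfields, have)
    candidates = set(ordered)
    urgent = []
    for i in query_pieces:
        if i in candidates and i not in urgent:
            urgent.append(i)
    urgent_set = set(urgent)
    return urgent + [i for i in ordered if i not in urgent_set]


peers = [set(range(6)), {0, 1, 2}, {0, 1}]
print(rarest_first(6, peers, have={0}))
print(query_first(6, peers, have={0}, query_pieces=[1, 2]))
```

With three peers and piece 0 already downloaded, rarest-first would start with pieces 3, 4 and 5 (each held by one peer), while the query-driven order starts with the pieces the search actually touches.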
| frafra wrote:
| Interesting question. I looked at the source code to understand
| that.
|
| SQLite knows where to look when you open a SQLite database
| and you run a query, right? It just asks the underlying
| filesystem to provide N bytes starting from an offset using a C
| function, then it repeats the same operation on different
| portions of the file, it does its computation and everybody is
| happy.
|
| The software relies on sqltorrent, which is a custom VFS for
| SQLite. That means that the SQLite function that reads data from
| a file stored in the filesystem is replaced by a custom function.
| That custom code computes which torrent block(s) should get the
| highest priority by dividing the start and end offsets of the
| byte range SQLite wants to read by the size of the torrent
| blocks. It is just a division.
|
| See:
| https://github.com/bittorrent/sqltorrent/blob/master/sqltorr...
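The division described above can be sketched as follows. This is a hypothetical Python model, not sqltorrent's actual code: `pieces_for_read` maps a byte range to the indices of the torrent pieces that cover it, and the toy `LazyPieceFile` stands in for the custom VFS read function by "fetching" any missing pieces before serving bytes. The 16 KiB piece size and the 4096-byte reads (SQLite's default page size in current versions) are illustrative assumptions.

```python
def pieces_for_read(offset: int, length: int, piece_size: int) -> range:
    """Indices of every piece overlapping bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // piece_size
    last = (offset + length - 1) // piece_size  # piece holding the last byte
    return range(first, last + 1)


class LazyPieceFile:
    """Toy stand-in for a torrent-backed database file (hypothetical):
    bytes are served only after the pieces covering them are 'fetched'."""

    def __init__(self, swarm_data: bytes, piece_size: int):
        self._swarm = swarm_data  # stands in for peers holding the data
        self.piece_size = piece_size
        self.have = set()
        self.fetch_log = []  # order in which pieces were requested

    def _fetch(self, idx: int) -> None:
        # A real client would raise this piece's priority and block until
        # it arrives and passes its hash check; here we just record it.
        self.have.add(idx)
        self.fetch_log.append(idx)

    def read(self, offset: int, length: int) -> bytes:
        # Equivalent of the VFS read hook: make sure every covering piece
        # is present, then return the requested bytes.
        for idx in pieces_for_read(offset, length, self.piece_size):
            if idx not in self.have:
                self._fetch(idx)
        return self._swarm[offset:offset + length]


data = bytes(range(256)) * 256  # 64 KiB "database" = 4 pieces of 16 KiB
f = LazyPieceFile(data, piece_size=16384)
f.read(5 * 4096, 4096)  # page 5 lives in piece 1
f.read(0, 100)          # header read lives in piece 0
print(f.fetch_log)
```

The end of the range is `offset + length - 1`; dividing only the offset would miss the second piece of a read that straddles a piece boundary, such as `pieces_for_read(16000, 1000, 16384)`, which covers pieces 0 and 1.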
| miki123211 wrote:
| This is not as distributed as you might believe.
|
| The content itself is distributed, which creates privacy
| challenges of its own, but control over that content is
| centralized. If we want automatic updates of the index, we're
| still relying on a single party to provide them. That single
| party might respond to DMCAs, remove/censor content etc.
| jpereira wrote:
| For work in a similar vein, Mikeal Rogers has recently been
| working on IPSQL[0], based on peer-to-peer ordered search
| indexes[1] built on IPFS, which shares the content-addressed
| nature of BitTorrent.
|
| [0]: https://github.com/mikeal/IPSQL
|
| [1]: https://0fps.net/2020/12/19/peer-to-peer-ordered-search-inde...
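To illustrate what "content-addressed" means here, the following toy store keys each block by the SHA-256 of its bytes, so the address is derived from the content rather than chosen by whoever stores it. This is a simplification under stated assumptions: IPFS wraps hashes in CIDs, BitTorrent v1 uses SHA-1 piece hashes, and `ContentAddressedStore` is a hypothetical name.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical toy store: every block lives under the hash of its bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # Identical content always yields the identical key, so duplicates
        # collapse and any peer can compute the address independently.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # A block fetched from an untrusted peer can be verified against
        # the address that was asked for.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its address")
        return data


store = ContentAddressedStore()
key = store.put(b"hello")
print(key, store.get(key))
```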
| adkadskhj wrote:
| With respect to IPFS and Merkle Search Trees, can anyone "in
| the know" comment on how they're materially different from
| Probabilistic B-Trees as defined by Noms[1] and Dolt[2]? I've
| been playing a lot with the Noms variant (Prolly Trees) lately
| and have often wondered how, if at all, they differ from
| IPFS-ish Merkle Search Trees.
|
| [1]: https://github.com/attic-labs/noms/blob/master/doc/intro.md#...
|
| [2]: https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-tabl...
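One way to make the question concrete is to compare two simplified, hash-based rules for shaping the tree, under these assumptions. In a Merkle Search Tree, a key's layer is taken as the number of leading zero digits (in some base) of its hash. In a prolly tree, a leaf chunk is closed after an entry whose hash falls in a fixed slice of the hash space. Noms' real chunker uses a rolling hash over a window of serialized bytes, so the per-key rule below is a stand-in, and both function names are hypothetical.

```python
import hashlib


def _h(key: bytes) -> int:
    return int.from_bytes(hashlib.sha256(key).digest(), "big")


def mst_level(key: bytes, base_bits: int = 4) -> int:
    """MST-style (simplified): count leading zero digits of the key's hash,
    with digits of base_bits bits each (base 16 by default)."""
    h, total_bits, level = _h(key), 256, 0
    mask = (1 << base_bits) - 1
    while (level + 1) * base_bits <= total_bits:
        shift = total_bits - (level + 1) * base_bits
        if (h >> shift) & mask:  # first non-zero digit ends the count
            break
        level += 1
    return level


def prolly_leaf_chunks(sorted_keys, target_size: int = 4):
    """Prolly-style (simplified): close a chunk after any key whose hash
    lands in a 1/target_size slice, giving ~target_size keys per chunk."""
    chunks, current = [], []
    for key in sorted_keys:
        current.append(key)
        if _h(key) % target_size == 0:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


keys = sorted(f"key{i:03d}".encode() for i in range(20))
print([mst_level(k) for k in keys])
print([len(c) for c in prolly_leaf_chunks(keys)])
```

In both simplified rules the shape depends only on the set of keys, not on insertion order, and inserting one key changes at most the one or two chunks around it.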
| bawolff wrote:
| I don't think this meets most people's definition of a
| distributed search engine.
| tanelpoder wrote:
| Since SQLite executing SQL locally on a remote peer machine is
| essentially computation push-down, one could think of building a
| planet-scale distributed analytics engine using such a pattern
| (perhaps with DuckDB and Parquet/Arrow files under the hood,
| though which exact SQL engine sits behind the query push-down API
| could be abstracted away too).
|
| edit: attracted->abstracted
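A minimal sketch of the push-down pattern this comment proposes, assuming each hypothetical peer holds its own shard in a local SQLite database. The filter and a decomposable aggregate (`SUM`) run where the data lives, and only small partial results are merged centrally. As frafra describes it, the linked project works the other way round: the query runs on the searcher's machine and pulls pieces from peers. `make_peer`, `run_pushed_down` and the `hits` table are illustrative names.

```python
import sqlite3


def make_peer(rows):
    """Hypothetical peer: an in-memory SQLite DB holding its own shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (term TEXT, n INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    return conn


# Filtering and partial aggregation both happen on the peer.
PUSHED_QUERY = "SELECT term, SUM(n) FROM hits WHERE n > 0 GROUP BY term"


def run_pushed_down(peers):
    """Send the same query to every peer and merge the partial sums,
    instead of shipping every raw row to a coordinator."""
    totals = {}
    for peer in peers:
        for term, partial in peer.execute(PUSHED_QUERY):
            totals[term] = totals.get(term, 0) + partial
    return totals


peers = [
    make_peer([("sqlite", 3), ("torrent", 1)]),
    make_peer([("sqlite", 2), ("duckdb", 5), ("torrent", 0)]),
]
print(run_pushed_down(peers))
```

Merging partial results this way only works for aggregates that decompose. `SUM` and `COUNT` do, but an `AVG` would have to be pushed down as a sum and a count and divided at the end.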
___________________________________________________________________
(page generated 2021-01-20 23:00 UTC)