[HN Gopher] Show HN: TeaTime - distributed book library powered ...
___________________________________________________________________
Show HN: TeaTime - distributed book library powered by SQLite, IPFS
and GitHub
Recently there seems to be a surge in SQLite-related projects.
TeaTime is riding that wave... A couple of years ago I was
intrigued by phiresky's post[0] about querying SQLite over HTTP. It
made me think that if anyone can publish a database using GitHub
Pages, I could probably build a frontend in which users can decide
which database to query. TeaTime is like that - when you first
visit it, you'll need to choose your database. Anyone can create
additional databases[1]. TeaTime then queries it, and fetches files
using an IPFS gateway (I'm looking into using Helia so that users
are also contributing nodes in the network). Files are then
rendered in the website itself. Everything is done in the browser -
no users, no cookies, no tracking. LocalStorage and IndexedDB are
used for saving your last readings, and your position in each file.
Since TeaTime is a static site, it's super easy (and free) to
deploy. GitHub repo tags are used for maintaining a list of public
instances[2]. Note that a GitHub repository isn't mandatory for
storing the SQLite files or the front end - it's only for the
configuration file (config.json) of each database, and for listing
instances. Both the instances themselves and the database files can
be hosted on Netlify, Cloudflare Pages, your Raspberry Pi, or any
other server that can host static files. I'm curious to see what
other kinds of databases people can create, and what other types of
files TeaTime could be used for. [0]
https://news.ycombinator.com/item?id=27016630 [1]
https://github.com/bjesus/teatime-json-database/ [2]
https://github.com/bjesus/teatime/wiki/Creating-a-TeaTime-in...
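The database discovery described above (public repos tagged with a topic, found via GitHub's search API and, as the author notes later in the thread, ranked by stars) can be sketched as a single URL builder. This is a hypothetical helper, not TeaTime's actual code; only the endpoint and the `teatime-database` topic name come from the thread:

```javascript
// Hypothetical sketch of TeaTime-style database discovery: search GitHub
// for repositories tagged with the "teatime-database" topic, ranked by
// stars. Only the URL construction is shown; fetching and parsing the
// JSON response is left out.
function databaseSearchUrl(topic = "teatime-database") {
  const params = new URLSearchParams({
    q: `topic:${topic}`, // GitHub search qualifier for repo topics
    sort: "stars",       // the author mentions databases are sorted by stars
    order: "desc",
  });
  return `https://api.github.com/search/repositories?${params}`;
}
```

A frontend would `fetch()` this URL unauthenticated (the free API the author mentions) and read the repository names and URLs from the response.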
Author : yoavm
Score : 284 points
Date   : 2024-11-27 13:56 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| viborcip wrote:
| "Everything is done in the browser - no users, no cookies, no
| tracking. LocalStorage and IndexedDB are used for saving your
| last readings, your position in each file."
|
| I love this! Thanks for making this!
| noamikotamir wrote:
| super cool
| mdaniel wrote:
| > (I'm looking into using Helia so that users are also
| contributing nodes in the network)
|
| I had to look that term up <https://github.com/ipfs/helia#readme>
| but while sniffing around in their wiki
| <https://github.com/ipfs/helia/wiki/Projects-using-Helia> I was
| reminded of https://github.com/orbitdb/orbitdb#readme which seems
| like it may involve much less rolling your own parts
| yoavm wrote:
| Thanks, I hadn't heard of it before. I wonder if it's possible
| to use OrbitDB to download a file from IPFS though? Based on
| its CID, I mean, because that was my intention with Helia. I
| thought that one of the nice things about reading books is that
| it takes time, and if users could be seeding their files from
| IndexedDB to the network, they could automatically and
| effortlessly become "contributing citizens".
|
| Another interesting use case would be to see if this can
| replace (or be an addition to) SQLite as the database in which
| the queries are run.
| zwnow wrote:
| Seeding sounds like what's done by torrenting too no? How do
| you prevent people from distributing content they do not own?
| yoavm wrote:
| Yes it is, and you're absolutely right with raising this
| concern. I was planning to make seeding (if I get it
| working at all) an opt-in feature, firstly to prevent
| excessive data usage on mobile connections.
| slagfart wrote:
| Sorry if I missed this, but is there an example instance we can
| play with?
| yoavm wrote:
| Yes, https://bjesus.github.io/teatime/
| westurner wrote:
| Do static sites built with sphinx-build or _jupyter-book_ or hugo
| or other jamstack static site generators work with TeaTime?
|
| sphinx-build: https://www.sphinx-doc.org/en/master/man/sphinx-build.html
|
| There may need to be a different Builder or an extension of
| sphinxcontrib.serializinghtml.JSONHTMLBuilder which serializes a
| doctree (basically a DOM document object model) to the output
| representation:
| https://www.sphinx-doc.org/en/master/usage/builders/#sphinxc...
|
| datasette and datasette-lite can load CSV, JSON, Parquet, and
| SQLite databases; they support Full-Text Search and search
| faceting. datasette-lite is a WASM build of datasette with the
| pyodide python distribution.
|
| datasette-lite > Loading SQLite databases:
| https://github.com/simonw/datasette-lite#loading-sqlite-data...
|
| jupyter-lite is a WASM build of jupyter which also supports
| sqlite in notebooks in the browser with `import sqlite3` with the
| python kernel and also with a sqlite kernel:
| https://jupyter.org/try-jupyter/lab/
|
| jupyterlite/xeus-sqlite-kernel:
| https://github.com/jupyterlite/xeus-sqlite-kernel
|
| (edit)
|
| xeus-sqlite-kernel > "Loading SQLite databases from a remote URL"
| https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#i...
|
| %FETCH <url> <filename>
| https://github.com/jupyter-xeus/xeus-sqlite/blob/ce5a598bdab...
|
| xlite.cpp > void fetch(const std::string url, const std::string
| filename)
| https://github.com/jupyter-xeus/xeus-sqlite/blob/main/src/xl...
| yoavm wrote:
| > Do static sites built with sphinx-build or jupyter-book or
| hugo or other jamstack static site generators work with
| TeaTime?
|
| I guess it depends on what you mean by "work with TeaTime".
| TeaTime itself is a static site, generated using Nuxt. Nothing
| it does couldn't be achieved with another stack - at the end
| of the day it's just HTML, CSS and JS. I haven't tried
| sphinx-build or jupyter-book, but there's no technical reason
| why Hugo couldn't build a TeaTime-like website, using the
| same databases.
|
| > datasette-lite > Loading SQLite databases:
| https://github.com/simonw/datasette-lite#loading-sqlite-data...
|
| I hadn't seen datasette before. What are the biggest benefits
| you think it has over sql.js-httpvfs (which I'm using now)? Is
| it the ability to also use other formats, in addition to
| SQLite? I got the impression that sql.js-httpvfs was a bit more
| of a POC, and later some possibly better solutions came out,
| but I haven't really gone down that rabbit hole to figure out
| which one would be best.
|
| Edit: looking a little more into datasette-lite, it seems like
| one of the nice benefits of sql.js-httpvfs is that it doesn't
| download the whole SQLite database in order to query it. This
| makes it possible to have a 2GB database but still read it in
| chunks, skipping around efficiently until you find your data.
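The chunked reading described in this comment rests on HTTP Range requests: the virtual filesystem asks the static host for only the byte ranges holding the SQLite pages a query touches. A simplified sketch of the arithmetic; the helper name and the 4096-byte default page size are assumptions, not sql.js-httpvfs's actual implementation:

```javascript
// Sketch of the range-request idea behind sql.js-httpvfs. SQLite pages
// are 1-indexed, so page N occupies bytes [(N-1)*pageSize, N*pageSize - 1].
function pageRangeHeader(pageNumber, pageSize = 4096) {
  const start = (pageNumber - 1) * pageSize;
  const end = start + pageSize - 1; // HTTP Range ends are inclusive
  return `bytes=${start}-${end}`;
}
// The static host only needs to honour Range headers, e.g.:
// fetch(dbUrl, { headers: { Range: pageRangeHeader(3) } })
```

This is why a 2GB database on a dumb static host still works: each query pulls only a handful of small ranges.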
| westurner wrote:
| So the IPFS part is instead of a CDN? Does it replace asset
| URLs in templates with e.g. cf IPFS gateway URLs with SRI
| hashes?
|
| datasette-lite > "Could this use the SQLite range header
| trick?" https://github.com/simonw/datasette-lite/issues/28
|
| From xeus-sqlite-kernel > "Loading SQLite databases from a
| remote URL"
| https://github.com/jupyterlite/xeus-sqlite-kernel/issues/6#i...
| re: "Loading partial SQLite databases over HTTP":
|
| > _sql.js-httpvfs_: https://github.com/phiresky/sql.js-httpvfs
|
| > _sqlite-wasm-http_: https://github.com/mmomtchev/sqlite-wasm-http
|
| >> _This project is inspired from @phiresky /sql.js-httpvfs
| but uses the new official SQLite WASM distribution._
|
| Datasette creates a JSON API from a SQLite database; it has an
| optional SQL query editor with canned queries, multi-DB query
| support, and docs:
| https://docs.datasette.io/en/stable/sql_queries.html#cross-d... :
|
| > _SQLite has the ability to run queries that join across
| multiple databases. Up to ten databases can be attached to a
| single SQLite connection and queried together._
| yoavm wrote:
| The IPFS part is where the actual files are downloaded
| from. I wouldn't necessarily say that it's instead of a CDN
| - it just seems to be a reliable enough way of getting the
| files (without the pain of setting up a server with all
| these files myself). Indeed, it fetches them through IPFS
| Gateways (the Cloudflare IPFS Gateway is unfortunately no
| longer a thing) using their CID hash.
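The gateway fetches mentioned here use one of two common URL shapes, both of which appear elsewhere in this thread (path-style as on w3s.link, subdomain-style as on dweb.link). A hypothetical helper, for illustration only:

```javascript
// Two common ways to address a CID on a public IPFS gateway.
// Real gateways may support one style or both.
function gatewayUrls(cid, host) {
  return {
    path: `https://${host}/ipfs/${cid}`,       // e.g. https://w3s.link/ipfs/<cid>
    subdomain: `https://${cid}.ipfs.${host}/`, // e.g. https://<cid>.ipfs.dweb.link/
  };
}
```

The subdomain style gives each CID its own browser origin, which matters when serving untrusted content.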
| mushufasa wrote:
| > The databases used in TeaTime are GitHub repositories tagged
| with the teatime-database topic, which are published on GitHub
| Pages.
|
| Couldn't this be a security issue, with bad actors abusing
| this tag?
| yoavm wrote:
| That's a fair point - I guess it could be abused. Databases are
| sorted by their number of GitHub stars, so I was hoping that
| with the power of the crowds it would be possible to minimize
| the damage such an actor might cause, simply by not voting
| them up.
| mushufasa wrote:
| There have been several attacks recently where a bad actor
| takes over a repo whose original maintainer wants to take a
| step back, then launches a supply chain attack. In recent
| cases, the attack came from obfuscated binary files in the
| repo rather than code. Given we are dealing with documents
| here (books), it would be easy to hide malicious code in a
| file. PDFs have all sorts of execution vulnerabilities, for
| example.
| cicko wrote:
| PDFs are not executables.
| taneq wrote:
| You'd be surprised what's executable with the right
| attitude.
| crest wrote:
| May I recommend the old 27C3 talk "OMG WTF PDF"?
| samatman wrote:
| Other than the JavaScript, you mean.
|
| https://cloudmersive.com/article/Understanding-Embedded-
| Java...
| yoavm wrote:
| Interesting - I'm kinda counting on PDF.js, which is used
| for PDF rendering, to do it safely, but of course that
| won't always be the case. Do you have any
| thoughts on how to make this safer?
| mushufasa wrote:
| Some other method of collection where you can have known
| trust of the files contributed - some method of
| 'registering' a submission to create trust.
| clueless wrote:
| Is this like an open source distributed libgen?
| yoavm wrote:
| Libgen is one of the databases that are currently available
| (and was probably a good fit because they already had their
| files on IPFS), but I think that this architecture, in which
| the UI is decoupled from the database, the database doesn't
| hold any copyrighted materials, and the files are downloaded
| from IPFS, is quite resilient and could be used for serving all
| sorts of content.
| evantbyrne wrote:
| Any particular reason for choosing IPFS instead of bittorrent or
| other p2p protocols? It feels like every time I try an IPFS tool
| it just crashes, whereas I rarely have issues with torrents.
| yoavm wrote:
| Yeah - the desire to have TeaTime run as a normal website.
| BitTorrent doesn't run over HTTP, unless you use WebTorrent,
| which most BitTorrent clients aren't 100% compatible with. This
| means you can basically only download from other WebTorrent
| nodes - and there aren't many.
|
| I do think this might change with the introduction of things
| like the Direct Sockets API, but for now they are too
| restricted and not widely supported.
|
| It's my first time working with IPFS and I agree it hasn't
| always been 100% reliable, but I do hope that if I manage to
| get TeaTime users to also be contributing nodes, this might
| actually improve the reliability of the whole network. Once
| it's possible to use BitTorrent in the browser, I do think it
| would be a great addition
| (https://github.com/bjesus/teatime/issues/3).
| 2color wrote:
| IPFS contributor here.
|
| IPFS certainly has its flaws, but it's been getting much better
| over the last year. If you want to do P2P retrieval in
| browsers, it's practically the most advanced of all protocols.
|
| We just published an update on this topic
| https://blog.ipfs.tech/2024-shipyard-improving-ipfs-on-the-w...
| boramalper wrote:
| > If you want to do P2P retrieval in browsers
|
| My understanding is that IPFS always requires an IPFS-HTTP
| gateway to work in browsers, so I wonder what you mean by
| "P2P retrieval in browsers".
|
| _Verified Fetch_ is about verifying content retrieved from
| gateways on the client side, but the retrieval is still very
| much client-server (vs P2P) today, and _Service Worker
| Gateway_ seems to rely on gateways too
| (<trustless-gateway.link> in the demo linked).
|
| Based on _IPFS & HTTP: The Blueprint for Radical
| Interoperability - Lidel_ [0], I believe the situation is
| acknowledged by IPFS devs too. In my unsolicited opinion,
| using a custom protocol (libp2p transports?) for a project
| targeting web browsers was a mistake, so I'm glad to see
| that HTTP is being considered now.
|
| [0] https://www.youtube.com/watch?v=4GwxrQ9Z3Bk
| 2color wrote:
| > My understanding is that IPFS always requires an IPFS-HTTP
| gateway to work in browsers, so I wonder what you mean
| by "P2P retrieval in browsers".
|
| By P2P retrieval, I mean retrieval directly from a peer
| that has the data without additional hops.
|
| Historically HTTP gateways were the only way, because you
| couldn't dial (most) peers directly from a browser unless
| they had a CA-signed TLS certificate (needed in secure
| contexts.)
|
| A couple of things changed that:
|
| - WebTransport and WebRTC-direct allow browser-to-server (peer)
| communication without a CA-signed certificate. WebTransport
| is very new and still has some problems in browser
| implementations.
|
| - We just launched AutoTLS, which helps public IPFS nodes get
| a wildcard Let's Encrypt TLS certificate, making them
| "dialable" from browsers.
|
| With these, it becomes a whole lot easier to establish
| connections from browsers to IPFS nodes. But it takes time
| for the network to upgrade.
|
| Note that Libp2p transports are wrappers over network
| transports to allow interoperability across runtimes, e.g.
| TCP, QUIC, WebSockets, WebTransport. These are not custom
| protocols.
|
| Now, to your point about custom protocols, by which I presume
| you are referring to data exchange/retrieval protocols.
|
| There are two IPFS data transfer protocols:
|
| - Bitswap: the first and default data transfer protocol,
| which requires a streaming network transport (so HTTP isn't
| ideal).
|
| - HTTP Gateways: initially, HTTP gateways were servers that
| would handle retrieval (over Bitswap) for you if they didn't
| have the data locally (sometimes we refer to these as
| Recursive Gateways, like trustless-gateway.link and ipfs.io).
| For all the reasons in Lidel's talk (caching, interop,
| composability), we extended this notion and made HTTP
| Gateway a general interface for data retrieval.
|
| Today, there are large providers in the network (pinning
| services) that expose their data with an HTTP Gateway,
| which browsers can retrieve from directly.
|
| We still have more work to do to expose the gateway in Kubo
| with the new AutoTLS certificate, but for the time being,
| Bitswap should work well enough even in browsers over a
| WebSockets connection.
|
| Verified Fetch aims to ensure you get the data if it's
| available. If it fails to find a direct provider of the
| data, it will fall back to a recursive gateway like
| trustless-gateway.link, which is configured in the defaults.
| As more IPFS nodes upgrade to newer versions, reliance on
| such centralised recursive gateways will become
| unnecessary.
|
| TL;DR: We're all in on HTTP. The real constraints on P2P are
| browser vendors and slow standards bodies.
| aeyes wrote:
| The Pear P2P framework might be worth a look if you want to get
| off of GitHub and turn it into a truly distributed system. If the
| index must be on GitHub then what good is it to have the files on
| IPFS?
|
| https://docs.pears.com/
| yoavm wrote:
| Thanks, I shall have a look into that. Technically the index
| too can live on IPFS, but the real challenge was how to know
| which databases exist, and where. This is what I'm using the
| GitHub topics for - anyone can create a repo and assign a
| topic to it, and the free GitHub API lets you search for
| matching repos. In a way, what I'm looking for is the ability
| to make a request and get a different result when a new
| database is up, and the immutable nature of IPFS, for
| example, isn't great for that. GitHub stars and accounts also
| give some basic concept of content moderation. I should see
| what Pears offers in this space.
|
| Generally, I see the distributed nature of TeaTime as an
| instrument for resilience, and less a goal in itself.
| My thinking is that the GitHub database repositories are
| "resilient enough", especially if they can contain just a
| config.json file pointing to database files living elsewhere.
| But I'm not a lawyer and could be wrong.
| boramalper wrote:
| > Technically the index too can live on IPFS
|
| Besides, HTTP requests against indices should be fast enough
| for a decent user experience and IPFS (and/or its gateways)
| aren't great at that in my experience. I think using GitHub
| (or any other static hosting providers) was a good call in
| that regard.
| sram1337 wrote:
| I don't know what's in the database, so I don't know what to
| search for.
|
| How about a browse feature?
| yoavm wrote:
| Ha, that's a great point... I can probably work on a meta
| "table of contents" for each database; shouldn't be hard to
| do. But generally the database descriptions say what they
| have - public domain books, Library Genesis non-fiction, etc.
| pizza wrote:
| This is awesome. Would it be able to integrate Pocket bookmarks
| into this somehow? Would love to be able to keep a cache offline
| of all the things I wanted to read (esp if I can prevent link-
| rot), and to be able to query them.
| yoavm wrote:
| I need to check about Pocket - do you have any relevant
| documentation? Note however that TeaTime does keep an offline
| cache, both a list of your recent books and the files
| themselves, but it does so using LocalStorage and IndexedDB
| because I really wanted to avoid the need for user logins.
| Maybe I can add an option to download a "state dump" kind of
| thing, so that you have a file with your history and can
| always import it. I too always feel that LocalStorage can be
| a little flaky, in the sense that you don't always realize
| that switching browsers means losing all your data.
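The "state dump" idea above could be as simple as serializing the relevant LocalStorage keys to JSON and reading them back on another browser. A sketch; the `teatime:` key prefix is made up, and a plain object stands in for `window.localStorage`:

```javascript
// Hypothetical export/import helpers for a LocalStorage-backed
// reading state. Only keys under an app-specific prefix are dumped.
function exportState(storage) {
  const dump = {};
  for (const key of Object.keys(storage)) {
    if (key.startsWith("teatime:")) dump[key] = storage[key];
  }
  return JSON.stringify(dump);
}

function importState(json, storage) {
  // Overwrites existing keys; a real implementation might merge instead.
  for (const [key, value] of Object.entries(JSON.parse(json))) {
    storage[key] = value;
  }
}
```

The exported string could be offered as a downloadable file and re-imported on a new device, sidestepping user accounts entirely.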
| nyolfen wrote:
| beautiful! i made a much crappier version of the same kind of
| thing a year or two ago with a manually converted libgen db from
| annas-archive's index, but the best i could come up with was
| webtorrenting a sqlite db of hashes and importing it into
| indexeddb then distributing the static frontend bundle as an
| urbit app; it was much uglier than this
| yoavm wrote:
| Thank you! I saw quite a few similar projects while I was
| working on this, and they all served as inspiration and as
| proof that this can in fact be done.
|
| Also, thanks to whoever came up with thebookbay.org instance :)
| boramalper wrote:
| Also see: _Show HN: Fully-searchable Library Genesis on IPFS_
|
| https://news.ycombinator.com/item?id=28585208
|
| https://libgen-crypto.ipns.dweb.link/
| 2color wrote:
| This looks neat!
|
| IPFS contributor here.
|
| I had a look at your code and saw how you handle downloading from
| multiple gateways.
|
| There's a better way to do this which also leverages direct P2P
| retrieval in browsers, which is now a thing with IPFS! [0] If
| you just want to fetch, check out @helia/verified-fetch[1],
| which gives you a Fetch-like API that accepts CIDs. It handles
| all the content routing and retrieval p2p magic and can help
| reduce reliance on gateways.
|
| You can also pass it gateways in case it can connect to
| providers directly (due to transports).
|
| [0] https://blog.ipfs.tech/2024-shipyard-improving-ipfs-on-the-w...
| [1] https://github.com/ipfs/helia-verified-fetch/tree/main/packa...
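For comparison with the multi-gateway approach being discussed, one way to race several gateways and take the first success can be sketched as follows. This is illustrative only - neither TeaTime's actual code nor the @helia/verified-fetch API; `fetchFn` is injected so the logic stays testable without a network:

```javascript
// Race several public gateways for the same CID and resolve with the
// first successful response.
async function fetchFromGateways(cid, gateways, fetchFn) {
  const attempts = gateways.map((gateway) =>
    fetchFn(`${gateway}/ipfs/${cid}`).then((res) => {
      if (!res.ok) throw new Error(`HTTP ${res.status} from ${gateway}`);
      return res;
    })
  );
  // Promise.any resolves on the first fulfilled attempt and rejects
  // with an AggregateError only if every gateway fails.
  return Promise.any(attempts);
}
```

A verifying client would additionally check the response bytes against the CID, which is exactly what verified-fetch does for you.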
| yoavm wrote:
| Thanks! I really wanted to use verified-fetch, but it seemed
| like it doesn't really support all types of CIDs? In my case,
| most of the files use blake2b-256, and the last time I checked
| verified-fetch didn't like that :( Did something change?
|
| Edit: I now realised we previously talked on the Discord
| channel, thank you for the help there!
| 2color wrote:
| With some extra config you can pass custom hashers.
|
| Any reason you chose BLAKE2b-256?
|
| BLAKE2/3 are amazing hashing functions and have a lot of cool
| properties, but they really suck on the web in terms of
| performance. This came up literally yesterday
| https://bsky.app/profile/norman.life/post/3lbw2qltokc2i and
| has been reported by multiple devs recently (who also shared
| excitement about BLAKE)
| yoavm wrote:
| The main reason I wanted to support it is that it happens
| to be what Library Genesis uses. I'll have a look into
| passing these custom hashers! It would definitely be more
| elegant than hitting all gateways together and hoping for
| the best.
| 2color wrote:
| Small correction:
|
| You can also pass [@helia/verified-fetch] gateways as a
| fallback for when it *cannot* connect to providers directly
| (due to transports).
| lakomen wrote:
| I absolutely don't understand this.
|
| Doesn't ipfs require a server running? Where is the server code?
| 2color wrote:
| It does require an IPFS node, i.e. a server to provide the
| data. The point is more that the data isn't tied to any
| specific server. By content addressing with CIDs you can
| fetch from multiple IPFS nodes and even become a "provider"
| for a CID once your IPFS node downloads a copy.
| yoavm wrote:
| I'm currently using public IPFS Gateways to fetch the files
| from IPFS.
| rausr wrote:
| I'm slightly disappointed that this is not the Tea-Time that
| formed the comms/sync protocol used in the Croquet VR system (on
| Squeak Smalltalk) some years back. :)
| user_james92 wrote:
| IPFS needs a server running, or am I wrong?
| yoavm wrote:
| For fetching files from IPFS, you can use a public IPFS
| Gateway, and the Gateway is essentially your server. That's
| what TeaTime is doing.
| garganzol wrote:
| It does not work for me. For example:
|
| > Access to fetch at
| 'https://bafykbzacedvzdo4hru3wul5excjthwjuz5ggd6jcs4wg77tg7bk...'
| (redirected from
| 'https://w3s.link/ipfs/bafykbzacedvzdo4hru3wul5excjthwjuz5ggd...')
| from origin 'https://bjesus.github.io' has been blocked by
| CORS policy: No 'Access-Control-Allow-Origin' header is
| present on the requested resource. If an opaque response
| serves your needs, set the request's mode to 'no-cors' to
| fetch the resource with CORS disabled.
| yoavm wrote:
| The CORS headers seem fine, though I'm a little surprised about
| the content-type:
|
|     $ curl -I https://bafykbzacedvzdo4hru3wul5excjthwjuz5ggd6jcs4wg77tg7bkn3c7hlvohq.ipfs.dweb.link/
|     HTTP/2 200
|     date: Thu, 28 Nov 2024 16:34:03 GMT
|     content-type: application/x-rar-compressed
|     content-length: 192215114
|     access-control-allow-headers: Content-Type
|     access-control-allow-headers: Range
|     access-control-allow-headers: User-Agent
|     access-control-allow-headers: X-Requested-With
|     access-control-allow-methods: GET
|     access-control-allow-methods: HEAD
|     access-control-allow-methods: OPTIONS
|     access-control-allow-origin: *
|
| Can you try with another file? I do think that once I implement
| Helia's verified-fetch, issues like this should be less common.
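The manual curl check above boils down to a simple predicate on the response headers. A minimal sketch, assuming header names are already normalized to lowercase (as the Fetch API's `Headers` does):

```javascript
// Does a gateway response permit cross-origin reads from `origin`?
// `headers` is a plain object keyed by lowercase header names.
function corsAllows(headers, origin) {
  const allow = headers["access-control-allow-origin"];
  return allow === "*" || allow === origin;
}
```

A client could run this against a cheap HEAD request before committing to a large download from a given gateway.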
___________________________________________________________________
(page generated 2024-11-28 23:01 UTC)