[HN Gopher] Rqlite 6.0: the evolution of a distributed database ...
___________________________________________________________________
Rqlite 6.0: the evolution of a distributed database design
Author : otoolep
Score : 154 points
Date : 2021-06-10 12:45 UTC (10 hours ago)
(HTM) web link (www.philipotoole.com)
(TXT) w3m dump (www.philipotoole.com)
| macawfish wrote:
| Would it be possible to run this in the browser? Via WASM or
| something?
| eudoxus wrote:
| It wouldn't make too much sense to run a distributed raft-based
| SQLite system in its entirety in a browser (via WASM). However,
| you can run an individual SQLite instance in the browser (via
| WASM) using this: https://sql.js.org/#/
| killingtime74 wrote:
| May I ask how users generally use Rqlite? Distributed db for web
| apps? Embedded? Thanks
| otoolep wrote:
| I'm not sure how folks use it, but I think the sweet spot is
| for simple-to-run relational storage for a _smallish_ set of
| data.
|
| Some people don't use it for the distribution, but just to have
| an HTTP API in front of SQLite.
|
| https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#why-...
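
To make the "HTTP API in front of SQLite" use case concrete, here is a minimal sketch of what a client request might look like. It assumes the `/db/execute` and `/db/query` endpoints and a node listening on `localhost:4001`, as I understand the rqlite documentation; the helper names are hypothetical, and the requests are only built, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_execute_request(base_url, statements):
    """Build (but do not send) a POST carrying a JSON array of SQL statements.

    The /db/execute path is an assumption based on the rqlite docs.
    """
    body = json.dumps(statements).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/db/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(base_url, sql):
    """Build (but do not send) a GET whose `q` parameter carries the SQL.

    The /db/query path is an assumption based on the rqlite docs.
    """
    query = urllib.parse.urlencode({"q": sql})
    return urllib.request.Request(url=f"{base_url}/db/query?{query}", method="GET")


if __name__ == "__main__":
    # Assumed address of a locally running node.
    base = "http://localhost:4001"
    write = build_execute_request(
        base, ["CREATE TABLE foo (id INTEGER NOT NULL PRIMARY KEY, name TEXT)"]
    )
    read = build_query_request(base, "SELECT * FROM foo")
    print(write.get_method(), write.full_url)
    print(read.get_method(), read.full_url)
    # Against a running node, urllib.request.urlopen(write) would send it.
```

Nothing here depends on clustering, which is presumably why a single node is already useful to some people.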
| thomasmarcelis wrote:
| That's close to our use case. We use it to sync a small amount
| of configuration data across a small number of servers (2-50,
| depending on the client).
|
| RQlite was perfect as our software is an addon for a legacy
| platform. We needed an easy low-access way of installing a
| distributed datastore.
| otoolep wrote:
| Cool. Did you use read-only nodes, by any chance?
|
| https://github.com/rqlite/rqlite/blob/master/DOC/READ_ONLY_N...
| simonw wrote:
| "I'm not sure how folks use it"
|
| Suggestion: have you considered running office hours and
| inviting your users to chat to you about what they're doing
| with it?
|
| I've been doing that for six months for my Datasette project
| and I've had over 60 conversations now, it's been a
| revelation - it almost completely solved the "I don't know
| how people are using this" problem for me, and gave me a ton
| of ideas for future directions for the project.
|
| I wrote more about that here:
| https://simonwillison.net/2021/Feb/19/office-hours/
| otoolep wrote:
| Interesting! An old colleague of mine, Ben Johnson, also
| does the same thing for litestream, his latest SQLite
| replication project. I thought it was just him.
|
| Now that I know _two_ folks do it, I'll have to give it
| serious thought. Thanks for the blog post ref.
|
| https://github.com/benbjohnson/litestream
| simonw wrote:
| I believe Ben got the idea from me - I was also his first
| ever office hours appointment once he started :)
| abotsis wrote:
| Someone needs to officiate a marriage between this and duckdb.
| c17r wrote:
| As I was reading I was thinking "Why doesn't the Follower proxy
| the request to the Leader?" which I see was covered later on in
| "Transparent request forwarding". Good stuff!
|
| I remember building a janky version of sqlite+raft for Stripe's
| 2014 CTF. I'm sure others here have made a similar comment when
| rqlite gets posted to HN.
| otoolep wrote:
| rqlite author here. Yes, it's coming in a future release and is
| much easier to do now.
|
| One key principle of rqlite has always been quality, clean
| design, and simplicity of operation. So I've been reluctant to
| add a feature -- in this case Request Forwarding -- until I was
| sure it would be a clear win and not make rqlite less robust.
| After years of experience with the system now, I'm happy it can
| be added in a high-quality manner.
| robertlagrant wrote:
| It definitely makes working with round robin proxies (e.g.
| k8s services) much, much simpler.
|
| Having proxies be aware of the leader, or having clients
| being able to access nodes directly instead of behind said
| proxies, is a lot more complexity.
| sroussey wrote:
| It does add quite a bit to network traffic though, which may or
| may not be an issue.
| sigmonsays wrote:
| isn't knowing every member of a cluster and the leader a core
| function of raft?
|
| I feel like this post is leaving out critical pieces of
| information, like why the URL can't be deterministic or data
| about comparisons of different approaches.
|
| At what cluster size and concurrency does asking every node break
| down?
|
| I have been meaning to take a closer look at rqlite and want to
| understand more about it.
| otoolep wrote:
| rqlite author here.
|
| Yes, you're right that every node knows the _Raft network address_
| of every other node. But Raft network addresses are not the
| network addresses used by clients to query the cluster. Instead,
| every node also exposes an HTTP API for queries.
|
| So code needs to exist to share information -- in this case the
| HTTP API addresses -- between nodes that the Raft layer doesn't
| handle.
| otoolep wrote:
| Also the HTTP API URL isn't deterministic because a) the
| operator sets it for any given node, and b) over the lifetime
| of the cluster the entire set of nodes could change as nodes
| fail, are replaced, etc.
| otoolep wrote:
| >At what cluster size and concurrency does asking every
| node break down?
|
| None, a follower only needs to ask the leader. So
| regardless of the size of the cluster, in 6.0 querying a
| follower only introduces a single hop to the leader before
| responding to the client. While this hop was not required
| in earlier versions, earlier versions had to maintain state
| -- and stateful systems are generally more prone to bugs.
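
The single-hop lookup described above can be sketched as a toy in-memory model. This is not rqlite's code: the `Node`, `Cluster`, and `handle_request` names are hypothetical, the node-to-node call is a plain method call, and answering the client with a 301 redirect is an assumption about how the follower responds.

```python
from dataclasses import dataclass


@dataclass
class Node:
    raft_addr: str  # known to every node through Raft
    api_addr: str   # chosen by the operator; Raft does not track it


class Cluster:
    def __init__(self, nodes, leader_raft_addr):
        self.nodes = {n.raft_addr: n for n in nodes}
        self.leader_raft_addr = leader_raft_addr

    def ask_api_addr(self, raft_addr):
        """Stand-in for a node-to-node request: 'what is your HTTP API address?'"""
        return self.nodes[raft_addr].api_addr


def handle_request(cluster, receiving_raft_addr, path):
    """Return (status, detail) for a client request that lands on one node."""
    if receiving_raft_addr == cluster.leader_raft_addr:
        return 200, "handled locally"
    # Follower: one hop to the leader at request time, with no stored mapping.
    leader_api = cluster.ask_api_addr(cluster.leader_raft_addr)
    return 301, f"http://{leader_api}{path}"


if __name__ == "__main__":
    cluster = Cluster(
        [
            Node("10.0.0.1:4002", "10.0.0.1:4001"),
            Node("10.0.0.2:4002", "10.0.0.2:4001"),
            Node("10.0.0.3:4002", "10.0.0.3:4001"),
        ],
        leader_raft_addr="10.0.0.1:4002",
    )
    print(handle_request(cluster, "10.0.0.2:4002", "/db/query"))
    # A leader change needs no bookkeeping: the next lookup asks the new leader.
    cluster.leader_raft_addr = "10.0.0.3:4002"
    print(handle_request(cluster, "10.0.0.2:4002", "/db/query"))
```

The cost is one lookup per request that lands on a follower, independent of cluster size; the benefit is that no address table has to be kept in sync.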
| vlowther wrote:
| I am curious about where things broke down with the 301
| based solution y'all used earlier.
| otoolep wrote:
| I included details in the blog post. The 3.x to 5.x
| design had the following issues:
|
| - stateful system, with extra data stored in Raft. Always
| a chance for bugs with stateful systems.
|
| - some corner cases whereby the state rqlite was storing
| got out of sync with some other cluster
| configuration. Finding the root cause of these bugs could
| have been very time-consuming.
|
| - certain failure cases happened during automatic cluster
| operations, meaning an operator mightn't notice and be
| able to deal with them. Now those failure cases -- while
| still very rare -- happen at query time. The operators
| know immediately something is up, and can deal with the
| problem there and then, usually by just re-issuing the
| query.
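
"Just re-issuing the query" could look roughly like the retry wrapper below. This is a generic client-side sketch, not part of rqlite: `query_with_retry` and the `send` callable are hypothetical, and treating `ConnectionError` as the transient failure is an assumption.

```python
import time


def query_with_retry(send, sql, attempts=3, delay=0.0):
    """Call send(sql), re-issuing it if a transient failure occurs.

    `send` is any callable that performs the query and raises
    ConnectionError on a transient failure (an assumption of this sketch).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return send(sql)
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # brief pause before re-issuing
    raise last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_send(sql):
        # Fails on the first call only, to imitate a rare query-time failure.
        calls["count"] += 1
        if calls["count"] == 1:
            raise ConnectionError("leader lookup failed")
        return f"ok: {sql}"

    print(query_with_retry(flaky_send, "SELECT 1"))
```

Re-issuing reads is harmless; for writes, a caller would want to be sure the first attempt really did not apply before retrying.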
| rektide wrote:
| Not to be mistaken for high-availability Dqlite[1], which is one
| of the options one can run the k3s kubernetes distribution on
| (instead of etcd), via the Kine etcd shim[2]. Ultimately though
| the K3s team replaced Dqlite with an embedded etcd to get high-
| availability[3].
|
| [1] https://dqlite.io/
|
| [2] https://github.com/k3s-io/kine
|
| [3] https://rancher.com/docs/k3s/latest/en/installation/ha-embed...
| solarengineer wrote:
| For anyone else facing a website crash error message: I had to
| turn off "Reader Mode" on Safari iOS to be able to view the
| DQLite website
| [deleted]
| mirekrusin wrote:
| Very interesting, I wonder how it'd hold against Aphyr's db claim
| invalidating machine.
| sesm wrote:
| Related Github ticket:
| https://github.com/rqlite/rqlite/issues/94 (labeled as 'help
| wanted')
| skyde wrote:
| it's a very simple request log in front of SQLite so unless
| there is a problem with the (Paxos/raft) algorithm used for
| replicating the logs it should hold very well.
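
The "request log in front of SQLite" idea can be illustrated with a toy model: if two replicas apply the same ordered log of statements, they end up in the same state. This sketch uses only Python's `sqlite3` module, has no consensus layer at all, and says nothing about whether a real implementation wires the log up correctly; the table and helper names are hypothetical.

```python
import sqlite3

# An ordered log of deterministic SQL statements, as a consensus layer
# would deliver it to every replica.
LOG = [
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO foo (name) VALUES ('fiona')",
    "INSERT INTO foo (name) VALUES ('declan')",
    "UPDATE foo SET name = 'sinead' WHERE id = 1",
]


def apply_log(log):
    """Replay every log entry, in order, against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    for statement in log:
        conn.execute(statement)
    conn.commit()
    return conn


def snapshot(conn):
    """Return the table contents in a stable order for comparison."""
    return conn.execute("SELECT id, name FROM foo ORDER BY id").fetchall()


if __name__ == "__main__":
    replica_a = apply_log(LOG)
    replica_b = apply_log(LOG)
    print(snapshot(replica_a) == snapshot(replica_b))
```

The property only holds for deterministic statements applied in the same order; agreeing on that order is the job of the consensus algorithm.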
| skyde wrote:
| actually it's using "github.com/hashicorp/raft" and not its
| own version.
|
| So since Hashicorp's implementation survived Aphyr's test, it should be fine.
| otoolep wrote:
| Not quite as simple as that. I could have still used the
| Hashicorp code wrongly.
|
| And once I did: https://github.com/rqlite/rqlite/issues/5
|
| aphyr himself chimed in.
| otoolep wrote:
| Relevant doc:
| https://github.com/rqlite/rqlite/blob/master/DOC/CONSISTENCY...
| convivialdingo wrote:
| Very interesting... Does it support full text search? I scanned a
| little bit but didn't find any info either way.
| otoolep wrote:
| Whatever SQLite exposes is available in rqlite.
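
Since the answer depends on what the underlying SQLite build exposes, one way to check is simply to try it. The sketch below uses Python's `sqlite3` module against a local in-memory database, not rqlite, and assumes FTS5 is the full-text module of interest; `try_fts5_search` is a hypothetical helper that returns `None` when the module is not compiled in.

```python
import sqlite3


def try_fts5_search(rows, term):
    """Index `rows` in an FTS5 table and search for `term`.

    Returns the matching rows, or None if this SQLite build has no FTS5.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    except sqlite3.OperationalError:
        return None  # module not available in this build
    conn.executemany("INSERT INTO docs (body) VALUES (?)", [(r,) for r in rows])
    cursor = conn.execute(
        "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rowid", (term,)
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    result = try_fts5_search(
        ["raft consensus log", "sqlite full text search"], "sqlite"
    )
    print("FTS5 unavailable" if result is None else result)
```

The same `CREATE VIRTUAL TABLE` and `MATCH` statements could be sent to any SQLite front end; whether they succeed there depends on how that particular SQLite was compiled.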
| maxpert wrote:
| Thanks for the quality work here. Have been considering using
| this over infinicache or hazelcast for locally replicated caching
| scenario. Are there any battle testing stories/case-studies out
| there?
| geoka9 wrote:
| Have you considered using olric[0]?
|
| Just asking so that I can piggyback on your research :)
|
| [0] https://github.com/buraksezer/olric
| otoolep wrote:
| I haven't, I'm not super familiar with those systems TBH.
| Willson50 wrote:
| https://web.archive.org/web/20210610124826/https://www.phili...
___________________________________________________________________
(page generated 2021-06-10 23:01 UTC)