[HN Gopher] FoundationDB: A Distributed Key-Value Store
___________________________________________________________________
FoundationDB: A Distributed Key-Value Store
Author : eatonphil
Score : 218 points
Date : 2023-07-03 13:34 UTC (9 hours ago)
(HTM) web link (cacm.acm.org)
(TXT) w3m dump (cacm.acm.org)
| debussyman wrote:
| I worked next to the founders a decade ago and tried the first
| versions of the project (before Apple acq). Loved the concept,
| but it hasn't really lived up to the promise.
| neftaly wrote:
| I've been tooling around with "Tuple Database", which claims to
| be FoundationDB for the frontend (by the original dev of Notion).
|
| https://github.com/ccorcos/tuple-database/
|
| I have found it conceptually similar to Relic or Datascript, but
| with strong performance guarantees - something Relic considers a
| potential issue. It also solves the problem of using reactive
| queries to trigger things like popups and fullscreen requests,
| which must be run in the same event loop as user input.
|
| https://github.com/wotbrew/relic
| https://github.com/tonsky/datascript
|
| Having a full (fast!) database as my React state manager gives me
| LAMP nostalgia :)
| jwr wrote:
| FoundationDB is absolutely incredible and I've been wondering why
| it doesn't get more popular over time. I suspect it's too complex
| to use directly in most applications, with people used to SQL-
| based solutions or simple KV stores.
|
| I always wanted my app to use a fully distributed database (for
| redundancy). I've been using RethinkDB in production for over 8
| years now. I'm slowly rebuilding my app to use FoundationDB.
|
| What I discovered when I started using FDB surprised me a bit. To
| make really good use of the database you can't really use a
| "database layer" and fully abstract it away from your app. Your
| code should be fully aware of transaction boundaries, for
| example. To make good use of versionstamps (an incredible
| feature) your code needs to be somewhat aware of them.
|
| I think FDB is a great candidate for implementing a "user-
| friendly" database on top of it, and in fact several databases
| are doing exactly that (using FDB as a "lower layer"). But that
| abstracts away too much, at least for me.
|
| The superficial take on FDB is "waah, where are my features? it
| doesn't do indexing? waaah, just use Postgres!".
|
| But when you actually start mapping your app's data structures
| onto FDB optimally, you discover a whole new world. For example,
| I ended up writing my indexing code myself, in my language
| (Clojure). FDB gives you all the tools, and a strict serializable
| data model to work with -- your language brings _your_ data
| structures and _your_ indexing functions. The combination is
| incredible. Once you define your index functions in your
| language, you will never want to look at SQL again. Plus, you get
| incredible features like versionstamps -- I use them to replace
| RethinkDB changefeeds and implement super quick polling for
| recent changes.
|
| Oh, and did I mention that it is a fully distributed database
| that _correctly_ implements the strict serializable consistency
| model? There are _very_ few dbs that can claim that. If you
| understand what that means, you probably know how incredible this
| is and I don't have to convince you. If you _think_ you
| understand, I suggest you go and explore
| https://jepsen.io/consistency -- carefully reading and learning
| about the differences in various consistency models.
|
| I really worry that FoundationDB will not become popular because
| of its inherent complexity, while worse solutions (ahem, MongoDB)
| will be more fashionable.
|
| I would encourage everyone to at least take a look at FDB. It
| really is something quite different.
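
The "write your own index functions" idea jwr describes can be
sketched roughly as follows. This is a minimal sketch, not the FDB
API: OrderedKV is a hypothetical in-memory stand-in for an ordered
key-value store, a plain counter stands in for versionstamps, and
save_user, find_by_email and changes_since are made-up names. In a
real system the three writes in save_user would share one transaction.

```python
import bisect
import itertools


class OrderedKV:
    """Hypothetical in-memory ordered KV store (a stand-in, NOT the FDB API)."""

    def __init__(self):
        self._keys = []                    # tuple keys, kept sorted
        self._vals = {}                    # key -> value
        self._stamps = itertools.count(1)  # stand-in for versionstamps

    def next_stamp(self):
        return next(self._stamps)          # strictly increasing per write

    def get(self, key):
        return self._vals.get(key)

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def clear(self, key):
        if key in self._vals:
            del self._vals[key]
            self._keys.remove(key)

    def range(self, prefix, start=None):
        """Yield (key, value) for keys under `prefix`, in order, from `start`."""
        i = bisect.bisect_left(self._keys, prefix if start is None else start)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def save_user(kv, user_id, user):
    """Write the record, its index entry, and a change-log entry together."""
    old = kv.get(("user", user_id))
    if old is not None:
        kv.clear(("idx_email", old["email"], user_id))  # drop stale index entry
    kv.set(("user", user_id), user)
    kv.set(("idx_email", user["email"], user_id), b"")  # the index is just keys
    stamp = kv.next_stamp()
    kv.set(("log", stamp, user_id), b"")                # ordered change log
    return stamp


def find_by_email(kv, email):
    return [key[2] for key, _ in kv.range(("idx_email", email))]


def changes_since(kv, stamp):
    """Poll for user ids changed after `stamp`, oldest change first."""
    return [key[2] for key, _ in kv.range(("log",), start=("log", stamp + 1))]
```

Because the log keys sort by stamp, polling for recent changes is a
single range read starting just past the last stamp the client saw.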
| manish_gill wrote:
| Seems like a very similar use case to Zookeeper - use it for
| distributed coordination / consistency etc and build your
| actual database on top of it.
| brainzap wrote:
| I think FoundationDB will also have parts written in Swift, at
| least that is what Apple showed at WWDC.
| crabmusket wrote:
| That was Foundation, not FoundationDB.
| https://developer.apple.com/documentation/foundation
| jen20 wrote:
| No, it was FoundationDB [1].
|
| [1]: https://developer.apple.com/videos/play/wwdc2023/10164/?time...
| romanhn wrote:
| Back in 2014 or so, I saw the FoundationDB team demo the product
| at a developer conference. They had the database running across a
| bunch of machines, with a visual showing their health and data
| distribution. One team member would then be turning machines on
| and off (or maybe unplugging them from the network) and you could
| see FDB effortlessly rebalancing the data across the available
| nodes. It was a very striking, impressive presentation
| (especially as we were dealing with the challenges of distributed
| Cassandra at the time).
| boxcarr wrote:
| When I saw the post about Foundation DB, I remembered the exact
| same demo running on a cluster of Raspberry Pi instances!
| Sadly, no memory of it on YouTube.
| boxcarr wrote:
| Wayback Machine partially to the rescue. Found the page
| (https://web.archive.org/web/20150325003301/https://foundatio...),
| but the Vimeo video was nuked when foundationdb.com shut down.
| Here's the HN thread about that demo:
|
| https://news.ycombinator.com/item?id=5739721
| romanhn wrote:
| I feel like I saw something a bit more refined (I recall
| node statuses aggregated on one cool UI), so this may have
| been an earlier iteration, but the beginning of the
| following video has some of what we're talking about:
| https://youtu.be/Nrb3LN7X1Pg
| jbverschoor wrote:
| Around 2010-2013 (gaming), I found fdb, and to me it seemed like
| the perfect database because of their architecture. I tried it a
| bit, and was really happy with it.
|
| Unfortunately they were acquired by Apple, only to resurface
| something like 10 years later. All momentum was gone, and I'm
| neither aware of nor interested in where they stand. I'll stick with
| my rusty old Postgres for a long time before I'd try anything
| else out.
| mlerner wrote:
| Really neat paper - a while ago I wrote a summary of the system:
| https://www.micahlerner.com/2021/06/12/foundationdb-a-distri...
| [deleted]
| mrtracy wrote:
| FoundationDB has, in my experience, always been well regarded in
| DB development circles; I think their test architecture -
| developed to easily reproduce rare concurrency failures - is its
| best legacy, as mentioned in a comment above and frequently
| before.
|
| However, since these topics are always filled with effusive
| praise in the comments, let me give an example of a distributed
| scenario where FDB has shortcomings: OLTP SQL.
|
| First, FDB is clearly designed for "read often, update rarely"
| workloads, in a relative sense. It produces multiple consistent
| replicas which are consistently queryable at a past time stamp,
| without a transaction - excellent for that profile. However, its
| transaction consistency method is both optimistic and
| centralized, and can lead to difficulty writing during high
| contention and (brief) system-wide transaction downtime if there
| is a failover; while it will work, it's not optimal for "write
| often, read once" workloads.
|
| Secondly, while it is an _ordered_ key value store - facilitating
| building SQL on top of it - the popular thought of layering SQL
| _on top of the distributed layer_ comes with many shortcomings.
|
| My key example of this is schema changes. Optimistic application,
| and keeping schema information entirely "above" the transaction
| layer, can make it extremely slow to apply changes to large
| tables, and possibly require taking them partially offline during
| the update. There are ways to manage this, but online schema
| changes will be a competitive advantage for other systems.
|
| Even for read-only queries, you lose opportunities to push many
| types of predicates down to the storage node, where they can be
| executed with fewer round trips. Depending on how distributed
| your system is, this could add up to significant additional
| latency.
|
| Afaik, all of the spanner-likes of the world push significant
| schema-specific information into their transaction layers - and
| utilize pessimistic locking - to facilitate these scenarios with
| competitive performance.
|
| For reasons like these, I think FDB will find (and has found) the
| most success in warehousing scenarios, where individual data are
| queried often once written, and updates come in at a slower pace
| than the reads.
| mike_hearn wrote:
| You can do online schema changes with FDB, it all depends on
| what you do with the FDB primitives.
|
| A great example of how to best utilize FDB is Permazen [1],
| described well in its white paper [2].
|
| Permazen is a Java library, so it can be utilized from any JVM
| language e.g. via Truffle you get Python, JavaScript, Ruby,
| WASM + any bytecode language. It supports any sorted K/V
| backend so you can build and test locally with a simple disk or
| in memory impl, or RocksDB, or even a regular SQL database.
| Then you can point it at FoundationDB later when you're ready
| for scaling.
|
| Permazen is _not_ a SQL implementation. Instead it's "language
| integrated" meaning you write queries using the Java
| collections library and some helpers, in particular,
| NavigableSet and NavigableMap. In effect you write and hard
| code your query plans. However, for this you get many of the
| same features an RDBMS would have and then some more, for
| example you get indexes, indexes with compound keys, strongly
| typed and enforced schemas with ONLINE updates, strong type
| safety during schema changes (which are allowed to be
| arbitrary), sophisticated transaction support, tight control
| over caching and transactional "copy out", watching fields or
| objects for changes, constraints and the equivalent of foreign
| key constraints with better validation semantics than what JPA
| or SQL gives you, you can define any custom data derivation
| function for new kinds of "index", a CLI for ad-hoc querying,
| and a GUI for exploration of the data.
|
| Oh yes, it also has a Raft implementation, so if you want
| multi-cluster FDB with Raft-driven failover you could do that
| too (iirc, FDB doesn't have this out of the box).
|
| And because the K/V format is stable, it has some helpers to
| write in memory stores to byte arrays and streams, so you can
| use it as a serialization format too.
|
| FDB has something a bit like this in its Record layer, but it's
| nowhere near as powerful or well thought out. Permazen is
| obscure and not widely used, but it's been deployed to
| production as part of a large US 911 dispatching system and is
| maintained.
|
| Incremental schema evolution is possible because Permazen
| stores schema data in the K/V store, along with a version for
| each persisted object (row), and upgrades objects on the fly
| when they're first accessed.
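
The upgrade-on-first-access scheme described above can be sketched
like this. It is an illustration only and not Permazen's API: the
store is a plain dict, and CURRENT_VERSION, UPGRADES and read are
hypothetical names. Each stored object carries its schema version,
and a read applies any pending upgrade steps and writes the result
back.

```python
CURRENT_VERSION = 2


def upgrade_1_to_2(fields):
    """Hypothetical schema change: split `name` into `first` and `last`."""
    first, _, last = fields["name"].partition(" ")
    return {"first": first, "last": last}


# Maps version N to the function that produces version N + 1.
UPGRADES = {1: upgrade_1_to_2}


def read(store, key):
    """Return the object's fields, upgrading it in place if it is stale."""
    version, fields = store[key]
    if version < CURRENT_VERSION:
        while version < CURRENT_VERSION:
            fields = UPGRADES[version](fields)
            version += 1
        store[key] = (version, fields)  # persist so the upgrade runs only once
    return fields
```

No single pass over the whole table is needed, so the cost of a
schema change is spread across ordinary reads.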
|
| [1] https://permazen.io/
|
| [2]
| https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze...
| SamReidHughes wrote:
| 100%. I don't have the time to read the paper but online
| schema changes, with the ability to fail and abort the entire
| operation if one row is invalid, are basically the same
| problem as background index building.
|
| If instead of using some generic K/V backend, it made use of
| specific FDB features, it might be even better. Conflict
| ranges and snapshot reads have been useful for me for some
| background index building designs, and atomic ops have their
| uses.
|
| > Oh yes, it also has a Raft implementation, so if you want
| multi-cluster FDB with Raft-driven failover you could do that
| too (iirc, FDB doesn't have this out of the box).
|
| I don't know what you mean by this. Multiple FDB clusters?
| mike_hearn wrote:
| It supports atomic ops and snapshot reads. Don't remember
| about conflict ranges. It doesn't require all backends to
| be identical, it supports a kind of graceful degradation
| when backends don't have all the features. The creator is
| quite keen on FDB and made sure Permazen works well with
| it.
|
| Yes multiple FDB clusters. IIRC FDB replication doesn't
| support full geo-replication, or didn't. There's a post by
| me about it somewhere on their forums.
| Dave_Rosenthal wrote:
| I totally agree with your high level point that there isn't a
| great SQL (OLTP, or otherwise) layer for FoundationDB. Building
| something like this would be very hard--but I don't think the
| FoundationDB storage engine itself would end up inflicting the
| limitations you mention if it was well executed. And
| FoundationDB _was_ specifically designed for real-time
| workloads with mixed reads/writes (i.e. the OLTP case).
|
| Whether or not concurrency is optimistic (or done with locks,
| or whatever) doesn't really have a bearing on things. Any
| database is going to suffer if it has a bunch of updates to
| specific hot keys that need to be isolated (in the ACID
| sense). As long as your reads and writes are sufficiently
| spread out you'll avoid lock contention/optimistic transaction
| retries.
|
| You speak to the real main limitation of FoundationDB when you
| talk about stuff like schema changes. There is a five-second
| transaction limit which in practice means that you cannot, for
| example, do a single giant transaction to change every row in a
| table. This was definitely a deliberate design
| choice, but not one without tradeoffs. The bad side is that if
| you want to be able to do something like this (lock out clients
| while you migrate a table) you need a different design that
| uses another strategy, like indirection. The good side is that
| screwed-up transactions that lock big chunks of your DB for a
| long time don't take down your system.
|
| I find that the people who are relatively new to databases tend
| to wish that the five second limit was gone because it makes
| things simpler to code. People that are running them in
| production tend to like it more because it avoids a slew of
| production issues.
|
| That said, I think for many situations a timeout like 30 or 60
| seconds (with a warning at 10) would be a better operating
| point rather than the default 5 second cliff.
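
One common way to live with a short transaction limit is to split a
table-wide change into many small transactions driven by a cursor.
The sketch below assumes a plain dict as the table and a
hypothetical migrate_in_batches helper; it is not FDB code, and the
"transaction" boundaries are only marked by comments.

```python
def migrate_in_batches(table, transform, batch_size=100):
    """Rewrite every row in key order, one small batch per 'transaction'.

    Returns the number of batches (i.e. short transactions) used.
    """
    cursor = None  # last key processed; a real system would persist this
    batches = 0
    while True:
        # Re-sorting per batch is wasteful; an ordered store would do a
        # range read starting just after `cursor` instead.
        keys = sorted(k for k in table if cursor is None or k > cursor)
        keys = keys[:batch_size]
        if not keys:
            return batches
        # --- begin one short transaction ---
        for key in keys:
            table[key] = transform(table[key])
        cursor = keys[-1]
        # --- commit: rows and cursor advance together ---
        batches += 1
```

Since clients are not locked out between batches, they have to cope
with both old and new rows while the migration runs, which is where
a strategy like indirection comes in.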
| mrtracy wrote:
| I think that the SQL-on-top, and optimistic model, are
| definitely things that can have a workflow-dependent
| performance impact and are relevant.
|
| All databases do suffer under some red line of write
| contention; but optimistic databases will suffer _more_, and
| will start degrading at a _lower level of contention_.
| "Avoiding contention" is database optimization table stakes,
| and you should be structuring every schema you can to do so;
| but hot keys are almost inevitable when a certain class of
| real-time product scales, and they will show up in ways you
| do not expect. When it happens, you'd like your DBMS to give
| as much runway as possible before you have to make the tough
| changes to break through.
|
| SQL-on-top becomes an issue for geographic distribution;
| without "pushing down" predicates, read-modify-write
| workloads, table joins, etc. on the client can incur
| significant round-trip time issuing queries. I think the lack
| of this is always going to present a persistent disadvantage
| vs selecting a competitor.
|
| And again, given FDB's multiple-full-secondary model, it's
| only a problem when working in real time; slower queries can
| work off a local secondary. But latest-data-latency is
| relevant for many applications.
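
To make the contention argument concrete, here is a toy optimistic
store. It is a sketch under stated assumptions, not FDB's resolver:
OptimisticStore, Conflict and increment are hypothetical names, and
conflict detection is reduced to "was any key I read overwritten
after I began?".

```python
class Conflict(Exception):
    """Raised when a key read by the transaction changed before commit."""


class OptimisticStore:
    def __init__(self):
        self.data = {}    # key -> (value, version that last wrote it)
        self.version = 0  # global commit version

    def begin(self):
        return {"read_version": self.version, "reads": set(), "writes": {}}

    def read(self, txn, key):
        txn["reads"].add(key)
        if key in txn["writes"]:
            return txn["writes"][key]
        return self.data.get(key, (None, 0))[0]

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # Abort if anything this transaction read was overwritten after it began.
        for key in txn["reads"]:
            if self.data.get(key, (None, 0))[1] > txn["read_version"]:
                raise Conflict(key)
        self.version += 1
        for key, value in txn["writes"].items():
            self.data[key] = (value, self.version)


def increment(store, key, max_retries=10):
    """Read-modify-write with retry; returns the number of attempts used."""
    for attempt in range(1, max_retries + 1):
        txn = store.begin()
        store.write(txn, key, (store.read(txn, key) or 0) + 1)
        try:
            store.commit(txn)
            return attempt
        except Conflict:
            continue  # all work done in this attempt is thrown away
    raise RuntimeError("too much contention")
```

Every writer that loses the race on a hot key repeats its whole
read-modify-write, which is the extra cost under contention.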
| aseipp wrote:
| FWIW, I believe read transactions are unlimited in duration
| now that the Redwood engine has been available. But I haven't
| tested Redwood myself. Write transactions are still
| definitely limited to 5 seconds, though.
| gregwebs wrote:
| TiDB uses TiKV as an equivalent to FoundationDB. It supports
| online migrations and pushing down read queries to the KV
| layer. It also defaults to optimistic locking, but supports
| pessimistic. It also doesn't have a five-second
| transaction limit. A SQL layer on top of FoundationDB could
| probably solve all these problems and it wouldn't be novel.
| preseinger wrote:
| do you think the things you mention were deliberate design
| decisions?
| mike_hearn wrote:
| Yes, one of the nice things about FDB is it has extensive
| design docs. Optimizing for reading more often than writing
| is obviously a pretty normal design choice, outside of log
| ingestion you'll normally be reading more than writing. There
| are people using FDB for logs (snowflake iirc?) and it's been
| optimized for that sort of use case more in recent years, but
| it's not like it was an unreasonable choice.
| aseipp wrote:
| Snowflake uses FoundationDB for warehouse metadata in the
| control plane, IIRC. It is not in the data plane path for
| log ingestion or other warehousing tech. That said the
| control plane is, uh, pretty important!
| mrtracy wrote:
| They absolutely were, yes. There are very valuable
| application profiles where FoundationDB's design is
| excellent, and you can see that from its internal usage at
| large companies like Apple and Snowflake.
| monstrado wrote:
| I built an online / mutable time-series database using FDB a few
| years back at a previous company. Not only was it rock solid, but
| it scaled linearly pretty effortlessly. It truly is one of the novel
| modern pieces of technology out there, and I wish there were
| more layers built on top of it.
| georgelyon wrote:
| FoundationDB is a truly one-of-a-kind bit of technology. Others
| have already linked to the testing methodology that allows them
| to run orders of magnitude more database hours in test than have
| run in production: https://www.youtube.com/watch?v=4fFDFbi3toc
|
| A lesser-known but also great talk is the follow-up, which covered
| what a few of the team worked on next, effectively trying to
| generalize the methodology to any computer program:
| https://www.youtube.com/watch?v=fFSPwJFXVlw
|
| I liken the approach to being able to fuzz the execution space of
| the program, not just the inputs.
| [deleted]
| jeffbee wrote:
| How hard have people pushed this thing? We get regular threads of
| effusive praise, but little criticism. Last time I mentioned that
| years ago my colleagues found half a dozen ways to lose data in
| FDB I got called out here and even in private emails, but it
| seems more valuable to know where the limits of these systems
| are, and not very valuable to read the positive feelings of
| people who used FDB in trivial and uncritical ways.
| ryanworl wrote:
| FoundationDB is used at Datadog as the metadata store for
| Husky, the storage and query engine powering a significant
| number of Datadog products, such as logs, network performance
| monitoring, and trace analytics.
|
| 1. https://www.datadoghq.com/blog/engineering/introducing-husky...
|
| 2. https://www.datadoghq.com/blog/engineering/husky-deep-dive/
|
| 3. https://www.youtube.com/watch?v=mNneCaZewTg
|
| 4. https://www.youtube.com/watch?v=1-zo9jqdRZU
|
| I was involved with this project from the beginning and it
| would've taken significantly longer to deliver without
| FoundationDB.
| jeffbee wrote:
| I know there are multiple companies that use it. The question
| is not whether people put things into FDB. The question is
| whether anyone has checked to see if their junk was still
| there later. I don't consider large scale deployments to be
| proof of anything. When I worked on Gmail we were still
| finding data-loss bugs in either BigTable or Colossus
| regularly, even after those systems had been the largest
| datastores on the planet for many years.
| [deleted]
| eatonphil wrote:
| Is Snowflake big enough of a deal?
|
| https://news.ycombinator.com/item?id=16880404
|
| Also, in the post itself, authors including Apple and Snowflake
| devs, it mentions it's run in production by Apple and
| Snowflake.
|
| I haven't seen yet though what Apple uses it for.
| tilolebo wrote:
| It is used by CloudKit
|
| https://machinelearning.apple.com/research/foundationdb-reco...
| jeffbee wrote:
| The time at which my colleagues found easy ways to lose data
| was well after Apple had claimed to use it in iCloud at
| scale. So, I don't think deployment at scale is a proof of
| correctness. The thing that needs doing is regularly looking
| in the database for things that should be there.
| endisneigh wrote:
| I'm curious - could you elaborate on the circumstances?
| Like the version of FDB, cluster size, network
| circumstances, etc?
| Dave_Rosenthal wrote:
| Yes, there are definitely a lot of big companies that have used
| FoundationDB very hard at huge scale for many years. That said,
| yeah, it feels like there are also a lot of folks on HN who
| just jump on the "cool, fault simulation" bandwagon and don't
| have a lot of personal real-world experience.
|
| What I can tell you, for sure, is that if you find an issue
| with something as important and fundamental as data loss the
| team working on FoundationDB would take it super seriously.
| cetinsert wrote:
| https://deno.com/deploy is building
|
| https://deno.com/kv on FoundationDB!
| qaq wrote:
| Surprised no one used it as a foundation for a NewSQL DB, the thing
| is battle tested and actively developed by Apple and Snowflake.
| danpalmer wrote:
| I think I remember the FDB team developing one that was closed
| source back before their acquisition. I thought the business
| model was going to be open core and closed, paid, layers on
| top. I seem to remember them benchmarking the SQL layer and it
| being highly performant still, despite the complexity it added.
|
| Maybe this thing still exists in closed source form at Apple? It
| wouldn't surprise me if it does and forms the basis of a
| Spanner alternative, they're big enough to need it. Or maybe
| they canned it pre/post acquisition.
|
| Edit: ah, you've already mentioned the closed source layer that
| exists at Apple. There we go!
| endisneigh wrote:
| There's https://www.tigrisdata.com/
|
| It's similar to mongo (it's nosql)
| eatonphil wrote:
| Not a NewSQL database though as GP mentioned. I don't think
| Tigris has a SQL layer.
| endisneigh wrote:
| Yes I know. I explicitly said it was similar to mongo. Just
| responding to the bit that it's battle tested and used as a
| foundation (no pun intended) for another db. As far as I
| know it's the only database that has a company around it
| that is using FDB
| qaq wrote:
| There was a PoC of SQLite on top of FDB. There is also a SQL
| layer that Apple did not open source that they use at scale.
| Just seems a wasted opportunity.
| endisneigh wrote:
| It's because you introduce a lot of latency. Cockroachdb
| for example (which is a great db) has a lot of latency
| compared to Postgres.
|
| At the time of its release it was probably hard to justify
| having an order of magnitude more latency than competitors
| (of course they were not fault tolerant, but still).
| riku_iki wrote:
| Hypothetically, you can run Cockroach with replication
| factor 1, and also have low latency and an apples-to-apples
| comparison.
| canadiantim wrote:
| I know some people have had success using FoundationDB as a KV
| store with SurrealDB[1]
|
| [1] https://github.com/orgs/surrealdb/discussions/25
| qaq wrote:
| that's a document-graph database though
| endisneigh wrote:
| I've been using FDB for toy projects for a while. It's truly rock
| solid. In my experience it's the best open source database I've
| used, including mariadb, Postgres and cockroach. That being said,
| I wish there were more layers as the functionality out of the box
| is very very limited.
|
| Ideally someone could implement the firestore or dynamodb api on
| top.
|
| https://github.com/losfair/mvsqlite
|
| Is basically distributed SQLite backed by FDB. I've been scared
| to use it since I don't know Rust and can't attest to whether MVCC
| has been implemented correctly.
|
| In using this I actually realized how coupled the storage engine
| is to the storage system and how few open source projects make
| the storage engine easily swap-able.
| fhrow4484 wrote:
| > That being said, I wish there were more layers as the
| functionality out of the box is very very limited.
|
| The record layer https://github.com/FoundationDB/fdb-record-layer
| which lets you store protobuf messages and define the primary
| keys and indexes directly on those proto fields is truly amazing:
|
| https://github.com/FoundationDB/fdb-record-layer/blob/main/d...
| facu17y wrote:
| mvcc is already taken care of by fdb, no?
| endisneigh wrote:
| Yea, but mvsqlite implements its own to get around the
| limitations around transactions.
| tommiegannert wrote:
| I really wanted to use FoundationDB for building a graph
| database, but was taken aback by the limits on key and value
| size (10 kB and 100 kB) and, to a lesser extent, transaction
| size (10 MB) [1]. And the documentation [2] doesn't really give
| any answers other than "build
| it yourself."
|
| mvsqlite seems to improve the transaction size [3], which is
| nice. Does it also improve the key/value limitations?
|
| > Transaction size cannot exceed 10,000,000 bytes of affected
| data. [---] Keys cannot exceed 10,000 bytes in size. Values
| cannot exceed 100,000 bytes in size.
|
| [1] https://apple.github.io/foundationdb/known-limitations.html
|
| [2] https://apple.github.io/foundationdb/largeval.html
| aseipp wrote:
| Transaction size and duration is limited to keep the latency
| and throughput of the system manageable under load, from my
| understanding. It makes sense to some degree even with no
| background in the design; if you are serving X/rps with a
| latency of Y milliseconds, using Z resources, and you double
| Y, you now need to double your resources Z as well, to serve
| the same amount of clients. You always hit a cap somewhere,
| so if you want consistent throughput and latency, it's maybe
| not a bad tradeoff.
|
| mvsqlite fixes the transaction size through its own
| transaction layer, from my understanding; I don't know how
| that would impact performance. The 10 kB/100 kB key/value limit
| is probably not fixable in any way, but it's not really a
| huge problem as a user in practice for FDB because you can
| just shard the value across two keys in a consistent
| transaction and it's fine. 10 kilobyte keys have pretty much
| never ever been an issue in my cases either; you can
| typically just do something like hash a really big key before
| insert and use that.
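
The two workarounds aseipp mentions (splitting a large value across
several keys, and hashing an oversized key) can be sketched as
below. The store is a plain dict and put_large, get_large,
clear_large and hashed_key are hypothetical helpers, not FDB calls;
in a real system each function body would run inside one
transaction so readers never see a half-written value.

```python
import hashlib

MAX_VALUE = 100_000  # value size limit quoted above, in bytes


def clear_large(store, key):
    """Remove every chunk stored under `key`."""
    n = 0
    while (key, n) in store:
        del store[(key, n)]
        n += 1


def put_large(store, key, value, chunk_size=MAX_VALUE):
    """Store `value` as consecutive chunks under (key, 0), (key, 1), ..."""
    clear_large(store, key)  # avoid leaving stale chunks from a longer value
    # max(..., 1) makes an empty value still occupy one (empty) chunk.
    for n, offset in enumerate(range(0, max(len(value), 1), chunk_size)):
        store[(key, n)] = value[offset:offset + chunk_size]


def get_large(store, key):
    """Reassemble the chunks in order; returns None if the key is absent."""
    parts = []
    n = 0
    while (key, n) in store:
        parts.append(store[(key, n)])
        n += 1
    return b"".join(parts) if parts else None


def hashed_key(key: bytes) -> bytes:
    """Replace an arbitrarily long key with a fixed 32-byte digest."""
    return hashlib.sha256(key).digest()
```

Note that a hashed key no longer sorts like the original, so this
only suits keys that are looked up exactly rather than range-scanned.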
| tanepiper wrote:
| A few years ago I was working at an agency, one of their teams
| was building a real-time gaming system on top of FoundationDB.
|
| Apple then bought it up and shut the open source down. They had
| to rebuild whole layers from scratch.
| Dave_Rosenthal wrote:
| Yeah, that sucked for sure and we hated to disappoint people
| like that (co-founder here). But you have it exactly backwards.
| FoundationDB was never open source. There was a binary that you
| could download and use as a trial, or you could buy a license
| for real use. The users that bought licenses got to keep using
| those licenses. Some of those customers went on to build
| billion-dollar businesses on top of FoundationDB (Snowflake!) A
| few years after acquiring the tech Apple themselves open
| sourced it (!) so now it is open source. The big challenge for
| users is that most of the sophisticated "layers" that make the
| tech into more of an easy-to-use database rather than just a
| storage engine are still proprietary.
| 58028641 wrote:
| As far as I can tell, FoundationDB was never open source until
| Apple open sourced it.
| tanepiper wrote:
| At least one reference on here from 2018 -
| https://news.ycombinator.com/item?id=16878786
|
| And here's a news story -
| https://www.forbes.com/sites/benkepes/2015/03/25/a-cautionar...
| detaro wrote:
| ... which both state that it wasn't open-source before the
| Apple buyout.
| endisneigh wrote:
| It was never open source before apple. Rather the binary
| was freely available to be used. When apple bought them
| they took it away but continued to support customers with
| contracts. In that way it was inaccessible until it was
| open sourced.
| metadat wrote:
| Yes, I got bitten by this and will never forget- FDB
| abruptly shut off public access in mid-2015. Fortunately
| for me, it only cost half a day to migrate my system to
| Postgres.
| stephenr wrote:
| Hey now, don't let verifiable facts and observed history get
| in the way of a chance to bash Apple.
| hadjian wrote:
| His jokes are hilarious!
| mprime1 wrote:
| Part of the FDB team (great folks) went on to create something
| quite incredible I have the pleasure of having early access to.
| If you're into dependability check this out:
| https://antithesis.com/
| leetrout wrote:
| What always fascinated me is they built the simulator for the
| database first(ish) and relied on it as a first class citizen
| while building the DB:
|
| https://www.youtube.com/watch?v=4fFDFbi3toc
|
| > We wanted FoundationDB to survive failures of machines,
| networks, disks, clocks, racks, data centers, file systems, etc.,
| so we created a simulation framework closely tied to Flow. By
| replacing physical interfaces with shims, replacing the main
| epoll-based run loop with a time-based simulation, and running
| multiple logical processes as concurrent Flow Actors, Simulation
| is able to conduct a deterministic simulation of an entire
| FoundationDB cluster within a single-thread! Even better, we are
| able to execute this simulation in a deterministic way, enabling
| us to reproduce problems and add instrumentation ex post facto.
| This incredible capability enabled us to build FoundationDB
| exclusively in simulation for the first 18 months and ensure
| exceptional fault tolerance long before it sent its first real
| network packet. For a database with as strong a contract as the
| FoundationDB, testing is crucial, and over the years we have run
| the equivalent of a trillion CPU-hours of simulated stress
| testing.
|
| https://pierrezemb.fr/posts/notes-about-foundationdb/
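
The core trick in the quote (one thread, a simulated clock, and all
randomness drawn from a single seeded source) can be shown in a few
lines. This is a toy, not Flow or FDB's simulator: simulate is a
hypothetical function that "sends" messages through a fake network
which drops and delays them, and returns the resulting event trace.

```python
import heapq
import random


def simulate(seed, n_messages=20):
    """Run a toy message exchange under injected faults; return the trace."""
    rng = random.Random(seed)  # the only source of randomness
    now = 0.0                  # virtual clock; the wall clock is never read
    in_flight = []             # heap of (deliver_at, seq)
    trace = []
    for seq in range(n_messages):
        if rng.random() < 0.2:           # injected fault: message dropped
            trace.append(("dropped", seq))
        else:
            delay = rng.uniform(0.001, 0.5)  # injected fault: variable delay
            heapq.heappush(in_flight, (now + delay, seq))
        now += 0.01                      # sender emits every 10 ms of virtual time
    while in_flight:
        now, seq = heapq.heappop(in_flight)  # jump the clock to the next event
        trace.append(("delivered", seq, round(now, 6)))
    return trace
```

Running it twice with the same seed yields the same trace, so a
failure found under some seed can be replayed exactly and
instrumented after the fact.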
| riwsky wrote:
| "The Jepsen is coming... from INSIDE THE HOUSE!"
| AaronFriel wrote:
| When I was writing a Haskell client library for Hyperdex,
| another distributed kv store, I found it incredibly helpful to
| implement a simulator for correctness. This helped me identify
| which behavior was unspecified (arithmetic overflow: should
| error) or where my simulator deviated.
|
| https://github.com/AaronFriel/hyhac/blob/master/test/Test/Hy...
|
| Alas, I think Hyperdex development paused a few years later.
| It's a shame that it stopped then.
| pavlov wrote:
| For some types of distributed systems, you can do this kind of
| simulated testing in advance by building a TLA+ model.
|
| It's not a full-blown simulator (because generally the
| application code doesn't even exist yet when you're building
| the TLA+ model). But it can let you collect data and validate
| assumptions about your design before writing a single line of
| code.
| rockwotj wrote:
| My beef with TLA+ is that it's not the same code, so while
| you're testing the design yes, you aren't testing the
| implementation of the design, which is just as important (and
| arguably harder) to get right.
| aseipp wrote:
| Yes, but there really aren't too many good solutions to
| that that aren't either extremely language or domain
| specific. And if you're careful you can get a lot of direct
| mileage out of it. For example, MongoDB (yes, that one!)
| used it in the development of their Atlas system and has a
| paper about using TLA+ to model the system, characterize
| behaviors, then generate compilable-code test cases from
| that minimal set of behaviors -- which are then directly
| linked against the core internals of the Atlas codebase as
| a client library. They then run those tests and re-generate
| them when the model changes. "Model based test case
| generation" is the strategy here. So you can characterize
| what happens in split brain scenarios, state machine
| transition failures (conflicting transactions), etc.
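|
| A very small sketch of what model-based test case generation
| looks like in general. This is not the pipeline from the paper:
| the clamped counter, generate_cases and run_cases are all
| hypothetical, and a real setup would get its behaviors from a
| model checker rather than a hand-written step function.

```python
from itertools import product

ACTIONS = ("inc", "dec")

def model_step(state, action):
    """Toy model: a counter clamped to the range 0..2."""
    return min(state + 1, 2) if action == "inc" else max(state - 1, 0)

def generate_cases(depth):
    """Every action sequence of length 1..depth, paired with the model's final state."""
    cases = []
    for n in range(1, depth + 1):
        for actions in product(ACTIONS, repeat=n):
            state = 0
            for action in actions:
                state = model_step(state, action)
            cases.append((actions, state))
    return cases

class Counter:
    """Hypothetical implementation under test."""
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value = min(self.value + 1, 2)

    def dec(self):
        self.value = max(self.value - 1, 0)

def run_cases(cases):
    """Replay each generated case against the implementation; return the failures."""
    failures = []
    for actions, expected in cases:
        impl = Counter()
        for action in actions:
            getattr(impl, action)()
        if impl.value != expected:
            failures.append((actions, expected, impl.value))
    return failures
```

| When the model changes you re-run generate_cases and the
| expectations move with it; the implementation never has to be
| written in the modeling language.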
|
| In reality the design stage is a pretty critical phase so
| you need all the help you can get, so even if you don't
| like TLA+ you're way better off than not modeling at all.
|
| As an example of the language specific thing, though,
| there's a library for Haskell I like that's very cool,
| called Spectacle, which also implements the temporal logic
| of TLA+ along with a model checker, but as a Haskell DSL.
| An interesting benefit of this is that you can model check
| actual real Haskell code that runs e.g. in your services,
| but I haven't taken this very far. There are also
| alternative solutions like Stateright for Rust. But again,
| not everyone has the benefit of these...
| bigfish24 wrote:
| Do you know the paper for Atlas?
| aseipp wrote:
| Yes, I managed to find it: "eXtreme Modeling in Practice"
| https://arxiv.org/pdf/2006.00915.pdf
|
| Unfortunately I got the product wrong; it was not Atlas,
| it was Realm Sync. All of the test-case generation stuff
| is in Section 5.
| pavlov wrote:
| Yes, the model is more like an executable form of
| documentation. There's no guarantee that code comments
| match what the code actually does; similarly there's no
| guarantee that the TLA+ model matches what the system does.
|
| Documentation is still generally useful, and so is a model.
| You have to be committed to keeping both up to date as the
| code evolves.
| falsandtru wrote:
| I'm loving this point. The unfortunate thing is those tests are
| closed source (I saw a maintainer say so, probably in an issue,
| a while ago). It seems testable but still seems to be closed source.
| So we cannot fork the project even if FDB becomes totally
| closed source again.
| aseipp wrote:
| No, the simulation harness and tests are open source and you
| can run them. Without that, it would be impossible for anyone
| to contribute anyway (Snowflake, for example, which heavily
| depends on it). It's built into the server binary directly, so
| the same code is always used, and it's simply a different
| operational mode when compared to the real server. I used to
| have a project to do lots of simulation runs on my big 32
| core server and then aggregate the logs into clickhouse for
| analysis. It wasn't that hard.
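|
| To show why a seed is enough to reproduce a run, here is a toy
| sketch. It is not FDB's simulator (that lives inside the server
| binary); simulate, drop_rate and the message list are made up
| for illustration. The only point is that every "random" decision
| comes from one seeded PRNG, so the same seed replays the same
| trace.

```python
import random

def simulate(seed, n_messages=5, drop_rate=0.2):
    """Toy run: every 'random' network decision comes from one seeded PRNG."""
    rng = random.Random(seed)
    in_flight = list(range(n_messages))
    trace = []
    while in_flight:
        # The simulated network picks which message moves next...
        msg = in_flight.pop(rng.randrange(len(in_flight)))
        # ...and whether it is lost and has to be retried.
        if rng.random() < drop_rate:
            trace.append(("drop", msg))
            in_flight.append(msg)
        else:
            trace.append(("deliver", msg))
    return trace
```

| Running many seeds explores many interleavings, and a failing
| seed can be replayed as often as you like while debugging.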
|
| However, they (at least at the time most of the developers
| were at Apple, many have now moved to Snowflake and the Apple
| team has grown a little I think) haven't released or
| integrated their nightly cluster and performance testing
| systems into the open, nor have they integrated them with GitHub
| Actions or Nightly runs or anything. My understanding is that
| this is "just" a lot of compute cluster/platform
| orchestration code on top of the tests that exist in the
| repository. So, while Apple or Snowflake integrates changes
| across hundreds of concurrent fuzzing simulations on whatever
| platforms they have, if you write patches yourself, you're
| stuck with long simulation runs. Maybe that's changed; I
| haven't kept up since the 7.0 series.
|
| In practice if you write patches and they accept them, they
| will just do the testing in their runs for you, on a cluster
| far larger than what you could have. Failure reports will
| tell you how to reproduce them from the test files. As a
| contributor, testing the system on your own is mostly a
| matter of how much money or how many CPU cores you can
| personally stand to set on fire.
|
| Someone could probably integrate this functionality into a
| Kubernetes operator or something so that outside engineers
| could run large scale simulations reliably. But it is really
| expensive and CPU/compute intense, no matter how you go about
| it.
|
| [1] https://forums.foundationdb.org/t/how-to-use-
| foundationdb-un...
|
| [2] https://github.com/apple/foundationdb/tree/main/tests
| falsandtru wrote:
| Those tests are not the implementations of the tests, just
| specifying the test case and the few options. But I found
| the implementations. I am not sure if this is all of the
| simulation tests, but it seems to cover the basic cases.
|
| https://github.com/apple/foundationdb/tree/main/fdbserver/w
| o...
|
| > Someone could probably integrate this functionality into
| a Kubernetes operator or something so that outside
| engineers could run large scale simulations reliably. But
| it is really expensive and CPU/compute intense, no matter
| how you go about it.
|
| Maybe this.
|
| https://github.com/FoundationDB/fdb-joshua
| aseipp wrote:
| Yeah, that's basically an actually good implementation of
| the pile of crap that I threw together several years ago
| while writing a few patches. :)
|
| And yes, I linked to the spec files because there
| actually isn't that much test code written in Flow I
| feel; the high-level specs in the .txt files can be mixed
| and matched so much to create a lot of variety from some
| small number of primitives, so that's really where all
| the good stuff is. Implementation vs interface, and all
| that.
| samsquire wrote:
| Thanks for sharing that and quoting an incredibly useful
| snippet.
|
| This is such an interesting topic!
|
| Some thoughts:
|
| * I wonder if the approach could be used to implement
| debuggable replayability, with accurate tracing and profiling.
| A bit like what verdagon is doing with Vale.
|
| * It could be used to integrate the event loop with tracing
| (rather than instrumentation with Jaeger)
|
| * I really like the idea that "every object" is an event loop,
| which reminds me of Microsoft Orleans with its actor model for
| its grains.
|
| * I am interested in actor and lightweight thread
| architectures.
|
| * I am interested in the scalability of the Node.js event loop
| architecture and Win32 desktop application programming.
|
| * I think this approach could be used to test and simulate
| microservices.
|
| * Approach could be used to test GUIs with React Redux reducer
| style.
| ajmurmann wrote:
| Working on a distributed key/value store myself, I couldn't
| agree more and think what FoundationDB did for testing from the
| start is absolutely the way to go. Testing distributed systems
| is very tricky and tests can be incredibly time consuming and
| bring everything to a halt.
| ibotty wrote:
| Every time I look at FoundationDB for replacing some Redis usage
| I wonder about key expiry/TTL, look for it and find nothing.
|
| Is this such a strange use case that there is not even a blog
| entry about it, only some forum entries?
| endisneigh wrote:
| You would need to implement that yourself. It can easily be done
| by storing tuples with your expiry date. You then could watch
| the keys to remove expired keys automatically. FDB is very
| barebones by design. Alternatively (and easier):
|
| https://forums.foundationdb.org/t/designing-key-value-expira...
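|
| A rough sketch of the tuple-with-expiry idea, using a plain
| in-memory stand-in rather than the FDB client API. ExpiringStore
| and its methods are hypothetical; by_expiry plays the role of an
| index subspace keyed by (expiry, key), so a sweeper only has to
| scan the front of the ordered range. Timestamps are passed in
| explicitly to keep it deterministic.

```python
import bisect

class ExpiringStore:
    """In-memory stand-in for an ordered KV store (not the FDB client API)."""

    def __init__(self):
        self.values = {}      # key -> (value, expires_at)
        self.by_expiry = []   # sorted (expires_at, key) tuples, like an index subspace

    def set(self, key, value, expires_at):
        old = self.values.get(key)
        if old is not None:
            # Drop the stale index entry so each key has exactly one.
            self.by_expiry.remove((old[1], key))
        self.values[key] = (value, expires_at)
        bisect.insort(self.by_expiry, (expires_at, key))

    def get(self, key, now):
        entry = self.values.get(key)
        # Treat expired-but-not-yet-swept keys as missing.
        if entry is None or entry[1] <= now:
            return None
        return entry[0]

    def sweep(self, now):
        """Delete everything whose expiry is at or before `now`; return the count."""
        expired = 0
        while self.by_expiry and self.by_expiry[0][0] <= now:
            _, key = self.by_expiry.pop(0)
            del self.values[key]
            expired += 1
        return expired
```

| In a real deployment the data write and the index write would
| go in the same transaction, and the sweep would be a bounded
| range read plus clears run periodically.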
___________________________________________________________________
(page generated 2023-07-03 23:00 UTC)