[HN Gopher] SlateDB - An embedded database built on object storage
       ___________________________________________________________________
        
       SlateDB - An embedded database built on object storage
        
       Author : notamy
       Score  : 149 points
       Date   : 2024-10-01 22:18 UTC (1 days ago)
        
 (HTM) web link (slatedb.io)
 (TXT) w3m dump (slatedb.io)
        
       | anon291 wrote:
       | This seems to be a key value store built atop object storage.
       | Which is to say, it seems completely redundant. Not sure if
       | there's some feature I'm missing, but all of the six features
       | mentioned on the front page are things you'd have if you used the
       | key value store directly (actually, you get more because then you
       | get multiple writers).
       | 
       | I was excited at first and thought this was SQL atop S3 et al.
       | I've jerry-rigged a solution to this using SQLite with a
       | customized VFS backend, and would suggest that as an alternative
       | to this particular project. You get the benefit of ACID
       | transactions across multiple tables and a distributed backend.
        
         | abound wrote:
         | If you want SQLite backed by S3, maybe something like SQLite in
         | :memory: mode with Litestream would work?
         | 
         | Edit: actually not sure if you can use :memory: mode since
         | Litestream uses the WAL (IIRC), so maybe a ramfs instead
        
           | anon291 wrote:
           | There are many solutions. The particular example I was using
           | was SQLite via WebAssembly, resorting to HTTP's fetch API
           | for a read-only solution.
        
           | candiddevmike wrote:
           | In my experience, SQLite on S3 is ridiculously slow. The
           | round trip for writes is horrendous, so you end up doing
           | batch saves, but you need a WAL, which has the same problem
           | as the main DB file.
        
         | iudqnolq wrote:
         | Using an s3 object per key would be too expensive for many use
         | cases.
         | 
         | The website is a bit fancy but the readme seems to pretty
         | straightforwardly explain why you might want this. It seems to
         | me like a nice little (13k loc) project that doesn't fit my
         | needs but might come in handy for someone else?
        
         | necubi wrote:
         | This is a low-level embedded db that would be used by sql
         | databases/query engines/streaming engines/etc rather than
         | something that's likely to make sense for you to use as an
         | application developer. It sits in a similar space to RocksDB
         | and LevelDB.
         | 
         | You generally can't use object storage directly for this stuff;
         | if you have a high volume of writes, it's incredibly slow (and
         | expensive) to write them individually to s3. Similarly, on the
         | read side you want to be able to cache data on local disk &
         | memory to reduce query latency and cost.
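A minimal sketch of the write-side point above, assuming a toy in-memory object store that only counts requests. `FakeObjectStore`, `BatchingWriter`, and the batch size are hypothetical names for illustration, not SlateDB's API or any real S3 client. It compares one PUT per key against buffering writes and uploading each batch as a single object.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store that counts PUT requests."""

    def __init__(self):
        self.objects = {}
        self.put_count = 0

    def put(self, name, data):
        self.put_count += 1
        self.objects[name] = data


class BatchingWriter:
    """Buffers key-value writes and uploads them as one object per batch."""

    def __init__(self, store, batch_size):
        self.store = store
        self.batch_size = batch_size
        self.buffer = {}
        self.next_batch_id = 0

    def write(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        name = f"batch-{self.next_batch_id:06d}"
        # One PUT carries every buffered pair, sorted by key.
        self.store.put(name, sorted(self.buffer.items()))
        self.next_batch_id += 1
        self.buffer = {}


items = [(f"user:{i:04d}", f"value-{i}") for i in range(1000)]

# Naive approach: one object (and one PUT request) per key.
naive_store = FakeObjectStore()
for key, value in items:
    naive_store.put(f"key-{key}", value)

# Batched approach: 100 keys per uploaded object.
batched_store = FakeObjectStore()
writer = BatchingWriter(batched_store, batch_size=100)
for key, value in items:
    writer.write(key, value)
writer.flush()  # upload anything left in the buffer

print(naive_store.put_count, batched_store.put_count)  # prints "1000 10"
```

The batched path issues 100x fewer requests here, at the price of holding writes in memory until a flush. The read-side caching mentioned above is not modeled.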
        
         | vineyardmike wrote:
         | > I was excited at first and thought this was SQL atop S3 et
         | al.
         | 
         | You can check out Neon.tech, who make an open-source
         | Postgres-on-S3, and DuckDB, who make an embedded DB with
         | transaction support that can operate over S3
        
         | aseipp wrote:
         | People want object storage as the backend because in practice
         | it means that you can decouple compute and storage entirely, it
         | has no requirement to provision space up front, and robust
         | object storage systems with (de facto) standardized APIs like
         | S3's are widely available for all kinds of deployments and from
         | many providers, in many forms. In other words: it works with
         | what people already have and want.
         | 
         | Essentially every standalone or embedded key-value storage
         | solution treats the KV store and its operation like a database,
         | from what I can tell -- which is sensible because that's what
         | they are! But people use object stores exactly because they
         | _don't_ operate like traditional databases.
         | 
         | Now there are problems with object stores (they are very coarse
         | grained and have high per-object overhead, necessitating some
         | design that can reconcile the round hole and the square peg) --
         | but this is just the reality of what people are working with.
         | If there is some other key-value store server/implementation
         | you know of, one that performs and offers APIs like an actual
         | database (e.g. multi writer, range scans, atomic writes) but
         | with unlimited storage, no provisioning, and it's got 10+
         | different widespread implementations across every major compute
         | and cloud provider -- I'm interested in what that project is.
        
       | epolanski wrote:
       | Not a db guy, just asking: what does "embedded" database mean?
       | 
       | I'm confused here, because Google says it's a db bundled with the
       | application, but that's not really what I get from the landing
       | page.
       | 
       | What problem does it solve?
        
         | leetrout wrote:
         | Embedded means it runs in your application process not a
         | standalone server / service.
        
       | yawnxyz wrote:
       | is this an easier way to do the "store parquet on s3 > stream to
       | duckdb" pattern that's popping up more and more?
        
         | vineyardmike wrote:
         | > MemTables are flushed periodically to object storage as a
         | string-sorted table (SST). The flush interval is configurable.
         | 
         | Looks like it has a pretty similar structure under the hood,
         | but DuckDB would get you more powerful queries.
         | 
         | FYI duckdb directly supports writes (and transactions) so you
         | don't necessarily even need the separate store step.
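The quoted MemTable-to-SST flush can be sketched roughly as follows. This is a toy model under stated assumptions, not SlateDB's implementation: `TinyLsm` and its methods are hypothetical, flushed tables live in a Python list rather than in object storage, and `flush()` is called by hand instead of on a configurable interval.

```python
import bisect


class TinyLsm:
    """Toy LSM: a mutable memtable plus immutable, key-sorted flushed tables."""

    def __init__(self):
        self.memtable = {}
        # Each flushed table is (sorted_keys, values), oldest first.
        # In a real system each one would be an object in a bucket.
        self.tables = []

    def put(self, key, value):
        self.memtable[key] = value

    def flush(self):
        """Freeze the memtable into a sorted table and start a new memtable."""
        if not self.memtable:
            return
        pairs = sorted(self.memtable.items())
        keys = [k for k, _ in pairs]
        values = [v for _, v in pairs]
        self.tables.append((keys, values))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then tables newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.tables):
            i = bisect.bisect_left(keys, key)  # binary search in sorted keys
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


db = TinyLsm()
db.put("b", "1")
db.put("a", "2")
db.flush()          # first sorted table: keys ["a", "b"]
db.put("b", "3")
db.flush()          # second table shadows the older value of "b"
db.put("c", "4")    # still only in the memtable

print(db.get("a"), db.get("b"), db.get("c"), db.get("z"))  # prints "2 3 4 None"
```

Because every flushed table is sorted and never modified, a point lookup is a binary search per table, and overwrites are resolved by reading newer tables first.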
        
         | jitl wrote:
         | This is more targeted at OLTP style workloads with mutable data
         | and potentially multiple writers
        
         | kosmozaut wrote:
         | Do you know any resources/examples about the setup you mean? It
         | sounds interesting but from a quick search I didn't find
         | anything straightforward.
        
           | atombender wrote:
           | Check out Apache Iceberg. It's a format for storing Parquet
           | data in object storage, for both read and write. Not sure if
           | DuckDB does Iceberg (I know ClickHouse does), but it's a
           | similar principle, disaggregating data from compute.
        
       | loxias wrote:
       | Can I please, please, please, have C++ or at least C bindings? :)
       | Or the desired way to call Rust from another runtime? I don't
       | know any Rust.
        
         | jitl wrote:
         | Rust is just another programming language that's quite similar
         | to C++. The main difference is there's like 4 types for String
         | (some are references and some are owned) and methods for a
         | struct go in a `impl StructName` block after the struct
         | definition instead of inside it.
         | 
         | I don't really know rust either but I'm currently writing some
         | bindings to expose Rust libraries to NodeJS and not having too
         | much trouble.
         | 
         | For rust -> c++ I googled one time and found this tool which
         | Mozilla seems to use to call Rust from C++ in their web
         | browser, maybe it would "just work":
         | https://github.com/mozilla/cbindgen?tab=readme-ov-file
        
           | sebastianconcpt wrote:
           | Although the borrowing rules will make it feel like quite a
           | different language than others.
        
       | jitl wrote:
       | From the docs https://slatedb.io/docs/introduction/
       | 
       | > NOTE
       | 
       | > Snapshot isolation and transactions are planned but not yet
       | implemented.
        
         | quadrature wrote:
         | Might have been older docs. They now say that transactions are
         | supported
         | 
         | "Snapshot isolation: SlateDB supports snapshot isolation,
         | which allows readers and writers to see a consistent view of
         | the database. Transactions: Transactional writes are
         | supported."
        
           | jitl wrote:
           | I don't see any evidence this is implemented in the source
           | code, and the README on Github also marks it as not-yet-
           | implemented. There is an open issue for "design doc for
           | transaction" here:
           | https://github.com/slatedb/slatedb/issues/248 and an open
           | issue for "Add range queries" here:
           | https://github.com/slatedb/slatedb/issues/8
        
       | nmca wrote:
       | > Object storage is an amazing technology. It provides highly-
       | durable, highly-scalable, highly-available storage at a great
       | cost.
       | 
       | I don't know if this was intended to be funny, but
       | there is a little ambiguity in the expression "great cost",
       | typically great cost means very expensive.
       | 
       | Very cool and useful shim otherwise :)
        
         | unshavedyak wrote:
         | Is there an alternate meaning that you first took it as?
         | Monetary cost was my take as well hah.
        
           | raybb wrote:
           | The other meaning it could have is that it's a good
           | price/deal.
        
           | paulgb wrote:
           | Monetary cost in both cases, but it's the two meanings of
           | "great", which can either mean "large" or "good".
        
         | OJFord wrote:
         | That's funny actually - 'great cost', great takes meaning of
         | large; 'great price', great takes meaning of very good (i.e.
         | small in this context).
         | 
         | Always that way around, ESL's a minefield!
        
         | notthistime12 wrote:
         | Native English speaker here. "At a great cost" means "at a good
         | price". "At great cost" would mean "expensive".
        
           | skrtskrt wrote:
           | You're 100% correct; not sure why this is downvoted away.
        
       | hantusk wrote:
       | Since writes to object storage are going to be slow anyway, why
       | not double down on read optimized B-trees rather than write
       | optimized LSM's?
        
         | chipdart wrote:
         | I think slow writes are not a major concern, as most databases
         | already use some fast log-type data structure to persist
         | writes, and then merge/save these logs to a higher-capacity and
         | slower medium on specific events.
        
       | goodpoint wrote:
       | Despite the name this is not a database.
        
         | mtndew4brkfst wrote:
         | What definition/criteria do you feel it does not satisfy?
        
           | goodpoint wrote:
           | Pretty much the usual definition.
           | https://en.wikipedia.org/wiki/Database
        
             | jitl wrote:
             | > Formally, a "database" refers to a set of related data
             | accessed through the use of a "database management system"
             | (DBMS), which is an integrated set of computer software
             | that allows users to interact with one or more databases
             | and provides access to all of the data contained in the
             | database (although restrictions may exist that limit access
             | to particular data). The DBMS provides various functions
             | that allow entry, storage and retrieval of large quantities
             | of information and provides ways to manage how that
             | information is organized.
             | 
             | What makes SlateDB not qualify for this definition? It
             | seems to qualify for me.
        
             | mtndew4brkfst wrote:
             | Do you feel that e.g. Redis fails to satisfy the same
             | definition in basically the same ways? If it does fulfill
             | the criteria, what do you see as the distinction?
        
               | notthistime12 wrote:
               | Redis is a key-value store.
        
               | jitl wrote:
               | A key-value store is a type of database.
        
               | rehevkor5 wrote:
               | Calling Redis a database is a generous generalization.
               | For example, Redis does not necessarily provide the same
               | kind of durability as a database does, nor the
               | capabilities one would expect from an RDBMS. In many
               | cases, depending on configuration, it might be more
               | appropriate to instead refer to Redis as a cache, an in-
               | memory database, or a NoSQL database.
        
       | tgdn wrote:
       | "It doesn't currently ship with any language bindings"
       | 
       | Rust is needed to use SlateDB at the moment
        
       | demarq wrote:
       | Embed cloud
       | 
       | Sounds like they just cancel each other out. Not sure what
       | advantage embedding will yield here
        
       | remon wrote:
       | I've read the introduction and descriptions two times now and I
       | still don't understand what this adds to the proceedings. It
       | appears to be an extremely thin abstraction over object storage
       | solutions rather than an actual DB which the name and their texts
       | imply.
        
       | shenli3514 wrote:
       | Went through the documentation:
       | https://slatedb.io/docs/introduction/#use-cases I cannot
       | understand why they are targeting the following use cases with
       | this architecture:
       | * Stream processing
       | * Serverless functions
       | * Durable execution
       | * Workflow orchestration
       | * Durable caches
       | * Data lakes
        
       | drodgers wrote:
       | It looks like writes are buffered in an in-memory write ahead log
       | before being written to object storage, which means that if the
       | writer box dies, then you lose acknowledged writes.
       | 
       | I've built something similar for low-cost storage of infrequently
       | accessed data, but it uses our DBMS (MySQL) for the WAL (+ cache
       | of hot reads), so you get proper durability guarantees.
       | 
       | The other cool trick to use is to use Be-trees (a relatively
       | recent innovation from Microsoft Research) for the object storage
       | compaction to minimise the number of write operations needed when
       | flushing the WAL.
        
         | quadrature wrote:
         | You have the ability to choose your durability guarantee. You
         | can choose to have synchronous writes, in which case the client
         | blocks until the write is acknowledged.
         | 
         | https://docs.rs/slatedb/latest/slatedb/config/struct.WriteOp...
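The durability trade-off discussed in this subthread can be illustrated with a toy model. Everything here is an assumption for illustration: `BufferedStore` and its `wait_for_flush` parameter are hypothetical and are not SlateDB's actual options. A write that returns before the flush is lost if the process dies, while a synchronous write survives.

```python
class BufferedStore:
    """Toy store: writes land in memory first, then move to durable storage."""

    def __init__(self):
        self.durable = {}  # stand-in for object storage; survives a crash
        self.pending = {}  # in-memory buffer; lost on a crash

    def put(self, key, value, wait_for_flush):
        self.pending[key] = value
        if wait_for_flush:
            # Synchronous mode: do not return until the data is durable.
            self.flush()

    def flush(self):
        self.durable.update(self.pending)
        self.pending.clear()

    def crash_and_recover(self):
        """Simulate process death: memory is gone, durable storage remains."""
        self.pending = {}
        return dict(self.durable)


store = BufferedStore()
store.put("b", "2", wait_for_flush=True)   # blocks until durable
store.put("a", "1", wait_for_flush=False)  # returns immediately, not yet durable

recovered = store.crash_and_recover()
print(recovered)  # prints "{'b': '2'}"
```

The asynchronous write to "a" returned to the caller but never reached durable storage, which is the lost-write case raised above. The synchronous write costs a full round trip to storage per call.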
        
         | 0x1ceb00da wrote:
         | Is there something similar that caches recent changes locally
         | if the device is offline and uploads them when it comes online?
        
       | rehevkor5 wrote:
       | I don't see how it's embedded if it relies on nonlocal
       | services... on the contrary it says specifically, "no local
       | state". It appears to be more analogous to a "lakehouse
       | architecture" implementation (similar to, for example, Apache
       | Iceberg), where your app includes a library that knows how to
       | interact with the data in cloud object storage.
        
         | indrora wrote:
         | The general definition of "Embedded" is that the engine runs in
         | your application space, as opposed to the more traditional DBMS
         | (MariaDB, Valkey, etc) being a Full Fat Process just for
         | itself. [1] This can reduce RTT to the database itself because
         | you're already there: You've got a whole DB at your fingertips.
         | There's very little worry of cross-application data stink
         | because _each application has its own database_ , alleviating a
         | lot of the authN/Z that comes with a network attached DBMS.
         | 
         | 1: https://en.wikipedia.org/wiki/Embedded_database
        
       ___________________________________________________________________
       (page generated 2024-10-02 23:01 UTC)