[HN Gopher] Transactional Object Storage?
___________________________________________________________________
Transactional Object Storage?
Author : mbrt
Score : 62 points
Date : 2024-11-17 13:20 UTC (8 days ago)
(HTM) web link (blog.mbrt.dev)
(TXT) w3m dump (blog.mbrt.dev)
| svrakitin wrote:
| Pretty cool! Do you have any ideas already about how to make it
| work with S3, considering it doesn't support If- headers?
| boulos wrote:
| S3 recently added basic matching support
| (https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3...,
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...).
|
| They don't have the full suite of GCS's capabilities
| (https://cloud.google.com/storage/docs/request-preconditions#...)
| but it's something.
| mbrt wrote:
| I think it's now much easier to achieve than a year ago. The
| critical one is conditional writes on new objects, because
| otherwise you can't safely create transaction logs in the
| presence of timeouts. This is not enough though.
|
| My approach on S3 would be to make sure the ETag of an object
| changes whenever other transactions looking at it must be
| blocked. This makes it easier to use conditional reads
| (https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...)
| on COPY or GET operations.
|
| For writes, I would use PUT on a temporary staging area and then
| a conditional COPY + DELETE afterward. This is certainly slower
| than GCS, but I think it should work.
|
| Locking without modifying the object is the part that needs
| some optimization though.
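
A minimal sketch of why conditional writes on new objects matter for
transaction logs when a request times out. Everything here is an
assumption made for illustration: `FakeBucket` is an in-memory stand-in
(not an S3 or GCS client), `put_if_absent` models a PUT sent with
`If-None-Match: *`, and the `tx-logs/` key layout is hypothetical.

```python
class PreconditionFailed(Exception):
    """Stand-in for the HTTP 412 a real object store returns."""


class FakeBucket:
    """In-memory stand-in for a bucket; hypothetical, not a real client."""

    def __init__(self):
        self._objects = {}  # key -> body

    def put_if_absent(self, key: str, body: bytes) -> None:
        # Models a PUT with "If-None-Match: *": succeeds only for a new key.
        if key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]


def create_tx_log(bucket: FakeBucket, tx_id: str, body: bytes) -> bool:
    """Create the log for tx_id; safe to call again after a timeout.

    Returns True if the stored log is ours, False if another writer
    created a different log under the same key first.
    """
    key = f"tx-logs/{tx_id}"
    try:
        bucket.put_if_absent(key, body)
        return True
    except PreconditionFailed:
        # Either our earlier attempt landed before the timeout, or someone
        # else won. Reading the object back tells the two cases apart.
        return bucket.get(key) == body


bucket = FakeBucket()
first = create_tx_log(bucket, "tx-1", b"writer-A: set k=1")
retry = create_tx_log(bucket, "tx-1", b"writer-A: set k=1")  # retry after a timeout
rival = create_tx_log(bucket, "tx-1", b"writer-B: set k=2")  # conflicting writer
print(first, retry, rival)
```

Without the precondition, the retry or the rival write would silently
overwrite whatever was already stored under that key.
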
| mbrt wrote:
| And I see more possibilities now that
| https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...
| is available. It will get easier and easier to build serverless
| data lakes, streaming, queues.
| choppaface wrote:
| Not a full solution, but seeing the OP seeks to be a key-value
| store (versus full RDBMS? despite the comparisons with Spanner
| and Postgres?), it's important to weigh how Rockset (also mainly
| a KV store) dealt with S3-backed caching at scale:
|
| * https://rockset.com/blog/separate-compute-storage-rocksdb/
| * https://github.com/rockset/rocksdb-cloud
|
| Keep in mind Rockset is definitely a bit biased towards vector
| search use cases.
| mbrt wrote:
| Nice, thanks for the reference!
|
| BTW, the comparison was only to give an idea about isolation
| levels, it wasn't meant to be a feature-to-feature
| comparison.
|
| Perhaps I didn't make it prominent enough, but at some point
| I say that many SQL databases have key-value stores at their
| core, and implement a SQL layer on top (e.g.
| https://www.cockroachlabs.com/docs/v22.1/architecture/overvi...).
|
| Basically SQL can be a feature added later to a solid KV
| store as a base.
| Onavo wrote:
| Congrats on reinventing the data lake? This is actually how most
| of the newer generations of "cloud native" databases work, where
| they separate compute and storage. The key is that they have a
| more sophisticated caching layer so that the latency cost of a
| query can be amortized across requests.
| mbrt wrote:
| It's my understanding that the newer generation of data lakes
| still makes use of a tiny, strongly consistent metadata database
| to keep track of what is where. This is orders of magnitude
| smaller than what you'd have by putting everything in the same
| database, but it's still there. This is also the case in newer
| data streaming platforms (e.g.
| https://www.warpstream.com/blog/kafka-is-dead-long-live-kafk...).
|
| I'm curious to hear if you have examples of any database using
| _only_ object storage as a backend, because back when I
| started, I couldn't find any.
| Onavo wrote:
| Love your article by the way. Not an expert but off the top
| of my head:
|
| https://docs.datomic.com/operation/architecture.html
|
| (However they cheat with dynamo lol)
|
| There's also some listed here
|
| https://davidgomes.com/separation-of-storage-and-compute-and...
| mbrt wrote:
| OK, thanks for the reference. Yeah, so indeed separating
| storage and compute is nothing new. Definitely not claiming
| I invented that :)
|
| And as you mention, Datomic uses DynamoDB as well (so, not
| a pure s3 solution). What I'm proposing is to only use
| object storage for everything, pay the price in latency,
| but don't give up on throughput, cost and consistency. The
| differentiator is that this comes with strict
| serializability guarantees, so this is not an eventually
| consistent system
| (https://jepsen.io/consistency/models/strong-serializable).
|
| No matter how sophisticated the caching is, if you want to
| retain strict serializability, writes must be confirmed by
| s3 and reads must validate in s3 before returning, which
| puts a lower bound on latency.
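
A hedged sketch of the latency floor described above: even on a cache
hit, a read that must be validated against the store costs at least one
round trip. `VersionedStore` and `ValidatingCache` are hypothetical
in-memory stand-ins, not GlassDB's algorithm, and the sketch ignores the
race between the version check and the read that a real system would
have to handle.

```python
class VersionedStore:
    """In-memory stand-in for object storage with per-object versions."""

    def __init__(self):
        self._data = {}       # key -> (version, value)
        self.round_trips = 0  # counts simulated requests to the store

    def write(self, key, value):
        self.round_trips += 1
        version = self._data.get(key, (0, None))[0] + 1
        self._data[key] = (version, value)
        return version

    def version(self, key):
        self.round_trips += 1  # models a metadata-only request
        return self._data[key][0]

    def read(self, key):
        self.round_trips += 1
        return self._data[key]


class ValidatingCache:
    """Serves cached values only after checking the store's version."""

    def __init__(self, store):
        self.store = store
        self._cache = {}  # key -> (version, value)

    def read(self, key):
        cached = self._cache.get(key)
        if cached is not None and cached[0] == self.store.version(key):
            return cached[1]  # cache hit, but validation still cost a round trip
        entry = self.store.read(key)
        self._cache[key] = entry
        return entry[1]


store = VersionedStore()
store.write("k", "v1")
cache = ValidatingCache(store)
a = cache.read("k")     # miss: fetches from the store
b = cache.read("k")     # hit: one validation round trip
store.write("k", "v2")  # another writer updates the object
c = cache.read("k")     # stale entry detected, value refetched
print(a, b, c, store.round_trips)
```

Caching here saves bandwidth on the validated hit, not the round trip
itself.
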
|
| I focused a lot on throughput, which is the one we can
| really optimize.
|
| Hopefully that's clear from the blog, though.
| Onavo wrote:
| Have you seen
| https://news.ycombinator.com/item?id=42174204
| mbrt wrote:
| I just saw it! I asked a question
| (https://news.ycombinator.com/item?id=42180611) and it
| seems that durability and consistency are implemented at
| the caching layer.
|
| Basically an in-memory database which uses S3 as cold
| storage. Definitely an interesting approach, but no
| transactions AFAICT.
| eatonphil wrote:
| > I'm curious to hear if you have examples of any database
| using only object storage as a backend, because back when I
| started, I couldn't find any.
|
| Take a look at Delta Lake
|
| https://notes.eatonphil.com/2024-09-29-build-a-serverless-ac...
| victorbjorklund wrote:
| Pretty cool and could be useful for stuff that isn't updated so
| frequently, like a CMS.
| ramesh31 wrote:
| so... Delta Lake?
| jitl wrote:
| There is also SlateDB, another work in progress take on this. HN
| link: https://news.ycombinator.com/item?id=41714858
| maxmcd wrote:
| Yeah I think it's very interesting to compare the two. SlateDB
| expects a single writer and fences writes. This means you can
| make some serious savings on S3 costs because you're using S3
| for consistency but you're batching writes.
|
| GlassDB is much more accessible for smaller volume workloads,
| but gets very costly at high volume because of the requests to
| S3 per transaction. In turn, the consistency model is easier to
| reason about because the system is entirely stateless.
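
A back-of-the-envelope sketch of the request-count trade-off described
above. The batch size and the number of requests per transaction are
illustrative assumptions, not measurements of SlateDB or GlassDB.

```python
import math


def batched_puts(transactions: int, batch_size: int) -> int:
    """Single writer that flushes one object per full or partial batch."""
    return math.ceil(transactions / batch_size)


def per_transaction_requests(transactions: int, requests_per_tx: int) -> int:
    """Stateless clients that each pay a fixed number of requests."""
    return transactions * requests_per_tx


n = 10_000
batched = batched_puts(n, batch_size=500)                   # assumed batch size
stateless = per_transaction_requests(n, requests_per_tx=4)  # assumed cost per tx
print(batched, stateless)
```

With these assumed numbers the batching writer issues 20 requests and
the per-transaction design issues 40000, which is the gap that request
pricing then multiplies.
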
| social_quotient wrote:
| I found myself thinking about Cloudflare Durable Objects' new
| SQLite offering.
|
| Nicely detailed here:
| https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-st...
| and
| https://developers.cloudflare.com/durable-objects/best-pract...
| jacobmarble wrote:
| If I had time, I'd like to implement an Iceberg catalog this way.
___________________________________________________________________
(page generated 2024-11-25 23:00 UTC)