[HN Gopher] Redis re-implemented with SQLite
___________________________________________________________________
Redis re-implemented with SQLite
Author : tosh
Score : 186 points
Date : 2024-04-14 12:51 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| akie wrote:
| I would love to have a Redis alternative where I don't have to
| think about whether or not the dataset fits into memory.
| jetbalsa wrote:
| I've used SSDB[0] in the past for some really stupid large
| datasets (20TB) and it worked really well in production
|
| [0] https://github.com/ideawu/ssdb
| PlutoIsAPlanet wrote:
| It's also worth checking out Kvrocks, which is a Redis
| interface on top of RocksDB that's part of the Apache project,
| and very well maintained.
| welder wrote:
| I switched from SSDB to Kvrocks recently, because SSDB is
| abandoned and its author has been missing for 3 years now. I
| used to recommend SSDB, but now there are better alternatives
| available:
|
| https://github.com/apache/kvrocks
|
| https://github.com/sabledb-io/sabledb
| akie wrote:
| These are great recommendations, thanks!
| yuppiepuppie wrote:
| Curious, what's the use case?
| qwertox wrote:
| My thought as well, since some use cases don't require much
| memory.
| akie wrote:
| The use case is caching 20 million API responses that
| almost never change, each about 20 KB of JSON, for a high-
| traffic site.
|
| Yes, I can pay for a 400 GB RAM instance of Redis, but it's
| expensive.
|
| I can also cache it on disk, but then I need to think about
| cache expiration myself.
|
| Or I can use something appropriate like a document
| database, but then I need additional code & additional
| configuration because we otherwise don't need that piece of
| infrastructure in our stack.
|
| It would be a lot easier if I could just store it in Redis
| with the other (more reasonably sized) things that I need
| to cache.
| Nican wrote:
| This looks like a good use case for ScyllaDB with
| compression and TTL. It is pretty simple to set up a
| single-node instance.
|
| If you'd rather have something in-process that writes to
| disk, to avoid extra infrastructure, I would also
| recommend RocksDB with compression and TTL.
| 0cf8612b2e1e wrote:
| Would DiskCache work for you? It runs on SQLite, either in
| memory or as a persisted file database. Thread safe, with
| various expiration controls, etc.
|
| https://grantjenks.com/docs/diskcache/tutorial.html
| danpat wrote:
| Or shard it - divide your objects up based on some
| criteria (hash the name of the object, use the first N
| digits of the hash to assign to a shard), and distribute
| them across multiple redis instances. Yes, you then need
| to maintain some client code to pick the right redis
| instance to fetch from, but you can now pick the most
| $/memory efficient instance types to run redis, and you
| don't have to worry about introducing disk read latency
| and the edge cases that brings with it.
|
| Edit: looks like redis has some built-in support for data
| sharding when used as a cluster
| (https://redis.io/docs/latest/commands/cluster-shards/) -
| I haven't used that, so not sure how easy it is to apply,
| and exactly what you'd have to change.
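The shard-picking step described above can be sketched in a few lines of Python; the instance list and helper name here are hypothetical, not part of any real deployment:

```python
import hashlib

# Hypothetical list of Redis instance addresses, one per shard.
REDIS_SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]

def pick_shard(key: str) -> str:
    # Hash the key, then use the first 8 hex digits of the digest
    # to deterministically assign the key to one of the shards.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return REDIS_SHARDS[int(digest[:8], 16) % len(REDIS_SHARDS)]
```

The client would then open a connection to `pick_shard(key)` before issuing its GET/SET, which is exactly the bit of client code the comment says you'd have to maintain.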
| yuliyp wrote:
| Sharding doesn't help here at all. They'd still need the
| same amount of RAM to house all the data in redis.
| reese_john wrote:
| You could try using Amazon S3 Express, a low-latency
| alternative for S3 buckets [0]. I imagine cache
| invalidation would be relatively simple to implement
| using lifecycle policies.
|
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-
| exp...
| yuliyp wrote:
| You're trying to get Redis to be what it isn't. Use a
| thing that has the properties you want: a document or
| relational database. If you insist on this, then running a
| system that allows a ton of swap onto a reasonably fast
| disk _might_ work, but it's still going to perform worse
| than a system that's designed for concurrently serving
| queries of wildly differing latencies.
| seddonm1 wrote:
| In other abuses of SQLite, I wrote a tool [0] that
| exposes blobs in SQLite via an Amazon S3 API. It doesn't
| do expiry (but that would be easy enough to add if S3
| does it).
|
| We were using it to manage millions of images for
| machine learning, as many tools support S3 and the ability
| to add custom metadata to objects is useful (harder with
| files). It is one SQLite database per bucket, but at the
| bucket level it is transactional.
|
| 0: https://github.com/seddonm1/s3ite
| phamilton wrote:
| A few things:
|
| Redis Data Tiering - Redis Enterprise and AWS ElastiCache
| for Redis support data tiering (using SSD for 80% of the
| dataset and moving things in and out). On AWS, a
| cache.r6gd.4xlarge with 100GB of memory can handle 500GB
| of data.
|
| Local Files
|
| > I can also cache it on disk, but then I need to think
| about cache expiration myself.
|
| Is the challenge that you need it shared among many
| machines? On a single machine you can put 20 million
| files in a directory hierarchy and let the fs cache keep
| things hot in memory as needed. Or use SQLite which will
| only load the pages needed for each query and also rely
| on the fs cache.
|
| S3 - An interesting solution is one of the SQLite S3
| VFSes. Those will query S3 fairly efficiently for
| specific data in a large dataset.
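The directory-hierarchy idea above can be sketched as follows; the fan-out factor and function names are illustrative assumptions, not a reference implementation:

```python
import hashlib
from pathlib import Path
from typing import Optional

def cache_path(root: str, key: str) -> Path:
    # Fan 20 million entries out over 256*256 buckets so no single
    # directory holds millions of files; the OS page cache keeps
    # hot files in memory automatically.
    h = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return Path(root) / h[:2] / h[2:4] / h

def put(root: str, key: str, body: bytes) -> None:
    path = cache_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)

def get(root: str, key: str) -> Optional[bytes]:
    path = cache_path(root, key)
    return path.read_bytes() if path.exists() else None
```

Expiration would still be on you (e.g. a sweep comparing mtimes against a TTL), which is the caveat raised earlier in the thread.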
| kiitos wrote:
| Redis is an in-memory cache by definition. If you don't
| want to cache in-memory, then don't use Redis.
| jitl wrote:
| Redis drops data semi-randomly when under memory pressure.
|
| If you use Redis for queued tasks (this is popular in Rails
| and Django/Python web services), that means that during an
| incident where queue jobs are added faster than they're
| removed, you're going to lose jobs if the incident goes on
| long enough.
| byroot wrote:
| That depends on how the `maxmemory-policy` is configured,
| and queue systems based on Redis will tell you not to allow
| eviction. https://github.com/sidekiq/sidekiq/wiki/Using-
| Redis#memory (it even logs a warning if it detects your
| Redis is misconfigured, IIRC).
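For reference, the setting in question is a one-line redis.conf directive; queue backends such as Sidekiq expect eviction to be disabled:

```
# Never evict keys under memory pressure; fail writes with an
# error instead, so queued jobs are not silently dropped.
maxmemory-policy noeviction
```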
| kiitos wrote:
| Well, of course! Redis is not (and has never been) a
| database, it's a data structure server, at best described
| as a cache. If jobs are added faster than they're removed,
| this is straight queueing theory 101 -- ideally you'd
| reject jobs at add-time, but otherwise you have to drop
| them.
| gnarbarian wrote:
| Right. I think Redis hitting the disk would be a terrible
| tradeoff compared to making a new backend call. It
| probably wouldn't save you much time, and I imagine it
| would lead to very strange and unpredictable behavior on
| the front end, or when trying to debug latency or data
| issues downstream.
| prisenco wrote:
| Since Redis is an in-memory cache and already doesn't
| guarantee the data, would it make sense to set PRAGMA
| synchronous to OFF in nalgeon's project, to boost
| performance to something closer to standard Redis?
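With Python's stdlib sqlite3 module, for instance, that tradeoff is a pair of pragmas; this is only a sketch of the suggestion, and redka's actual defaults may differ:

```python
import sqlite3

conn = sqlite3.connect("cache.db")
# WAL lets readers proceed while the single writer commits;
# synchronous=OFF skips fsync entirely, trading durability for
# speed -- roughly the deal Redis already offers for cached data.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=OFF")
```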
| noncoml wrote:
| Using a hammer like a screwdriver
| GraemeMeyer wrote:
| I think Garnet might be what you're looking for:
| https://www.microsoft.com/en-us/research/project/garnet/
| Nican wrote:
| As usual, there is a spectrum of data safety vs. performance.
| Redis is at the "very fast, but unsafe" side of the scale.
|
| ScyllaDB, for me, is in the middle: a high-performance key-
| value store, but without really supporting transactions.
| FoundationDB is another one that I would consider.
| tyre wrote:
| Depends on the kind of safety you're looking for. Redis is
| entirely safe from concurrency issues because it's single-
| threaded. It supports an append-only file for persistence to
| disk.
| alerighi wrote:
| At that point, why use Redis at all? You can use any DBMS
| you want, either relational or NoSQL. The advantage of Redis
| is that it is a memory cache; if you take the memory out of
| it, just use Postgres or whatever DBMS you are using (I say
| Postgres because it has all the features of Redis).
| dalyons wrote:
| Postgres has nowhere near all the features of Redis. Go and
| have a look at the Redis commands documentation. They're not
| even really similar at all once you get past the basic
| GET/SET stuff.
| from-nibly wrote:
| Can you name an explicit thing that postgres does not do
| that redis does?
| jitl wrote:
| This is silly, Postgres doesn't speak the Redis wire
| protocol. You will need a large army of connection
| proxies to get a Postgres database to handle the number
| of connections a single Redis shrugs off with no sweat.
|
| Maybe you like this answer more: at the end of the day
| you can embed a bunch of Turing-complete programming
| languages in Postgres, and Postgres can store binary
| blobs, so Postgres can do literally anything. Can it do
| it performantly, and for low cost? Probably not. But if
| you put in enough time and money, I'm sure you can re-
| implement Redis on Postgres using BLOB columns alone.
|
| Here's a simpler answer: a cuckoo filter is available out
| of the box in Redis; in 2 seconds of Googling I didn't
| find one for Postgres:
| https://redis.io/docs/latest/develop/data-
| types/probabilisti...
| codetrotter wrote:
| Not sure if this one could be used in order to do what
| you want but maybe?
|
| https://www.postgresql.org/docs/current/bloom.html
|
| Have a look
| pshc wrote:
| I feel like one _could_ implement most Redis commands as
| functions or PL/pgSQL using native Postgres hstore and json.
| Could be an interesting translation layer.
| jhatemyjob wrote:
| This is a great idea and I am glad it is BSD licensed.
| Unfortunately the execution is somewhat lacking. SQLite is best
| suited for embedded / clientside applications with minimal
| dependencies. The author of this project decided to use Go and
| make it a service.
| nalgeon wrote:
| Did I?
|
| > Both in-process (Go API) and standalone (RESP) servers.
|
| In-process means that the database is "embedded / clientside"
| in your terms.
| jhatemyjob wrote:
| It's a server.
| leetrout wrote:
| > SQLite is best suited for embedded / clientside applications
| with minimal dependencies.
|
| Often repeated and certainly rooted in truth but there was a
| healthy discussion on here the other day[0] where tptacek
| shared a link in a comment[1] to a related blog post about
| getting more scale out of using SQLite serverside.
|
| 0: https://news.ycombinator.com/item?id=39975596
|
| 1: https://kerkour.com/sqlite-for-servers
| merlinran wrote:
| What would you use Redis or its substitutes for in embedded/
| clientside applications? Seriously asking.
| nalgeon wrote:
| I'm a big fan of both Redis and SQLite, so I decided to combine
| the two. SQLite is specifically designed for many small
| queries[1], and it's probably as close as relational engines can
| get to Redis, so I think it might be a good fit.
|
| [1]: https://sqlite.org/np1queryprob.html
| sesm wrote:
| What are the project goals? I assume it's a drop-in replacement
| for Redis that is supposed to be better in certain cases? If
| yes, then what cases do you have in mind?
| nalgeon wrote:
| The goal is to have a convenient API to work with common data
| structures, with an SQL backend and all the benefits it
| provides. Such as:
|
| -- Small memory footprint even for large datasets.
|
| -- ACID transactions.
|
| -- SQL interface for introspection and reporting.
| b33j0r wrote:
| I love this, it's the solution that makes sense for 90% of the
| times I have used redis with python.
|
| I've made several versions of this, and to be honest, it ended
| up being so straightforward that I assumed it was a trivial
| solution.
|
| This is pretty well-planned. This is 100% the way to go.
|
| Heh. I took a detour into making my idea of "streams" also
| solve event sourcing in native Python; a dumb idea, if an
| interesting one. Mission creep probably killed my effort!
|
| Nice work
| tehbeard wrote:
| > reimplement the good parts of Redis
|
| Seems to be missing streams, HyperLogLog and pub/sub though,
| so mostly just the KV part of the wire protocol with a
| different backend?
| nalgeon wrote:
| Can't fit everything into 1.0; I had to start with something.
| If the community is interested in the project, there will be
| more.
| poidos wrote:
| Wonderful idea and execution!
| kiitos wrote:
| The entire value proposition of Redis is that it operates out of
| memory, and therefore has memory-like performance. (edit: And
| doesn't provide the benefit of, and therefore pay the costs
| related to, ACID-like consistency guarantees.) If you move it to
| disk (edit: Or try to assert ACID-like consistency or
| transactional guarantees) there's little reason to use Redis any
| more.
| qbane wrote:
| SQLite does allow one to keep the entire database in memory:
| https://www.sqlite.org/inmemorydb.html
| j-pb wrote:
| But it's still orders of magnitude slower than a hash map.
| egeozcan wrote:
| You can also create an in-memory sqlite database though.
| PhilipRoman wrote:
| >The entire value proposition of Redis is that it operates out
| of memory
|
| Not familiar with Redis specifically, but I doubt this idea.
| You can run anything on top of a ramdisk (granted, you can save
| a few pointer additions and get rid of some safety checks if
| you know you're working with memory)
| yuliyp wrote:
| Sure you can run things off a ramdisk, but the way you lay
| out data to achieve high performance from disk vs from RAM is
| different (disk assumes that you read pages of data at once,
| and tries to avoid reading extra pages, while RAM assumes
| that you read cache lines of data at once).
| 77pt77 wrote:
| You can run SQLite in memory by using the filename
|
| :memory:
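In Python's stdlib sqlite3 module, for example:

```python
import sqlite3

# ":memory:" creates a private, purely in-memory database;
# nothing touches disk and it vanishes when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")
conn.execute("INSERT INTO kv VALUES (?, ?)", ("greeting", b"hello"))
value = conn.execute(
    "SELECT value FROM kv WHERE key = ?", ("greeting",)
).fetchone()[0]  # b"hello"
```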
| larodi wrote:
| Potentially many things, like session management, queues, and
| document graphs, can be done right with simple facilities like
| tables. Tables represent sets, and set algebra seems very
| common in data representations. The thing is how the sets are
| combined, i.e. related. This is essentially API-to-SQL-in-
| Redis-clothes. Kudos to the author.
| nalgeon wrote:
| Thank you! I also think that the relational model can get you
| pretty far if you don't need to squeeze every last bit of
| performance out of the program. And the added benefit of using
| a battle-tested SQL engine is far fewer storage-related bugs.
| nasretdinov wrote:
| By the way, I noticed you're using SetMaxOpenConns(1);
| however, in WAL mode (which you're using) SQLite does support
| writes that don't block reads, so you might benefit from
| allowing read concurrency (in theory).
| nalgeon wrote:
| Yeah, it's explained in the code[1]
|
| SQLite only allows one writer at a time, so concurrent writes
| will fail with a "database is locked" (SQLITE_BUSY) error.
|
| There are two ways to enforce the single writer rule:
|
| 1. Use a mutex for write operations.
|
| 2. Set the maximum number of DB connections to 1.
|
| Intuitively, the mutex approach seems better, because it does
| not limit the number of concurrent read operations. The
| benchmarks show the following results:
|
| - GET: 2% better rps and 25% better p50 response time with
| mutex
|
| - SET: 2% better rps and 60% worse p50 response time with mutex
|
| Due to the significant p50 response time mutex penalty for SET,
| I've decided to use the max connections approach for now.
|
| [1]:
| https://github.com/nalgeon/redka/blob/main/internal/sqlx/db....
| nasretdinov wrote:
| How about having two pools, one for writes only and the
| other one for reads? SQLite allows you to open the DB from
| more than one thread per application, so you can have a read
| pool and a write pool with SetMaxOpenConns(1) for better
| performance. This of course also means that reads would have
| to be handled separately from writes in the API layer too.
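A rough analogue of that split, sketched in Python rather than the project's Go (class and method names here are invented for illustration):

```python
import sqlite3
import threading

class TwoPoolDB:
    # One dedicated write connection guarded by a lock, plus one
    # read connection per thread; WAL mode lets reads proceed
    # while a write is in flight.
    def __init__(self, path: str):
        self.path = path
        self.write_conn = sqlite3.connect(path, check_same_thread=False)
        self.write_conn.execute("PRAGMA journal_mode=WAL")
        self.write_lock = threading.Lock()
        self.local = threading.local()

    def write(self, sql: str, params=()):
        # Serialize writers; "with conn" commits the transaction.
        with self.write_lock, self.write_conn:
            self.write_conn.execute(sql, params)

    def read(self, sql: str, params=()):
        # Lazily open one read-only-use connection per thread.
        if not hasattr(self.local, "conn"):
            self.local.conn = sqlite3.connect(self.path)
        return self.local.conn.execute(sql, params).fetchall()
```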
| nalgeon wrote:
| Thought about it, but decided to start with a simpler, good-
| enough option. The goal here is not to beat Redis anyway.
| nasretdinov wrote:
| Well I agree, that's a good starting point. You probably
| won't be able to beat Redis with SQLite anyway :),
| although given that WAL mode allows for concurrent reads
| it might give it a large enough performance boost to
| match Redis in terms of QPS if the concurrency is high
| enough.
| Sytten wrote:
| This is really not true in WAL mode with synchronous NORMAL;
| it was only true with the default journal mode, and a lot of
| people are misusing SQLite because of that. You still have
| one writer at a time, but you won't get the SQLITE_BUSY
| error.
|
| You can check the documentation [1]; only some rare edge
| cases return this error in WAL mode. We abuse our SQLite and
| I've never seen it happen with a WAL db.
|
| [1] https://www.sqlite.org/wal.html#sometimes_queries_return_
| sql...
| kiitos wrote:
| > The benchmarks show the following results
|
| Where are the benchmarks?
| TheChaplain wrote:
| Hmm, am I the only one who is not worried?
|
| Although I don't really see anything in the license change
| that would prevent me from using it at both home and
| business, Redis seems "complete" functionality-wise, so using
| a pre-license-change version can't hurt even long-term, I
| think.
| jitl wrote:
| I'm not sure to what degree you want to follow the Redis no
| concurrency "everything serialized on one thread" model.
|
| You can get substantially better performance out of sqlite by
| using the lower level https://github.com/crawshaw/sqlite, turning
| on WAL etc, using a connection per goroutine for reads, and
| sending batches of writes over a buffered channel / queue to a
| dedicated writer thread. That way you can turn off SQLite's built
| in per-connection mutex but still be thread safe since each
| connection is only used on a single thread at a time.
|
| For this use-case you will also probably save a lot of time if
| you use some large arena-style buffers (probably N per conn?) and
| copy incoming parameter bytes from the network request/socket to
| the buffer, or copy straight from sqlite out to the socket,
| instead of allocating and passing around a bunch of individual
| strings. Boxing those strings in interface{} (as done by the high
| level sql stdlib) slows things down even more.
|
| None of this is necessary to get usable perf, even decently good
| perf, just sharing some tips from my experience trying to get
| absolutely maximum write throughput from SQLite in Golang.
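The dedicated-writer-plus-batching pattern described above can be illustrated like this, in Python rather than Go and with invented names, so treat it as a sketch of the idea rather than the commenter's actual code:

```python
import queue
import sqlite3
import threading

def writer_loop(path: str, jobs: queue.Queue, batch_size: int = 128):
    # A single thread owns the write connection, so SQLite's
    # one-writer-at-a-time rule is never violated. Writes are
    # drained from the queue and committed in batches,
    # amortizing the per-transaction fsync cost.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")
    while True:
        item = jobs.get()
        if item is None:  # shutdown sentinel
            break
        batch = [item]
        while len(batch) < batch_size:
            try:
                nxt = jobs.get_nowait()
            except queue.Empty:
                break
            if nxt is None:
                jobs.put(None)  # keep the sentinel for the outer loop
                break
            batch.append(nxt)
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO kv VALUES (?, ?) ON CONFLICT(key) "
                "DO UPDATE SET value = excluded.value", batch)
    conn.close()
```

Request handlers push `(key, value)` tuples onto `jobs` from any thread; only the writer thread ever touches the write connection, so its mutex can be skipped safely.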
| nalgeon wrote:
| Great tips, thank you! The thing is, getting maximum throughput
| is not the goal of the project (at least not at this stage).
| I'm using reasonable SQLite defaults (including WAL), but
| that's it for now.
| xrd wrote:
| Go plus SQLite is producing some terrific projects. I love
| Pocketbase and this looks great as well.
| sitkack wrote:
| https://github.com/pocketbase/pocketbase
| redskyluan wrote:
| I would personally not recommend implementing the Redis
| protocol on top of SQLite, as I've seen too many failed cases
| like this. Users may perceive your product as a drop-in Redis
| replacement, and it may work fine in a PoC, but once it hits
| production, SQLite's performance and scalability will be
| severely challenged.
|
| It's much better to use RocksDB as the underlying storage
| engine for such a solution. RocksDB can provide the
| performance and scalability required. If you need a
| distributed solution, I would suggest looking at TiKV or
| FoundationDB. These are both excellent distributed storage
| systems that can handle production workloads much better than
| a SQLite-based approach.
| surfingdino wrote:
| Back when Foursquare made MongoDB famous, someone posted a
| PoC of a NoSQL DB implemented in MySQL. It didn't seem to
| catch on, but it did make me think about how much performance
| is traded for not having to reinvent SQL every time we need a
| DB. I like experiments like this one; they sometimes lead to
| new projects.
___________________________________________________________________
(page generated 2024-04-14 23:00 UTC)