[HN Gopher] Redis re-implemented with SQLite
___________________________________________________________________
Redis re-implemented with SQLite
Author : tosh
Score : 186 points
Date : 2024-04-14 12:51 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| akie wrote:
| I would love to have a Redis alternative where I don't have to
| think about whether or not the dataset fits into memory.
| jetbalsa wrote:
| I've used SSDB[0] in the past for some really stupid large
| datasets (20TB) and it worked really well in production
|
| [0] https://github.com/ideawu/ssdb
| PlutoIsAPlanet wrote:
| It's also worth checking out Kvrocks, which is a Redis
| interface on top of RocksDB that's part of the Apache project,
| and very well maintained.
| welder wrote:
| I switched from SSDB to Kvrocks recently, because SSDB is
| abandoned and its author has been missing for 3 years now. I
| used to recommend SSDB, but now there are better alternatives
| available:
|
| https://github.com/apache/kvrocks
|
| https://github.com/sabledb-io/sabledb
| akie wrote:
| These are great recommendations, thanks!
| yuppiepuppie wrote:
| Curious, what's the use case?
| qwertox wrote:
| My thought as well, since some use cases don't require much
| memory.
| akie wrote:
| The use case is caching 20 million API responses that
| almost never change, each about 20 KB of JSON, for a high-
| traffic site.
|
| Yes, I can pay for a 400 GB RAM instance of Redis, but it's
| expensive.
|
| I can also cache it on disk, but then I need to think about
| cache expiration myself.
|
| Or I can use something appropriate like a document
| database, but then I need additional code & additional
| configuration because we otherwise don't need that piece of
| infrastructure in our stack.
|
| It would be a lot easier if I could just store it in Redis
| with the other (more reasonably sized) things that I need
| to cache.
| Nican wrote:
| This looks like a good use case for ScyllaDB with
| compression and TTL. It is pretty simple to set up a
| single-node instance.
|
| If you'd rather have something in-process that writes to
| disk, to avoid extra infrastructure, I would also
| recommend RocksDB with compression and TTL.
| 0cf8612b2e1e wrote:
| Would DiskCache work for you? It runs on SQLite, either in
| memory or as a persisted file database. Thread safe, with
| various expiration controls, etc.
|
| https://grantjenks.com/docs/diskcache/tutorial.html
| danpat wrote:
| Or shard it - divide your objects up based on some
| criteria (hash the name of the object, use the first N
| digits of the hash to assign to a shard), and distribute
| them across multiple redis instances. Yes, you then need
| to maintain some client code to pick the right redis
| instance to fetch from, but you can now pick the most
| $/memory efficient instance types to run redis, and you
| don't have to worry about introducing disk read latency
| and the edge cases that brings with it.
|
| Edit: looks like redis has some built-in support for data
| sharding when used as a cluster
| (https://redis.io/docs/latest/commands/cluster-shards/) -
| I haven't used that, so not sure how easy it is to apply,
| and exactly what you'd have to change.
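The shard-picking step described above can be sketched in a few lines of Python; the instance list and helper name here are hypothetical, not part of any real deployment:

```python
import hashlib

# Hypothetical list of Redis instance addresses, one per shard.
REDIS_SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]

def pick_shard(key: str) -> str:
    # Hash the key, then use the first 8 hex digits of the digest
    # to deterministically assign the key to one of the shards.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return REDIS_SHARDS[int(digest[:8], 16) % len(REDIS_SHARDS)]
```

The client would then open a connection to `pick_shard(key)` before issuing its GET/SET, which is exactly the bit of client code the comment says you'd have to maintain.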
| yuliyp wrote:
| Sharding doesn't help here at all. They'd still need the
| same amount of RAM to house all the data in redis.
| reese_john wrote:
| You could try using Amazon S3 Express, a low-latency
| alternative for S3 buckets [0]. I imagine cache
| invalidation would be relatively simple to implement
| using lifecycle policies.
|
| https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-
| exp...
| yuliyp wrote:
| You're trying to get Redis to be what it isn't. Use a
| thing that has the properties you want: a document or
| relational database. If you insist on this, then running a
| system that allows a ton of swap onto a reasonably fast
| disk _might_ work, but it's still going to perform worse
| than a system that's designed for concurrently serving
| queries of wildly differing latencies.
| seddonm1 wrote:
| In other abuses of SQLite, I wrote a tool [0] that
| exposes blobs in SQLite via an Amazon S3 API. It doesn't
| do expiry (but that would be easy enough to add if S3
| does it).
|
| We were using it to manage millions of images for
| machine learning, as many tools support S3 and the ability
| to add custom metadata to objects is useful (harder with
| files). It is one SQLite database per bucket, but at the
| bucket level it is transactional.
|
| 0: https://github.com/seddonm1/s3ite
| phamilton wrote:
| A few things:
|
| Redis Data Tiering - Redis Enterprise and AWS ElastiCache
| for Redis support data tiering (using SSD for 80% of the
| dataset and moving things in and out). On AWS, a
| cache.r6gd.4xlarge with 100GB of memory can handle 500GB
| of data.
|
| Local Files
|
| > I can also cache it on disk, but then I need to think
| about cache expiration myself.
|
| Is the challenge that you need it shared among many
| machines? On a single machine you can put 20 million
| files in a directory hierarchy and let the fs cache keep
| things hot in memory as needed. Or use SQLite which will
| only load the pages needed for each query and also rely
| on the fs cache.
|
| S3 - An interesting solution is one of the SQLite S3
| VFSes. Those will query S3 fairly efficiently for
| specific data in a large dataset.
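The directory-hierarchy idea above can be sketched as follows; the fan-out factor and function names are illustrative assumptions, not a reference implementation:

```python
import hashlib
from pathlib import Path
from typing import Optional

def cache_path(root: str, key: str) -> Path:
    # Fan 20 million entries out over 256*256 buckets so no single
    # directory holds millions of files; the OS page cache keeps
    # hot files in memory automatically.
    h = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return Path(root) / h[:2] / h[2:4] / h

def put(root: str, key: str, body: bytes) -> None:
    path = cache_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)

def get(root: str, key: str) -> Optional[bytes]:
    path = cache_path(root, key)
    return path.read_bytes() if path.exists() else None
```

Expiration would still be on you (e.g. a sweep comparing mtimes against a TTL), which is the caveat raised earlier in the thread.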
| kiitos wrote:
| Redis is an in-memory cache by definition. If you don't
| want to cache in-memory, then don't use Redis.
| jitl wrote:
| Redis drops data semi-randomly when under memory pressure.
|
| If you use Redis for queued tasks (this is popular in Rails
| and Django/Python web services), that means that during an
| incident where queue jobs are added faster than they're
| removed, you're going to lose jobs if the incident goes on
| long enough.
| byroot wrote:
| That depends on how the `maxmemory-policy` is configured,
| and queue systems based on Redis will tell you not to allow
| eviction. https://github.com/sidekiq/sidekiq/wiki/Using-
| Redis#memory (it even logs a warning if it detects your
| Redis is misconfigured, IIRC).
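For reference, the setting in question is a one-line redis.conf directive; queue backends such as Sidekiq expect eviction to be disabled:

```
# Never evict keys under memory pressure; fail writes with an
# error instead, so queued jobs are not silently dropped.
maxmemory-policy noeviction
```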
| kiitos wrote:
| Well, of course! Redis is not (and has never been) a
| database, it's a data structure server, at best described
| as a cache. If jobs are added faster than they're removed,
| this is straight queueing theory 101 -- ideally you'd
| reject jobs at add-time, but otherwise you have to drop
| them.
| gnarbarian wrote:
| Right. I think Redis hitting the disk would be a terrible
| tradeoff compared to making a new backend call. It
| probably wouldn't save you much time, and I imagine it
| would lead to very strange and unpredictable behavior on
| the front end, or when trying to debug latency or data
| issues downstream.
| prisenco wrote:
| Since Redis is an in-memory cache and already doesn't
| guarantee the data, would it make sense to set PRAGMA
| synchronous to OFF in nalgeon's project, to boost
| performance to something closer to standard Redis?
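With Python's stdlib sqlite3 module, for instance, that tradeoff is a pair of pragmas; this is only a sketch of the suggestion, and redka's actual defaults may differ:

```python
import sqlite3

conn = sqlite3.connect("cache.db")
# WAL lets readers proceed while the single writer commits;
# synchronous=OFF skips fsync entirely, trading durability for
# speed -- roughly the deal Redis already offers for cached data.
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=OFF")
```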
| noncoml wrote:
| Using a hammer like a screwdriver
| GraemeMeyer wrote:
| I think Garnet might be what you're looking for:
| https://www.microsoft.com/en-us/research/project/garnet/
| Nican wrote:
| As usual, there is a spectrum of data safety vs. performance.
| Redis is at the "very fast, but unsafe" side of the scale.
|
| ScyllaDB, for me, is in the middle: a high-performance key-
| value store, but without really supporting transactions.
| FoundationDB is another one that I would consider.
| tyre wrote:
| Depends on the kind of safety you're looking for. Redis is
| entirely safe from concurrency issues because it's single-
| threaded. It supports an append-only file for persistence to
| disk.
| alerighi wrote:
| At that point, why use Redis at all? You can use any DBMS
| you want, either relational or NoSQL. The advantage of Redis
| is that it is a memory cache; if you take the memory out of
| it, just use Postgres or whatever DBMS you are using (I say
| Postgres because it has all the features of Redis).
| dalyons wrote:
| Postgres has nowhere near all the features of Redis. Go and
| have a look at the Redis commands documentation. They're not
| even really similar at all once you get past the basic
| GET/SET stuff.
| from-nibly wrote:
| Can you name an explicit thing that postgres does not do
| that redis does?
| jitl wrote:
| This is silly, Postgres doesn't speak the Redis wire
| protocol. You will need a large army of connection
| proxies to get a Postgres database to handle the number
| of connections a single Redis shrugs off with no sweat.
|
| Maybe you like this answer more: at the end of the day
| you can embed a bunch of Turing-complete programming
| languages in Postgres, and Postgres can store binary
| blobs, so Postgres can do literally anything. Can it do
| it performantly, and for low cost? Probably not. But if
| you put in enough time and money, I'm sure you can re-
| implement Redis on Postgres using BLOB columns alone.
|
| Here's a simpler answer: a cuckoo filter is available out
| of the box in Redis; in 2 seconds of Googling I didn't
| find one for Postgres:
| https://redis.io/docs/latest/develop/data-
| types/probabilisti...
| codetrotter wrote:
| Not sure if this one could be used in order to do what
| you want but maybe?
|
| https://www.postgresql.org/docs/current/bloom.html
|
| Have a look
| pshc wrote:
| I feel like one _could_ implement most Redis commands as
| functions or PL/pgSQL using native Postgres hstore and json.
| Could be an interesting translation layer.
| jhatemyjob wrote:
| This is a great idea and I am glad it is BSD licensed.
| Unfortunately the execution is somewhat lacking. SQLite is best
| suited for embedded / clientside applications with minimal
| dependencies. The author of this project decided to use Go and
| make it a service.
| nalgeon wrote:
| Did I?
|
| > Both in-process (Go API) and standalone (RESP) servers.
|
| In-process means that the database is "embedded / clientside"
| in your terms.
| jhatemyjob wrote:
| It's a server.
| leetrout wrote:
| > SQLite is best suited for embedded / clientside applications
| with minimal dependencies.
|
| Often repeated and certainly rooted in truth but there was a
| healthy discussion on here the other day[0] where tptacek
| shared a link in a comment[1] to a related blog post about
| getting more scale out of using SQLite serverside.
|
| 0: https://news.ycombinator.com/item?id=39975596
|
| 1: https://kerkour.com/sqlite-for-servers
| merlinran wrote:
| What would you use Redis or its substitutes for in embedded/
| clientside applications? Seriously asking.
| nalgeon wrote:
| I'm a big fan of both Redis and SQLite, so I decided to combine
| the two. SQLite is specifically designed for many small
| queries[1], and it's probably as close as relational engines can
| get to Redis, so I think it might be a good fit.
|
| [1]: https://sqlite.org/np1queryprob.html
| sesm wrote:
| What are the project goals? I assume it's a drop-in replacement
| for Redis that is supposed to be better in certain cases? If
| yes, then what cases do you have in mind?
| nalgeon wrote:
| The goal is to have a convenient API to work with common data
| structures, with an SQL backend and all the benefits it
| provides. Such as:
|
| -- Small memory footprint even for large datasets.
|
| -- ACID transactions.
|
| -- SQL interface for introspection and reporting.
| b33j0r wrote:
| I love this, it's the solution that makes sense for 90% of the
| times I have used redis with python.
|
| I've made several versions of this, and to be honest, it ended
| up being so straightforward that I assumed it was a trivial
| solution.
|
| This is pretty well-planned. This is 100% the way to go.
|
| Heh. I took a detour into making my idea of "streams" also
| solve event sourcing in native Python; a dumb idea, if an
| interesting one. Mission creep probably killed my effort!
|
| Nice work
| tehbeard wrote:
| > reimplement the good parts of Redis
|
| Seems to be missing streams, HyperLogLog and pub/sub though,
| so mostly just the KV part of the wire protocol with a
| different backend?
| nalgeon wrote:
| Can't fit everything into 1.0; I had to start with something.
| If the community is interested in the project, there will be
| more.
| poidos wrote:
| Wonderful idea and execution!
| kiitos wrote:
| The entire value proposition of Redis is that it operates out of
| memory, and therefore has memory-like performance. (edit: And
| doesn't provide the benefit of, and therefore pay the costs
| related to, ACID-like consistency guarantees.) If you move it to
| disk (edit: Or try to assert ACID-like consistency or
| transactional guarantees) there's little reason to use Redis any
| more.
| qbane wrote:
| SQLite does allow one to keep the entire database in memory:
| https://www.sqlite.org/inmemorydb.html
| j-pb wrote:
| But it's still orders of magnitude slower than a hash map.
| egeozcan wrote:
| You can also create an in-memory sqlite database though.
| PhilipRoman wrote:
| >The entire value proposition of Redis is that it operates out
| of memory
|
| Not familiar with Redis specifically, but I doubt this idea.
| You can run anything on top of a ramdisk (granted, you can save
| a few pointer additions and get rid of some safety checks if
| you know you're working with memory)
| yuliyp wrote:
| Sure you can run things off a ramdisk, but the way you lay
| out data to achieve high performance from disk vs from RAM is
| different (disk assumes that you read pages of data at once,
| and tries to avoid reading extra pages, while RAM assumes
| that you read cache lines of data at once).
| 77pt77 wrote:
| You can run SQLite in memory by using the filename
|
| :memory:
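In Python's stdlib sqlite3 module, for example:

```python
import sqlite3

# ":memory:" creates a private, purely in-memory database;
# nothing touches disk and it vanishes when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB)")
conn.execute("INSERT INTO kv VALUES (?, ?)", ("greeting", b"hello"))
value = conn.execute(
    "SELECT value FROM kv WHERE key = ?", ("greeting",)
).fetchone()[0]  # b"hello"
```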
| larodi wrote:
| Potentially many things, like session management, queues, and
| document graphs, can be done right with simple facilities like
| tables. Tables represent sets, and set algebra seems very
| common in data representations. The thing is how the sets are
| combined, i.e. related. This is essentially API-to-SQL-in-
| Redis-clothes. Kudos to the author.
| nalgeon wrote:
| Thank you! I also think that the relational model can get you
| pretty far if you don't need to squeeze every last bit of
| performance out of the program. And the added benefit of using
| a battle-tested SQL engine is far fewer storage-related bugs.
| nasretdinov wrote:
| By the way, I noticed you're using SetMaxOpenConns(1);
| however, in WAL mode (which you're using) SQLite does support
| writes that don't block reads, so you might benefit from
| allowing read concurrency (in theory).
| nalgeon wrote:
| Yeah, it's explained in the code[1]
|
| SQLite only allows one writer at a time, so concurrent writes
| will fail with a "database is locked" (SQLITE_BUSY) error.
|
| There are two ways to enforce the single writer rule:
|
| 1. Use a mutex for write operations.
|
| 2. Set the maximum number of DB connections to 1.
|
| Intuitively, the mutex approach seems better, because it does
| not limit the number of concurrent read operations. The
| benchmarks show the following results:
|
| - GET: 2% better rps and 25% better p50 response time with
| mutex
|
| - SET: 2% better rps and 60% worse p50 response time with mutex
|
| Due to the significant p50 response time mutex penalty for SET,
| I've decided to use the max connections approach for now.
|
| [1]:
| https://github.com/nalgeon/redka/blob/main/internal/sqlx/db....
| nasretdinov wrote:
| How about having two pools, one for writes only and the
| other one for reads? SQLite allows you to open the DB from
| more than one thread per application, so you can have a read
| pool and a write pool with SetMaxOpenConns(1) for better
| performance. This of course also means that reads would have
| to be handled separately from writes in the API layer too.
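A rough analogue of that split, sketched in Python rather than the project's Go (class and method names here are invented for illustration):

```python
import sqlite3
import threading

class TwoPoolDB:
    # One dedicated write connection guarded by a lock, plus one
    # read connection per thread; WAL mode lets reads proceed
    # while a write is in flight.
    def __init__(self, path: str):
        self.path = path
        self.write_conn = sqlite3.connect(path, check_same_thread=False)
        self.write_conn.execute("PRAGMA journal_mode=WAL")
        self.write_lock = threading.Lock()
        self.local = threading.local()

    def write(self, sql: str, params=()):
        # Serialize writers; "with conn" commits the transaction.
        with self.write_lock, self.write_conn:
            self.write_conn.execute(sql, params)

    def read(self, sql: str, params=()):
        # Lazily open one read-only-use connection per thread.
        if not hasattr(self.local, "conn"):
            self.local.conn = sqlite3.connect(self.path)
        return self.local.conn.execute(sql, params).fetchall()
```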
| nalgeon wrote:
| Thought about it, but decided to start with a simpler, good-
| enough option. The goal here is not to beat Redis anyway.
| nasretdinov wrote:
| Well I agree, that's a good starting point. You probably
| won't be able to beat Redis with SQLite anyway :),
| although given that WAL mode allows for concurrent reads
| it might give it a large enough performance boost to
| match Redis in terms of QPS if the concurrency is high
| enough.
| Sytten wrote:
| This is really not true in WAL mode with synchronous NORMAL;
| it was only true with the default journal mode, and a lot of
| people are misusing SQLite because of that. You still have
| one writer at a time, but you won't get the SQLITE_BUSY
| error.
|
| You can check the documentation [1]; only some rare edge
| cases return this error in WAL mode. We abuse our SQLite and
| I've never seen it happen with a WAL db.
|
| [1] https://www.sqlite.org/wal.html#sometimes_queries_return_
| sql...
| kiitos wrote:
| > The benchmarks show the following results
|
| Where are the benchmarks?
| TheChaplain wrote:
| Hmm, am I the only one who is not worried?
|
| Although I don't really see anything in the license change
| that would prevent me from using it at both home and
| business, Redis seems "complete" functionality-wise, so using
| a pre-license-change version can't hurt even long-term, I
| think.
| jitl wrote:
| I'm not sure to what degree you want to follow the Redis no
| concurrency "everything serialized on one thread" model.
|
| You can get substantially better performance out of sqlite by
| using the lower level https://github.com/crawshaw/sqlite, turning
| on WAL etc, using a connection per goroutine for reads, and
| sending batches of writes over a buffered channel / queue to a
| dedicated writer thread. That way you can turn off SQLite's built
| in per-connection mutex but still be thread safe since each
| connection is only used on a single thread at a time.
|
| For this use-case you will also probably save a lot of time if
| you use some large arena-style buffers (probably N per conn?) and
| copy incoming parameter bytes from the network request/socket to
| the buffer, or copy straight from sqlite out to the socket,
| instead of allocating and passing around a bunch of individual
| strings. Boxing those strings in interface{} (as done by the high
| level sql stdlib) slows things down even more.
|
| None of this is necessary to get usable perf, even decently good
| perf, just sharing some tips from my experience trying to get
| absolutely maximum write throughput from SQLite in Golang.
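The dedicated-writer-plus-batching pattern described above can be illustrated like this, in Python rather than Go and with invented names, so treat it as a sketch of the idea rather than the commenter's actual code:

```python
import queue
import sqlite3
import threading

def writer_loop(path: str, jobs: queue.Queue, batch_size: int = 128):
    # A single thread owns the write connection, so SQLite's
    # one-writer-at-a-time rule is never violated. Writes are
    # drained from the queue and committed in batches,
    # amortizing the per-transaction fsync cost.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")
    while True:
        item = jobs.get()
        if item is None:  # shutdown sentinel
            break
        batch = [item]
        while len(batch) < batch_size:
            try:
                nxt = jobs.get_nowait()
            except queue.Empty:
                break
            if nxt is None:
                jobs.put(None)  # keep the sentinel for the outer loop
                break
            batch.append(nxt)
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO kv VALUES (?, ?) ON CONFLICT(key) "
                "DO UPDATE SET value = excluded.value", batch)
    conn.close()
```

Request handlers push `(key, value)` tuples onto `jobs` from any thread; only the writer thread ever touches the write connection, so its mutex can be skipped safely.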
| nalgeon wrote:
| Great tips, thank you! The thing is, getting maximum throughput
| is not the goal of the project (at least not at this stage).
| I'm using reasonable SQLite defaults (including WAL), but
| that's it for now.
| xrd wrote:
| Go plus SQLite is producing some terrific projects. I love
| Pocketbase and this looks great as well.
| sitkack wrote:
| https://github.com/pocketbase/pocketbase
| redskyluan wrote:
| I would personally not recommend implementing the Redis
| protocol on top of SQLite, as I've seen too many failed cases
| like this. Users may perceive your product as a drop-in Redis
| replacement, and it may work fine in a PoC, but once it hits
| production, SQLite's performance and scalability will be
| severely challenged.
|
| It's much better to use RocksDB as the underlying storage
| engine for such a solution. RocksDB can provide the
| performance and scalability required. If you need a
| distributed solution, I would suggest looking at TiKV or
| FoundationDB. These are both excellent distributed storage
| systems that can handle production workloads much better than
| a SQLite-based approach.
| surfingdino wrote:
| Back when Foursquare made MongoDB famous, someone posted a
| PoC of a NoSQL DB implemented in MySQL. It didn't seem to
| catch on, but it did make me think about how much performance
| is traded for not having to reinvent SQL every time we need a
| DB. I like experiments like this one; they sometimes lead to
| new projects.
___________________________________________________________________
(page generated 2024-04-14 23:00 UTC)