[HN Gopher] Filedb: Disk-based key-value store inspired by Bitcask
       ___________________________________________________________________
        
       Filedb: Disk-based key-value store inspired by Bitcask
        
       Author : todsacerdoti
       Score  : 113 points
       Date   : 2025-06-14 02:45 UTC (20 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | wallstop wrote:
       | This looks interesting. Maybe I'm not in-the-know, but why would
       | you offload such important aspects like `sync` to the client
       | instead of building in some protocol to ensure that file
       | integrity is maintained? With this kind of design choice, it
       | seems quite easy to lose data, unless I'm missing something.
        
         | mukesh610 wrote:
         | From the README:
         | 
         | A sync process syncs the open disk files once every
         | config.syncInterval. Sync also can be done on every request if
         | config.alwaysFsync is True.
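
A minimal sketch of the two sync modes quoted above, written in Python for illustration only. The names (`SyncingLog`, `always_fsync`, `sync_interval`, `tick`) are assumptions and are not taken from Filedb's code; only the idea of syncing on every request versus syncing on an interval comes from the README excerpt.

```python
import os
import time


class SyncingLog:
    """Append-only file with two durability modes (hypothetical names)."""

    def __init__(self, path, always_fsync=False, sync_interval=1.0):
        self.f = open(path, "ab")
        self.always_fsync = always_fsync
        self.sync_interval = sync_interval  # seconds between periodic syncs
        self.last_sync = time.monotonic()
        self.sync_count = 0

    def _sync(self):
        self.f.flush()             # push Python's buffer to the OS
        os.fsync(self.f.fileno())  # ask the OS to push it to the device
        self.last_sync = time.monotonic()
        self.sync_count += 1

    def append(self, record: bytes):
        self.f.write(record)
        if self.always_fsync:
            self._sync()           # durable before returning, but slower

    def tick(self):
        """Call periodically; syncs only once the interval has elapsed."""
        if time.monotonic() - self.last_sync >= self.sync_interval:
            self._sync()

    def close(self):
        self._sync()
        self.f.close()
```

With `always_fsync=False`, anything appended since the last sync can be lost on a crash or power failure, which is the trade-off the parent comment asks about.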
        
       | im_down_w_otp wrote:
       | Bitcask, now there's a blast from the Basho past. It always
       | bugged me that no good secondary indexing strategy was built to
       | make using Bitcask viable for more use cases. Everyone always
       | wanted to use the LevelDB backend just to get at secondary
       | indexing features (whose performance also scaled inversely
       | with cluster size, which was its own problem). But getting
       | Riak to exhibit consistent high performance was waaaaaaaay
       | easier on Bitcask.
        
       | lsferreira42 wrote:
       | This is something I sometimes play with:
       | 
       | https://github.com/lsferreira42/nadb
       | 
       | It is a disk-based KV store with tags for search.
        
       | Imustaskforhelp wrote:
       | Sorry, maybe I am not in the mood to delve too deep into the
       | project (but I starred it! Amazing job, I suppose), and I don't
       | want to ask AI but rather some experts who are surely lurking
       | on HN.
       | 
       | Can you guys please explain this to me like I am 5 (or maybe
       | 10)? Is this something revolutionary to keep in the back of my
       | mind? How does it compare to redis? When should I use it, if
       | at all? I always prefer sqlite, then postgresql if I need
       | scalability, and after that I am not sure, but maybe things
       | like clickhouse. I am also looking more into duckdb, but maybe
       | not as a primary database, rather just for fun. There are also
       | things like turso and cloudflare d1 (if I remember correctly);
       | I kinda prefer cloudflare d1 but also like turso or sqlite in
       | general. Still, the database space really piques my interest.
       | 
       | Thanks in advance for helping this young fellow out!
        
         | packetlost wrote:
         | Implementing Bitcask is sorta like a rite of passage for
         | people interested in DBs/storage engines. You shouldn't use
         | this in production. SQLite is most likely more flexible,
         | reliable, and ubiquitous for situations where this project
         | would be useful.
        
           | Imustaskforhelp wrote:
           | Gotcha! Thanks a lot mate!
           | 
           | So can I say that this is just a toy project created by the
           | author to learn about DB/storage engines and I should just
           | use sqlite in prod, right?
        
         | ezekiel68 wrote:
         | I disagree with the other reply indicating something like this
         | should not be used in production. For most of the history of
         | practical disk IO, it was observed and assumed that disk reads
         | would be relatively much faster than disk writes. It turns out
         | that this assumption was based on other assumptions, such as
         | that most reading and writing would be handled as "random IO"
         | where a physical disk head accessing an actual spinning disk
         | might need to move around at any given time to read or to
         | update some data.
         | 
         | Riak (whose Bitcask backend inspired this project) and other
         | projects came out at a time when software engineers were
         | exploring how to make disk writes fast, potentially even
         | faster than reads, for practical applications. Some tradeoffs
         | made to achieve this goal were forcing all writes to be
         | sequential ("log-structured" in riak, kafka, and cassandra
         | parlance) and embracing the model of "eventual consistency".
         | 
         | Eventual consistency is similar to how orders are processed at
         | a cafe or fast-food restaurant. The cashier takes the order and
         | passes it on to the barista or chef - we'll just say "kitchen".
         | The kitchen might not know your order at that moment but it's
         | right there nearby (equivalent in our case: in a RAM buffer
         | ready for disk write). Once the kitchen has finished other
         | orders ahead of yours (the sync interval is reached), it makes
         | your order and delivers it to the counter (the data gets
         | actually written to disk -- "committed" in DB talk).
         | 
         | The key point in this analogy is that the cashier station
         | (system front end UI) doesn't wait around until your order gets
         | made before taking other orders. It assumes all is well and
         | your order will be served by the kitchen "soon enough".
         | 
         | When might these tradeoffs make sense for production systems?
         | Answer: not all data is created equal. For example, if your
         | system stores a steady stream of GPS coordinates from package
         | delivery trucks so customers can know when a truck is near
         | their house, it doesn't actually matter if one or two of the
         | coordinates is not immediately available (or even gets lost).
         | The same can go for backend system telemetry, showing CPU or
         | RAM utilization. The trend is the main thing and it's not
         | actually important in a particular real-time instant whether
         | the dashboard chart shows the last 3 readings (since they have
         | yet to be finally written to disk). In cases like these, "ACID"
         | (traditional db term) guarantees are not only not required,
         | they get in the way of proper system design and implementation.
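
The cafe analogy can be sketched as a write-behind buffer: `put` acknowledges immediately, and a later `flush` appends the whole batch sequentially. This is an illustrative Python sketch with hypothetical names (`WriteBehindLog`, `put`, `flush`, `durable`), not code from Riak or Filedb.

```python
import json
import os


class WriteBehindLog:
    """Accept readings in RAM now, append them to disk later (a sketch)."""

    def __init__(self, path):
        self.path = path
        self.pending = []  # the "orders" waiting for the kitchen

    def put(self, reading: dict) -> None:
        # Acknowledged right away, but not yet durable.
        self.pending.append(reading)

    def flush(self) -> int:
        """Append all pending readings in one sequential write; return count."""
        if not self.pending:
            return 0
        with open(self.path, "ab") as f:
            for reading in self.pending:
                f.write(json.dumps(reading).encode() + b"\n")
            f.flush()
            os.fsync(f.fileno())
        count = len(self.pending)
        self.pending.clear()
        return count

    def durable(self) -> list:
        """Return only the readings that have actually reached the file."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, "rb") as f:
            return [json.loads(line) for line in f]
```

If the process dies between `put` and `flush`, the pending readings are gone. For a stream of GPS points or CPU samples that may be acceptable; for something like payments it would not be.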
        
       | b0a04gl wrote:
       | used bitcask during undergrad for a systems course project. task
       | was to build a minimal key value store with durability and fast
       | writes. no frameworks allowed. tried leveldb first but spent too
       | much time tuning compaction. switched to bitcask after reading
       | the original Riak paper and it just worked.
       | 
       | append only writes meant less complexity. loaded keys into memory
       | on startup, mapped offsets, done. didn't need range queries or
       | indexes, just fast put/get. wrote a simple merge script to
       | compact old segments. performance was solid and startup time
       | didn't degrade as data grew.
       | 
       | biggest learning was how bitcask avoided cleverness. no tricks,
       | no layered abstractions. it was just clean storage logic with a
       | clear mental model. still think about it when touching newer
       | engines that try to do too much
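
The design described in the comment above (append-only writes, an in-memory key-to-offset map rebuilt on startup, and a merge step) can be sketched roughly as follows. The record layout, the single data file, and the names (`MiniCask`, `put`, `get`, `merge`) are assumptions made for illustration; real Bitcask also has checksums, timestamps, tombstones, and multiple segment files, all omitted here.

```python
import os
import struct

HEADER = struct.Struct(">II")  # key length, value length (assumed layout)


class MiniCask:
    def __init__(self, path):
        self.path = path
        self.keydir = {}              # key -> (value offset, value length)
        open(path, "ab").close()      # create the file if it is missing
        self._load()
        self.f = open(path, "r+b")

    def _load(self):
        # Startup scan: read headers and keys, skip over values.
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                klen, vlen = HEADER.unpack(header)
                key = f.read(klen)
                self.keydir[key] = (f.tell(), vlen)  # later records win
                f.seek(vlen, os.SEEK_CUR)

    def put(self, key: bytes, value: bytes):
        self.f.seek(0, os.SEEK_END)   # always append, never overwrite
        start = self.f.tell()
        self.f.write(HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        self.keydir[key] = (start + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, vlen = entry
        self.f.seek(offset)           # one seek, one read
        return self.f.read(vlen)

    def merge(self):
        # Rewrite only the live values, then swap the files.
        # Holds all live values in memory, which is fine only for a sketch.
        live = {k: self.get(k) for k in self.keydir}
        self.f.close()
        tmp = self.path + ".merge"
        open(tmp, "wb").close()       # start from an empty merge file
        compacted = MiniCask(tmp)
        for k, v in live.items():
            compacted.put(k, v)
        compacted.f.close()
        os.replace(tmp, self.path)
        self.keydir = compacted.keydir
        self.f = open(self.path, "r+b")

    def close(self):
        self.f.close()
```

Overwritten values stay in the file until `merge` runs, so the file grows with every write while reads stay at one lookup plus one seek.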
        
       | alexpadula wrote:
       | Nice little implementation :) you even added a server too. Good
       | work, keep it up!
        
         | tempaccount420 wrote:
         | Is the author a child? Am I missing something?
        
           | hdjrudni wrote:
           | "An undergraduate student at IIT Kharagpur"
        
             | tempaccount420 wrote:
             | > Nice little implementation :) you even added a server
             | too. Good work, keep it up!
             | 
             | The tone of the grandparent comment made it sound like the
             | author is a child, my bad.
        
           | alexpadula wrote:
           | Was not my intention to come off any which way. I reviewed
           | the code, liked it and wanted to comment :)
        
       ___________________________________________________________________
       (page generated 2025-06-14 23:00 UTC)