[HN Gopher] Filedb: Disk-based key-value store inspired by Bitcask
___________________________________________________________________
Filedb: Disk-based key-value store inspired by Bitcask
Author : todsacerdoti
Score : 113 points
Date : 2025-06-14 02:45 UTC (20 hours ago)
| wallstop wrote:
| This looks interesting. Maybe I'm not in-the-know, but why would
| you offload such important aspects like `sync` to the client
| instead of building in some protocol to ensure that file
| integrity is maintained? With this kind of design choice, it
| seems quite easy to lose data, unless I'm missing something.
| mukesh610 wrote:
| From the README:
|
| A sync process syncs the open disk files once every
| config.syncInterval. Sync also can be done on every request if
| config.alwaysFsync is True.
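|
| A minimal sketch of those two modes, for illustration only. The
| class and the names sync_interval / always_fsync are hypothetical
| Python stand-ins for the README's config.syncInterval and
| config.alwaysFsync, not the project's actual code:

```python
import os
import time


class SyncingLog:
    """Append-only file with two durability modes (illustrative only)."""

    def __init__(self, path, sync_interval=1.0, always_fsync=False):
        self.f = open(path, "ab")
        self.sync_interval = sync_interval  # seconds between periodic syncs
        self.always_fsync = always_fsync    # fsync after every append if True
        self.last_sync = time.monotonic()
        self.fsync_count = 0                # bookkeeping for the example

    def _sync(self):
        self.f.flush()             # push Python's buffer to the OS
        os.fsync(self.f.fileno())  # ask the OS to push it to the device
        self.last_sync = time.monotonic()
        self.fsync_count += 1

    def append(self, record: bytes):
        self.f.write(record)
        if self.always_fsync:
            self._sync()

    def maybe_sync(self):
        """Meant to be called from a timer; syncs once the interval elapsed."""
        if time.monotonic() - self.last_sync >= self.sync_interval:
            self._sync()

    def close(self):
        self._sync()
        self.f.close()
```

| With always_fsync=False, anything appended since the last sync
| can be lost on a crash; with always_fsync=True every append pays
| the cost of an fsync.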
| im_down_w_otp wrote:
| Bitcask, now there's a blast from the Basho past. It always
| bugged me that no good secondary indexing strategy was built to
| make using Bitcask viable for more use cases. Everyone always
| wanted to use the LevelDB backend just to get at secondary
| indexing features (whose performance also scaled inversely
| relative to cluster size, which was its own problem). But having
| Riak exhibit consistent, high-performance was waaaaaaaay easier
| on Bitcask.
| lsferreira42 wrote:
| This is something that I sometimes play with:
|
| https://github.com/lsferreira42/nadb
|
| It is a disk-based KV store with tags for search
| Imustaskforhelp wrote:
| Sorry, maybe I am not in the mood to delve too deep into the
| project (but I starred it! Amazing job I suppose) and I don't want
| to ask AI but rather some experts who are surely lurking HN.
|
| Can you guys please explain this to me like I am 5 (or maybe 10)?
| Is this something revolutionary to keep in the back of my mind? How
| does it compare to redis? When should I use it, if ever? I always
| prefer sqlite, then postgresql if I need scalability, and after that
| I am not sure but maybe things like clickhouse. I am also looking more
| into duckdb but maybe not as a primary database, but rather just
| for fun. There are also things like turso and cloudflare d1 (if I
| remember correctly), kinda prefer cloudflare d1 but also like
| turso or sqlite in general. Still, the database space really
| piques my interest.
|
| Thanks in advance for helping this young fellow out!
| packetlost wrote:
| Implementing Bitcask is sorta like a rite of passage for
| people interested in DBs/storage engines. You shouldn't use
| this in production. SQLite is most likely more flexible,
| reliable, and ubiquitous for situations where this project
| would be useful.
| Imustaskforhelp wrote:
| Gotcha! Thanks a lot mate!
|
| So can I say that this is just a toy project created by the
| author to learn about DB/storage engines and I should just
| use sqlite in prod, right?
| ezekiel68 wrote:
| I disagree with the other reply indicating something like this
| should not be used in production. For most of the history of
| practical disk IO, it was observed and assumed that disk reads
| would be relatively much faster than disk writes. It turns out
| that this assumption was based on other assumptions, such as
| that most reading and writing would be handled as "random IO"
| where a physical disk head accessing an actual spinning disk
| might need to move around at any given time to read or to
| update some data.
|
| Riak (whose Bitcask backend inspired this project) and other projects came
| out at a time when software engineers were exploring how to
| make disk writes fast and potentially even faster than reads
| for practical applications. Some tradeoffs to achieve this goal
| could be enforcing all writes to be sequential ("log-
| structured" in riak, kafka, and cassandra parlance) and to
| embrace the model of "eventual consistency".
|
| Eventual consistency is similar to how orders are processed at
| a cafe or fast-food restaurant. The cashier takes the order and
| passes it on to the barista or chef - we'll just say "kitchen".
| The kitchen might not know your order at that moment but it's
| right there nearby (equivalent in our case: in a RAM buffer
| ready for disk write). Once the kitchen has finished other
| orders ahead of yours (the sync interval is reached), it makes
| your order and delivers it to the counter (the data gets
| actually written to disk -- "committed" in DB talk).
|
| The key point in this analogy is that the cashier station
| (system front end UI) doesn't wait around until your order gets
| made before taking other orders. It assumes all is well and
| your order will be served by the kitchen "soon enough".
|
| When might these tradeoffs make sense for production systems?
| Answer: not all data is created equal. For example, if your
| system stores a steady stream of GPS coordinates from package
| delivery trucks so customers can know when a truck is near
| their house, it doesn't actually matter if one or two of the
| coordinates is not immediately available (or even gets lost).
| The same can go for backend system telemetry, showing CPU or
| RAM utilization. The trend is the main thing and it's not
| actually important in a particular real-time instant whether
| the dashboard chart shows the last 3 readings (since they have
| yet to be finally written to disk). In cases like these, "ACID"
| (traditional db term) guarantees not only are not required,
| they get in the way of proper system design and implementation.
| b0a04gl wrote:
| used bitcask during undergrad for a systems course project. task
| was to build a minimal key value store with durability and fast
| writes. no frameworks allowed. tried leveldb first but spent too
| much time tuning compaction. switched to bitcask after reading
| the original Riak paper and it just worked.
|
| append only writes meant less complexity. loaded keys into memory
| on startup, mapped offsets, done. didn't need range queries or
| indexes, just fast put/get. wrote a simple merge script to
| compact old segments. performance was solid and startup time
| didn't degrade as data grew.
|
| biggest learning was how bitcask avoided cleverness. no tricks,
| no layered abstractions. it was just clean storage logic with a
| clear mental model. still think about it when touching newer
| engines that try to do too much
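|
| A minimal sketch of the design described above: append-only
| writes, an in-memory key map rebuilt on startup, and a merge
| step. MiniCask and its record layout are made up for
| illustration; real Bitcask also has CRCs, timestamps,
| tombstones, hint files and multiple data files, all omitted here:

```python
import os
import struct

# Hypothetical record layout: key length, value length, key bytes, value bytes.
HEADER = struct.Struct(">II")  # two big-endian uint32 lengths


class MiniCask:
    def __init__(self, path):
        self.path = path
        self.keydir = {}          # key -> (value offset, value length)
        open(path, "ab").close()  # create the log if it does not exist
        self._load()
        self.f = open(path, "r+b")

    def _load(self):
        """Rebuild the in-memory index by scanning the log once."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break
                klen, vlen = HEADER.unpack(header)
                key = f.read(klen)
                offset = f.tell()
                f.seek(vlen, os.SEEK_CUR)          # skip the value itself
                self.keydir[key] = (offset, vlen)  # later records win

    def put(self, key: bytes, value: bytes):
        self.f.seek(0, os.SEEK_END)  # always append, never overwrite
        start = self.f.tell()
        self.f.write(HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        self.keydir[key] = (start + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.keydir.get(key)
        if entry is None:
            return None
        offset, vlen = entry
        self.f.seek(offset)
        return self.f.read(vlen)

    def merge(self):
        """Rewrite only the live records to a new file, then swap it in."""
        live = {k: self.get(k) for k in self.keydir}
        self.f.close()
        tmp = self.path + ".merge"
        with open(tmp, "wb") as out:
            for k, v in live.items():
                out.write(HEADER.pack(len(k), len(v)) + k + v)
        os.replace(tmp, self.path)
        self.keydir.clear()
        self._load()
        self.f = open(self.path, "r+b")

    def close(self):
        self.f.close()
```

| Reads cost one dict lookup plus one seek, and overwritten
| values only take up space until the next merge.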
| alexpadula wrote:
| Nice little implementation :) you even added a server too. Good
| work, keep it up!
| tempaccount420 wrote:
| Is the author a child? Am I missing something?
| hdjrudni wrote:
| "An undergraduate student at IIT Kharagpur"
| tempaccount420 wrote:
| > Nice little implementation :) you even added a server
| too. Good work, keep it up!
|
| The tone of the grandparent comment made it sound like the
| author is a child, my bad.
| alexpadula wrote:
| Was not my intention to come off any which way. I reviewed
| the code, liked it and wanted to comment :)
___________________________________________________________________
(page generated 2025-06-14 23:00 UTC)