[HN Gopher] Build Your Own Database
___________________________________________________________________
Build Your Own Database
Author : nansdotio
Score : 323 points
Date : 2025-10-21 16:31 UTC (6 hours ago)
(HTM) web link (www.nan.fyi)
(TXT) w3m dump (www.nan.fyi)
| 4ndrewl wrote:
| > Databases were made to solve one problem:
| >
| > "How do we store data persistently and then efficiently look it
| > up later?"
|
| Isn't that two problems?
| dayjaby wrote:
| Store data persistently so it can be looked up efficiently*
| sounds like a single problem.
| SirFatty wrote:
| Definitely two.
| cjbgkagh wrote:
| It's not persistent if it can't be recovered later
| stvltvs wrote:
| Puts message in a bottle and tosses into the most
| convenient black hole.
| BetaDeltaAlpha wrote:
| Doesn't the black hole compress the bottle beyond
| recovery?
| stvltvs wrote:
| Not necessarily, opinions vary.
|
| https://www.sciencenewstoday.org/do-black-holes-destroy-
| or-s...
| SahAssar wrote:
| "Store data persistently" implies "it can be looked up"
| since if you cannot look it up it is impossible to know if
| it is stored persistently.
|
| The "efficiently" part can be considered a separate problem
| though.
| prerok wrote:
| Well, if you just want to store data, you can use files.
| Lookup is a bit tedious and inefficient.
|
| So, if we consider that persistent storage is a solved
| problem, then we can say that the reason for databases was
| how to look up data efficiently. In fact, that is why they
| were invented, even if persistent storage is a
| prerequisite.
| nonethewiser wrote:
| How about "store data in certain way." That sounds more
| like 1 problem and encompasses an even larger problem
| space.
| grokgrok wrote:
| How do we reconstruct past memory states? That's the
| fundamental problem.
|
| Efficiency of storage or retrieval, reliability against loss or
| corruption, security against unwanted disclosure or
| modification are all common concerns, and the relative values
| assigned to these features and others motivate database design.
| kiitos wrote:
| > How do we reconstruct past memory states? That's the
| fundamental problem.
|
| reconstructing past memory states is rarely, if ever, a
| requirement that needs to be accommodated in the database
| layer
| nonethewiser wrote:
| Can you elaborate? That certainly seems to be what happens
| in a typical crud app. You have some model for your data
| which you persist so that it can be loaded later. Perhaps
| partially at times.
|
| In another context perhaps you're ingesting data to be used
| in analytics, which seems to fit the "reconstruct past
| memory states" framing less.
| i_k_k wrote:
| I always wanted to ship a write-only database. Lightning fast.
| elygre wrote:
| Back in the 80s a professor at our college got a presentation
| on the concept of "write-only memory" accepted for some
| symposium.
|
| Good times.
| thomasjudge wrote:
| Very secure!
| pcdevils wrote:
| Pretty much how eventstoredb works. Deleting data fully only
| happens at scavenge which rewrites the data files.
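|
| A minimal sketch of the append-then-scavenge idea described above, not
| EventStoreDB's actual implementation. The file layout, the function
| names (db_set, db_del, db_get, db_scavenge) and the tombstone marker are
| all assumptions made for illustration. Deletes only append a marker;
| space is reclaimed when the file is rewritten.

```shell
# Hypothetical append-only log of "key:value" lines (values must not contain ":").
DB_FILE="${DB_FILE:-$(mktemp)}"
TOMBSTONE="__deleted__"   # assumed marker; a real system would use a flag, not a magic value

# Writes never modify existing data, they only append.
db_set() { printf '%s:%s\n' "$1" "$2" >> "$DB_FILE"; }

# Deleting appends a tombstone; the old records stay on disk for now.
db_del() { db_set "$1" "$TOMBSTONE"; }

# The last record for a key wins; a tombstone reads as "not found".
db_get() {
  awk -F: -v key="$1" -v tomb="$TOMBSTONE" '
    $1 == key { result = $2; found = 1 }
    END { if (found && result != tomb) print result }' "$DB_FILE"
}

# Scavenge: rewrite the file keeping only the latest live record per key.
# Record order in the rewritten file is not preserved.
db_scavenge() {
  scavenge_tmp="$(mktemp)"
  awk -F: -v tomb="$TOMBSTONE" '
    { latest[$1] = $2 }
    END { for (k in latest) if (latest[k] != tomb) print k ":" latest[k] }' \
    "$DB_FILE" > "$scavenge_tmp"
  mv "$scavenge_tmp" "$DB_FILE"
}
```

| Until db_scavenge runs, overwritten and deleted records still occupy
| space in the file, which is the trade-off the comment above describes.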
| hxtk wrote:
| I think it was a joke. It sounds like you read it as
| append-only, like most LSM tree databases (not rewriting
| files in the course of write operations), but I think GP
| meant it as write-only to the exclusion of reads, roughly
| equivalent to `echo $data > /dev/null`
| datadrivenangel wrote:
| I've forgotten how to count that low. [0]
|
| 0 - https://www.youtube.com/watch?v=3t6L-FlfeaI
| archerx wrote:
| That would be useful for logging.
| warkdarrior wrote:
| If it's write-only, and no reads ever happen, one can write
| to /dev/null without loss of utility.
| mewpmewp2 wrote:
| It would be good for before going to sleep then.
| Etheryte wrote:
| Also useful for backups, so long as you don't need to
| restore.
| pratik661 wrote:
| This is analogous to an elevator that's unidirectional
| rzzzt wrote:
| One that lets people enter. We will figure out exiting later,
| with exiting on a different floor as a stretch goal.
| theideaofcoffee wrote:
| Or just a paternoster
| nonethewiser wrote:
| It's amusing to me that this is really quite a pedantic
| observation yet it's driving very earnest engagement from
| hackernews. Myself included. Absolutely nothing in this article
| is riding on whether it's 1 or 2 problems - it's an aside at best.
| Yet I'm still trying to think through if it's 1 or 2. I mean,
| the "and" is right there - that clearly suggests two. It's
| almost comical even, to say "Here is one problem: X and Y." Yet
| in another way it seems like 2 sides of the same coin.
|
| I guess there is a rather fine line between philosophy and
| pedantry.
|
| Maybe we can think about it from another angle. If they are 2
| problems databases were designed to solve, then that means this
| is a problem databases were designed to solve: storing data
| persistently.
|
| Is that really a problem databases were designed to solve? Not
| really. We had that long before databases. It was already
| solved. It's a pretty fundamental computer operation. Isn't it
| fair to say this is one thing? "Storing data so it can be
| retrieved efficiently."
| gingersnap wrote:
| You're thinking of regex
| mrighele wrote:
| It is a single problem that contains two smaller problems, but
| the actual hard part (a third problem, if you wish) is putting
| them together. If you limit yourself to solve those two
| problems independently you won't have a (useful) database.
| didip wrote:
| Off by 1 error is indeed a hard problem.
| whartung wrote:
| > Isn't that two problems?
|
| No, that would be regexes.
| mamcx wrote:
| You can decompose it into 2 problems, because, well, that is
| better, but it is in fact one. It can be argued that it is only
| this single problem:
|
| How do we store data, in an ACID way, so that it can be looked up
| efficiently later by an unknown number of clients with unknown
| access patterns, concurrently, without blocking all the
| participants, and fast?
|
| And then add SQL (ouch!)
| cube2222 wrote:
| I clicked through a couple of the articles in the OP, and I must
| say, the design and animations are extremely pretty!
|
| Kudos for that!
| 235ylkj wrote:
| Here's a simple key-value store inspired by D.B. Cooper:
|
|     ~/bin/cooper-db-set
|     ===================
|     #! /bin/bash
|     key="$1"
|     value="$2"
|     echo "${key}:${value}" >> /dev/null
|
|     ~/bin/cooper-db-get
|     ===================
|     #! /bin/bash
|     key="$1"
|     </dev/null awk -F: -v key="$key" '$1 == key {result = $2} END {print result}'
| MathMonkeyMan wrote:
| /dev/null is persistent across restarts and cache friendly, so
| it's got you covered.
| skeptrune wrote:
| I love the design and examples in this post. Easy to read for
| sure.
|
| Exercises like this also seem fun in general. It's a real test of
| how much you know to start anything from scratch.
| kevinqi wrote:
| my only minor critique is using lorem ipsum examples. It tends
| to make me want to gloss over instead of reading; I prefer
| seeing realistic data. other than that, it's a really cool post
| WD-42 wrote:
| Was going to post the same thing. Lorem Ipsum makes the data
| too hard to distinguish. I get that due to the dynamic nature
| of the examples the text needed to be generated, but Latin
| isn't the best choice IMO.
|
| Otherwise great article, thank you!
| ashleyn wrote:
| I was tempted to knee-jerk dismiss this as "don't write your
| own database, don't even use a KV database, just use SQL". And
| then I remembered the only reason I'd say this is because I
| went through designing my own DB or using KV databases just to
| avoid SQL...only to realise I was badly reinventing SQL. It
| could be worth the lesson.
| FpUser wrote:
| > Problem. "How do we store data persistently and then efficiently
| > look it up later?"
|
| I would say without transactions it is not a database yet from a
| practical standpoint.
| dangoodmanUT wrote:
| I think a lot of databases would disagree
| FpUser wrote:
| You might be on to something here ;)
| alecco wrote:
| But they are web scale!
| myth_drannon wrote:
| I also recommend this free online book to build a database
| https://build-your-own.org/database/
| bionsystem wrote:
| I remember an article here, maybe a year ago, where somebody
| showed some database concepts from bash examples (like "write
| your db in bash"), but I can't find it anywhere. Does anybody
| have it?
| DiabloD3 wrote:
| It looks like it got hugged to death already.
| winrid wrote:
| Needs a faster database
| keybored wrote:
| Part of the reason why I'm not a "maker" is because my mind gets
| ahead of me with all the things that I would need to do in order
| to do things properly. So the article starts out interesting and
| then gets more and more, well, not exactly stressful but I get a
| bit weary by it.
|
| Not that I would aspire to implement a general-purpose database.
| But even smaller tasks can make my mind spin too much.
| browningstreet wrote:
| I don't disagree with your take in general, but I do think it's
| different reading about minutiae than being invested in it. If
| you actually are curing these requirements it's probably quite
| engaging. If not, the eyes and mind start to gloss over them.
|
| As a different example: I'm moving this week. I've known I'm
| moving for a while. Thinking about moving -- and all the little
| things I have to do -- is way more painful than doing them.
| Thinking about them keeps me up at night, getting through my
| list today is only fractionally painful.
|
| I'm also leveling up a few aspects of my "way of living" in the
| midst of all this, and it'd be terribly boring to tell others
| about it, but when next Monday comes.. it'll be quite sweet
| indeed.
| keybored wrote:
| > As a different example: I'm moving this week. I've known
| I'm moving for a while. Thinking about moving -- and all the
| little things I have to do -- is way more painful than doing
| them. Thinking about them keeps me up at night, getting
| through my list today is only fractionally painful.
|
| this sounds familiar... :)
| nawgz wrote:
| Have you considered if you have ADHD?
| chrisallick wrote:
| if author is reading, can you add an rss feed to your site? i
| want to add to feedly.
| constantcrying wrote:
| I absolutely love this "first principles" approach of explaining
| a topic. You can really go through this and at each time
| understand what problem needs to be solved and what other
| problems this introduces, until you get at a reasonably
| satisfying solution.
| exdeejay_ wrote:
| The first example in the "Sorting in Practice" section appears to
| be broken. The text makes it seem like the list should be sorted
| in-memory and then written to disk sorted, but the example un-
| sorts the list when it's written to disk.
|
| Edit: the flush example (2nd one) in the recap section does the
| same thing, when the text says that the records are supposed to
| be written to the file in sorted order.
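|
| For reference, a minimal sketch of the behaviour the text describes:
| records buffered in memory and written to disk in sorted key order.
| This is not the article's code. The names (MEMTABLE, mem_set,
| mem_flush, LAST_SEGMENT) and the "key:value" line format are
| assumptions, and for brevity it sorts at flush time rather than
| keeping a sorted structure in memory.

```shell
# Hypothetical in-memory buffer: one "key:value" line per write.
MEMTABLE=""
SEGMENT_DIR="${SEGMENT_DIR:-$(mktemp -d)}"
LAST_SEGMENT=""

# Buffer a write in memory; nothing touches disk yet.
mem_set() {
  MEMTABLE="${MEMTABLE}$1:$2
"
}

# Flush: keep the newest value per key, sort by key, write one segment file.
# $1 is a caller-chosen segment number (an assumption of this sketch).
mem_flush() {
  LAST_SEGMENT="$SEGMENT_DIR/segment-$1.db"
  printf '%s' "$MEMTABLE" |
    awk -F: '{ latest[$1] = $2 } END { for (k in latest) print k ":" latest[k] }' |
    LC_ALL=C sort -t: -k1,1 > "$LAST_SEGMENT"
  MEMTABLE=""
}
```

| After a flush, the segment file should list keys in ascending order
| regardless of the order in which they were written.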
| 0xb0565e486 wrote:
| I have been spending the last ~4 weeks writing a triple store!
|
| I wish this came out earlier, there are a few insights in there
| that took me a while to understand :)
| saxelsen wrote:
| Nice interactivity, but this is taken straight from Designing
| Data-Intensive Applications. Literally all the content here is an
| interactive version of chapter 3.
|
| Maybe give credit?
| vladpowerman wrote:
| Great read. I've been modeling developer activity as a time
| series key value system where each developer is a key and commits
| are values. Faced the same issues: logs grow fast, indexes get
| heavy, range queries slow down. How do you decide what to drop
| when compacting segments? Balancing freshness and retention is
| tricky.
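|
| One possible policy, sketched under stated assumptions rather than
| offered as an answer: merge segments and drop records older than a
| retention cutoff. The "key:timestamp:value" line format, the function
| name compact_segments and the cutoff argument are all hypothetical.

```shell
# compact_segments CUTOFF SEGMENT...
# Merges the given segment files, drops records whose timestamp (field 2,
# seconds since the epoch) is older than CUTOFF, and prints the result
# sorted by key, then by timestamp.
compact_segments() {
  cutoff="$1"
  shift
  cat "$@" |
    awk -F: -v cutoff="$cutoff" '$2 + 0 >= cutoff + 0' |
    LC_ALL=C sort -t: -k1,1 -k2,2n
}
```

| A pure time cutoff favours freshness; keeping at least the newest
| record per key regardless of age would be a variation that favours
| retention.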
| orliesaurus wrote:
| am i the only one who IS a huge fan of this blogpost layout
| jumploops wrote:
| "LSM trees are the underlying data structure used for [..]
| DynamoDB, and they have proven to perform really well at scale
| [..] 80 million requests per second!"
|
| This is a tad bit misleading, as the LSM is used for the node-
| level storage engine, but doesn't explain how the overall
| distributed system scales to 80 million rps.
|
| iirc the original Dynamo paper used BerkeleyDB (b-tree or LSM),
| but the 2012 paper shifted to a fully LSM-based engine.
___________________________________________________________________
(page generated 2025-10-21 23:00 UTC)