hngopher.com

       [HN Gopher] Build Your Own Database
       ___________________________________________________________________
        
       Build Your Own Database
        
       Author : nansdotio
       Score  : 323 points
       Date   : 2025-10-21 16:31 UTC (6 hours ago)
        
 (HTM) web link (www.nan.fyi)
 (TXT) w3m dump (www.nan.fyi)
        
       | 4ndrewl wrote:
       | > Databases were made to solve one problem:
       | >
       | > "How do we store data persistently and then efficiently look it
       | > up later?"
       | 
       | Isn't that two problems?
        
         | dayjaby wrote:
         | Store data persistently so it can be looked up efficiently*
         | sounds like a single problem.
        
           | SirFatty wrote:
           | Definitely two.
        
             | cjbgkagh wrote:
             | It's not persistent if it can't be recovered later
        
               | stvltvs wrote:
               | Puts message in a bottle and tosses into the most
               | convenient black hole.
        
               | BetaDeltaAlpha wrote:
               | Doesn't the black hole compress the bottle beyond
               | recovery?
        
               | stvltvs wrote:
               | Not necessarily, opinions vary.
               | 
               | https://www.sciencenewstoday.org/do-black-holes-destroy-
               | or-s...
        
             | SahAssar wrote:
             | "Store data persistently" implies "it can be looked up"
             | since if you cannot look it up it is impossible to know if
             | it is stored persistently.
             | 
             | The "efficiently" part can be considered a separate problem
             | though.
        
             | prerok wrote:
             | Well, if you just want to store data, you can use files.
             | Lookup is a bit tedious and inefficient.
             | 
             | So, if we consider that persistent storage is a solved
             | problem, then we can say that the reason for databases was
             | how to look up data efficiently. In fact, that is why they
             | were invented, even if persistent storage is a
             | prerequisite.
        
             | nonethewiser wrote:
             | How about "store data in a certain way." That sounds more
             | like 1 problem and encompasses an even larger problem
             | space.
        
         | grokgrok wrote:
         | How do we reconstruct past memory states? That's the
         | fundamental problem.
         | 
         | Efficiency of storage or retrieval, reliability against loss or
         | corruption, security against unwanted disclosure or
         | modification are all common concerns, and the relative values
         | assigned to these features and others motivate database design.
        
           | kiitos wrote:
           | > How do we reconstruct past memory states? That's the
           | fundamental problem.
           | 
           | reconstructing past memory states is rarely, if ever, a
           | requirement that needs to be accommodated in the database
           | layer
        
             | nonethewiser wrote:
             | Can you elaborate? That certainly seems to be what happens
             | in a typical crud app. You have some model for your data
             | which you persist so that it can be loaded later. Perhaps
             | partially at times.
             | 
             | In another context perhaps you're ingesting data to be used
             | in analytics. Which seems to fit the "reconstruct past
             | memory states" less.
        
         | i_k_k wrote:
         | I always wanted to ship a write-only database. Lightning fast.
        
           | elygre wrote:
           | Back in the 80s a professor at our college got a presentation
           | on the concept of <<write-only memory>> accepted for some
           | symposium.
           | 
           | Good times.
        
             | thomasjudge wrote:
             | Very secure!
        
           | pcdevils wrote:
           | Pretty much how eventstoredb works. Deleting data fully only
           | happens at scavenge which rewrites the data files.
        
             | hxtk wrote:
             | I think it was a joke. It sounds like you read it as
             | append-only, like most LSM tree databases (not rewriting
             | files in the course of write operations), but I think GP
             | meant it as write-only to the exclusion of reads, roughly
             | equivalent to `echo $data > /dev/null`
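
The append-only versus write-only distinction above can be sketched in a few lines of Python. This is a toy illustration, not how EventStoreDB or any LSM engine is implemented; the class name `AppendOnlyLog` and the `key:value` line format are assumptions made for the example, and keys containing `:` are not handled.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy key-value store: writes only append, reads scan the whole file."""

    def __init__(self, path):
        self.path = path

    def set(self, key, value):
        # Append a new record; earlier records for the key are never rewritten.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key}:{value}\n")

    def get(self, key):
        # Scan every record and keep the last match, so the newest write wins.
        if not os.path.exists(self.path):
            return None
        result = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(":")
                if k == key:
                    result = v
        return result


def demo():
    # Write the same key twice and read it back from a temporary log file.
    with tempfile.TemporaryDirectory() as tmp:
        db = AppendOnlyLog(os.path.join(tmp, "db.log"))
        db.set("a", "1")
        db.set("a", "2")
        return db.get("a"), db.get("missing")
```

Pointing the same class at `os.devnull` gives the write-only variant from the joke: every `set` succeeds and every `get` returns `None`.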
        
               | datadrivenangel wrote:
               | I've forgotten how to count that low. [0]
               | 
               | 0 - https://www.youtube.com/watch?v=3t6L-FlfeaI
        
           | archerx wrote:
           | That would be useful for logging.
        
             | warkdarrior wrote:
             | If it's write-only, and no reads ever happen, one can write
             | to /dev/null without loss of utility.
        
               | mewpmewp2 wrote:
               | It would be good for before going to sleep then.
        
             | Etheryte wrote:
             | Also useful for backups, so long as you don't need to
             | restore.
        
         | pratik661 wrote:
         | This is analogous to an elevator that's unidirectional
        
           | rzzzt wrote:
           | One that lets people enter. We will figure out exiting later,
           | with exiting on a different floor as a stretch goal.
        
           | theideaofcoffee wrote:
           | Or just a paternoster
        
         | nonethewiser wrote:
         | It's amusing to me that this is really quite a pedantic
         | observation yet it's driving very earnest engagement from
         | hackernews. Myself included. Absolutely nothing in this article
         | is riding on if it's 1 or 2 problems - it's an aside at best.
         | Yet I'm still trying to think through if it's 1 or 2. I mean,
         | the "and" is right there - that clearly suggests two. It's
         | almost comical even, to say "Here is one problem: X and Y." Yet
         | in another way it seems like 2 sides of the same coin.
         | 
         | I guess there is a rather fine line between philosophy and
         | pedantry.
         | 
         | Maybe we can think about it from another angle. If they are 2
         | problems databases were designed to solve, then that means this
         | is a problem databases were designed to solve: storing data
         | persistently.
         | 
         | Is that really a problem databases were designed to solve? Not
         | really. We had that long before databases. It was already
         | solved. It's a pretty fundamental computer operation. Isn't it
         | fair to say this is one thing? "Storing data so it can be
         | retrieved efficiently."
        
         | gingersnap wrote:
         | You're thinking of regex
        
         | mrighele wrote:
         | It is a single problem that contains two smaller problems, but
         | the actual hard part (a third problem, if you wish) is putting
         | them together. If you limit yourself to solve those two
         | problems independently you won't have a (useful) database.
        
         | didip wrote:
         | Off by 1 error is indeed a hard problem.
        
         | whartung wrote:
         | > Isn't that two problems?
         | 
         | No, that would be regexes.
        
         | mamcx wrote:
         | You can decompose it into 2 problems, because, well, that is
         | better, but it is in fact one. It can be argued that there is
         | only this single problem:
         | 
         | How do you store data, in an ACID way, so that it can be looked
         | up efficiently later by an unknown number of clients with
         | unknown access patterns, concurrently and quickly, without
         | blocking all the participants?
         | 
         | And then add SQL (ouch!)
        
       | cube2222 wrote:
       | I clicked through a couple of the articles in the OP, and I must
       | say, the design and animations are extremely pretty!
       | 
       | Kudos for that!
        
       | 235ylkj wrote:
       | Here's a simple key-value store inspired by D.B. Cooper:
       | 
       |     ~/bin/cooper-db-set
       |     ===================
       |     #! /bin/bash
       |     key="$1"
       |     value="$2"
       | 
       |     echo "${key}:${value}" >> /dev/null
       | 
       |     ~/bin/cooper-db-get
       |     ===================
       |     #! /bin/bash
       |     key="$1"
       |     </dev/null awk -F: -v key="$key" '$1 == key {result = $2} END {print result}'
        
         | MathMonkeyMan wrote:
         | /dev/null is persistent across restarts and cache friendly, so
         | it's got you covered.
        
       | skeptrune wrote:
       | I love the design and examples in this post. Easy to read for
       | sure.
       | 
       | Exercises like this also seem fun in general. It's a real test of
       | how much you know to start anything from scratch.
        
         | kevinqi wrote:
         | my only minor critique is using lorem ipsum examples. It tends
         | to make me want to gloss over instead of reading; I prefer
         | seeing realistic data. other than that, it's a really cool post
        
           | WD-42 wrote:
           | Was going to post the same thing. Lorem Ipsum makes the data
           | too hard to distinguish. I get that due to the dynamic nature
           | of the examples the text needed to be generated, but Latin
           | isn't the best choice IMO.
           | 
           | Otherwise great article, thank you!
        
         | ashleyn wrote:
         | I was tempted to knee-jerk dismiss this as "don't write your
         | own database, don't even use a KV database, just use SQL". And
         | then I remembered the only reason I'd say this is because I
         | went through designing my own DB or using KV databases just to
         | avoid SQL...only to realise I was badly reinventing SQL. It
         | could be worth the lesson.
        
       | FpUser wrote:
       | > Problem: "How do we store data persistently and then
       | > efficiently look it up later?"
       | 
       | I would say without transactions it is not a database yet from a
       | practical standpoint.
        
         | dangoodmanUT wrote:
         | I think a lot of databases would disagree
        
           | FpUser wrote:
           | You might be on to something here ;)
        
             | alecco wrote:
             | But they are web scale!
        
       | myth_drannon wrote:
       | I also recommend this free online book to build a database
       | https://build-your-own.org/database/
        
         | bionsystem wrote:
         | I remember an article here, maybe a year ago, where somebody
         | showed some database concepts from bash examples (like "write
         | your db in bash"), but I can't find it anywhere, does anybody
         | have it?
        
       | DiabloD3 wrote:
       | It looks like it got hugged to death already.
        
         | winrid wrote:
         | Needs a faster database
        
       | keybored wrote:
       | Part of the reason why I'm not a "maker" is because my mind gets
       | ahead of me with all the things that I would need to do in order
       | to do things properly. So the article starts out interesting and
       | then gets more and more, well, not exactly stressful but I get a
       | bit weary by it.
       | 
       | Not that I would aspire to implement a general-purpose database.
       | But even smaller tasks can make my mind spin too much.
        
         | browningstreet wrote:
         | I don't disagree with your take in general, but I do think it's
         | different reading about minutiae than being invested in it. If
         | you actually are curating these requirements it's probably quite
         | engaging. If not, the eyes and mind start to gloss over them.
         | 
         | As a different example: I'm moving this week. I've known I'm
         | moving for a while. Thinking about moving -- and all the little
         | things I have to do -- is way more painful than doing them.
         | Thinking about them keeps me up at night, getting through my
         | list today is only fractionally painful.
         | 
         | I'm also leveling up a few aspects of my "way of living" in the
         | midst of all this, and it'd be terribly boring to tell others
         | about it, but when next Monday comes.. it'll be quite sweet
         | indeed.
        
           | keybored wrote:
           | > As a different example: I'm moving this week. I've known
           | I'm moving for a while. Thinking about moving -- and all the
           | little things I have to do -- is way more painful than doing
           | them. Thinking about them keeps me up at night, getting
           | through my list today is only fractionally painful.
           | 
           | this sounds familiar... :)
        
         | nawgz wrote:
         | Have you considered if you have ADHD?
        
       | chrisallick wrote:
       | if author is reading, can you add an rss feed to your site? i
       | want to add to feedly.
        
       | constantcrying wrote:
       | I absolutely love this "first principles" approach of explaining
       | a topic. You can really go through this and at each time
       | understand what problem needs to be solved and what other
       | problems this introduces, until you get at a reasonably
       | satisfying solution.
        
       | exdeejay_ wrote:
       | The first example in the "Sorting in Practice" section appears to
       | be broken. The text makes it seem like the list should be sorted
       | in-memory and then written to disk sorted, but the example un-
       | sorts the list when it's written to disk.
       | 
       | Edit: the flush example (2nd one) in the recap section does the
       | same thing, when the text says that the records are supposed to
       | be written to the file in sorted order.
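
The behaviour the comment above expects (records written to disk in key order on flush) can be sketched as follows. This is a minimal illustration, not the article's code; the function names and the `key:value` line format are assumptions made for the example.

```python
import os
import tempfile


def flush_sorted(memtable, path):
    """Write an in-memory dict to `path` as key:value lines in key order."""
    with open(path, "w", encoding="utf-8") as f:
        for key in sorted(memtable):
            f.write(f"{key}:{memtable[key]}\n")


def read_keys(path):
    """Return the keys in the order they appear on disk."""
    with open(path, encoding="utf-8") as f:
        return [line.partition(":")[0] for line in f]


def demo():
    # Keys are inserted out of order; the flush is what sorts them.
    memtable = {"pear": "3", "apple": "1", "mango": "2"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment-0001.txt")
        flush_sorted(memtable, path)
        return read_keys(path)
```

Sorting at flush time is the simplest option for a sketch; an engine could instead keep the in-memory structure ordered at all times so the flush is a plain sequential write.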
        
       | 0xb0565e486 wrote:
       | I have been spending the last ~4 weeks writing a triple store!
       | 
       | I wish this came out earlier, there are a few insights in there
       | that took me a while to understand :)
        
       | saxelsen wrote:
       | Nice interactivity, but this is taken straight from Designing
       | Data-Intensive Applications. Literally all the content here is an
       | interactive version of chapter 3.
       | 
       | Maybe give credit?
        
       | vladpowerman wrote:
       | Great read. I've been modeling developer activity as a time
       | series key value system where each developer is a key and commits
       | are values. Faced the same issues: logs grow fast, indexes get
       | heavy, range queries slow down. How do you decide what to drop
       | when compacting segments? Balancing freshness and retention is
       | tricky.
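
One common answer to "what to drop" is last-write-wins: keep only the newest value for each key and discard keys whose newest record is a delete marker. The sketch below assumes that policy; `TOMBSTONE`, `compact`, and the in-memory list-of-pairs segment format are hypothetical names chosen for the example, and a time series that must retain every commit would need a different rule, such as an age cutoff.

```python
TOMBSTONE = None  # hypothetical marker meaning "this key was deleted"


def compact(segments):
    """Merge segments (ordered oldest to newest) into one sorted segment.

    Each segment is a list of (key, value) pairs. For every key only the
    value from the newest segment survives; keys whose newest value is a
    tombstone are dropped entirely.
    """
    latest = {}
    for segment in segments:        # oldest first ...
        for key, value in segment:
            latest[key] = value     # ... so later segments overwrite earlier ones
    return [(k, latest[k]) for k in sorted(latest) if latest[k] is not TOMBSTONE]


def demo():
    old = [("alice", "commit-1"), ("bob", "commit-2")]
    new = [("alice", "commit-3"), ("bob", TOMBSTONE), ("carol", "commit-4")]
    return compact([old, new])
```

This version loads everything into a dict for clarity. Because segments are already sorted, a real engine would more likely stream a k-way merge over them instead of holding all keys in memory.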
        
       | orliesaurus wrote:
       | am i the only one who IS a huge fan of this blogpost layout
        
       | jumploops wrote:
       | "LSM trees are the underlying data structure used for [..]
       | DynamoDB, and they have proven to perform really well at scale
       | [..] 80 million requests per second!"
       | 
       | This is a tad bit misleading, as the LSM is used for the node-
       | level storage engine, but doesn't explain how the overall
       | distributed system scales to 80 million rps.
       | 
       | iirc the original Dynamo paper used BerkeleyDB (b-tree or LSM),
       | but the 2012 paper shifted to a fully LSM-based engine.
        
       ___________________________________________________________________
       (page generated 2025-10-21 23:00 UTC)