hngopher.com

       [HN Gopher] Build a Database in 3000 Lines with 0 Dependencies
       ___________________________________________________________________
        
       Build a Database in 3000 Lines with 0 Dependencies
        
       Author : not_a_boat
       Score  : 277 points
       Date   : 2025-01-16 13:59 UTC (3 days ago)
        
 (HTM) web link (build-your-own.org)
 (TXT) w3m dump (build-your-own.org)
        
       | sandeep1998 wrote:
       | Thank you
        
       | airstrike wrote:
       | I see what you did there...
       | https://news.ycombinator.com/item?id=42711727
        
         | phoronixrly wrote:
         | Please tell me they did that on purpose as a response to this
         | post. Top-tier banter!
        
       | andai wrote:
       | I looked into this at one point, I was typing out entire
       | codebases for didactic purposes: SQLite 3 was 120,000 lines of
       | code, but SQLite 2 was 12,000.
       | 
       | So for a bit more effort you get a battle tested real world
       | thing!
        
         | esmy wrote:
         | Wait you took a repo and started typing it into the IDE? Could
         | you please expand on what benefits you noticed and how it
         | affected your understanding of the language? It sounds like a
         | fascinating way to force attention onto the code in a way that
         | simply reading it wouldn't.
        
           | andai wrote:
           | Yeah I just open two panes in Sublime Text, with the source
           | on the left, and then I type it out verbatim on the right.
           | 
           | I make an effort to keep the line numbers synced. Sometimes I
           | skip long repetitive blocks or comments. But I do type out
           | like 80% of the actual characters in the file.
           | 
           | It's about 500 lines per hour for me, so I can estimate
           | reasonably well how long it'll take.
           | 
           | It's not necessarily an efficient thing to do -- you'd get
           | way more bang for your buck just poking around, asking
           | questions, trying to make small changes. But for reasonably
           | small projects, you can type it out in a few hours, or a day
           | or two. Then you've "round-tripped" every single token
           | through your brain (though sadly not with a meaningful amount
           | of conscious reflection) -- unless you pause and ask
           | questions along the way.
           | 
           | See also my other comment above.
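           | 
           | A quick way to sanity-check that estimate, as a minimal sketch:
           | it assumes a flat 500 lines per hour (the rate quoted above)
           | and plugs in the SQLite 2 and SQLite 3 line counts mentioned
           | upthread. The function name is made up for illustration.

```python
def typing_hours(lines_of_code: int, lines_per_hour: float = 500.0) -> float:
    """Estimated hours to retype a codebase at a steady rate.

    Assumption: the rate is constant and no blocks are skipped.
    """
    if lines_per_hour <= 0:
        raise ValueError("lines_per_hour must be positive")
    return lines_of_code / lines_per_hour


# Line counts quoted in the thread.
sqlite2_hours = typing_hours(12_000)   # 24.0 hours
sqlite3_hours = typing_hours(120_000)  # 240.0 hours
```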
        
             | sitkack wrote:
             | Instead of a book club, have a code typing club, DuoTypo
             | 
             | It would be funny to type it until it builds, and then type
             | it until the tests pass.
        
             | chickenzzzzu wrote:
             | Not to offend you, and you've already pointed out the
             | better way to do it, but I don't think there is too much to
             | gain from this approach. When I was learning Vulkan for
             | example, the only thing this helped me learn was which
             | functions they were calling from the API. Their variable
             | names and ifdefs and wrapper functions were completely
             | useless to me. I was able to get their 5000 lines down to
             | just 1000-- and that was for a single untextured cube with
             | direct memory management and simple surface handling.
             | Imagine if it had been more complex? 20,000 lines of typing
             | for little reason. My neck aches thinking about it :)
        
         | frankie_t wrote:
         | can you elaborate on typing out for didactic purposes, please?
        
           | t-3 wrote:
           | It's a new iteration on the ancient form of learning by
           | copying. I've only ever seen people copy things by hand when
           | they wanted to memorize something, though. I imagine the
           | memory-enhancement effect of writing by hand is lost with a
           | keyboard, but it's probably still more effective than
           | reading alone.
        
         | ianmcgowan wrote:
         | Really puts the auto- in didact! Very curious to hear how this
         | worked for you; it's almost directly the opposite of the
         | copilot approach.
         | 
         | I learned assembler by typing in listings from magazines and
         | hand dis-assembling and debugging on paper. Your approach seems
         | similar in spirit, but who has the time these days?
        
           | andai wrote:
           | I learned this from Zed Shaw's Learn X The Hard Way books. He
           | says this approach is mainstream in other disciplines, like
           | music, languages, or martial arts.
           | 
           | I also heard the philosopher Ken Wilber spent a few years (in
           | what kids today call Monk Mode) writing out great books by
           | hand.
           | 
           | The main effect I noticed is that I rapidly gain muscle
           | memory in a new programming language, library or codebase.
           | 
           | The other effect is that I'm forced to round-trip every token
           | through my brain, which is very helpful as my eyes tend to
           | glaze over -- often I'll be looking right at an obvious bug
           | without seeing it.
        
           | norir wrote:
           | I program in neovim with no plugins, no autocomplete and no
           | syntax highlighting. I type everything myself (though I will
           | use copy and paste from time to time). There is a discipline
           | to it that I find very beneficial. As a language designer, it
           | also makes me think very carefully about the syntactic burden
           | of languages that I design. It keeps my languages tight. One
           | of the nice things about typing all of my own code without
           | suggestions is that it eliminates many distractions. I may
           | get some things wrong from time to time, but then I only have
           | myself to blame. And I never waste time messing around with
           | broken plugin configs or irritating syntax highlighting nits.
           | 
           | It's not for everyone but I love it.
        
         | cryptonector wrote:
         | The proprietary test suite for SQLite3 is much much larger
         | still. The battle-testedness comes in great part from that.
        
           | postalrat wrote:
           | Is that where the 10x more lines came from? Writing more
           | "testable" code?
        
             | cryptonector wrote:
             | Oh no, SQLite3 is a lot more featureful than SQLite2. The
             | proprietary test suite is what makes SQLite3 so solid.
        
               | CSSer wrote:
               | This makes me wonder. Is anyone practicing TDD with
               | genAI/LLMs? If the true value is in the tests, might as
               | well write those and have the AI slop be the codebase
               | itself. TDD is often criticized for being slow. I'd
               | seriously like to compare one vs the other today. I've
               | also heard people find it challenging to get it to write
               | good tests.
        
         | tonyedgecombe wrote:
         | >I was typing out entire codebases for didactic purposes
         | 
         | I've read about an author who did this (I can't remember their
         | name right now), writing down the works of another author they
         | wanted to learn from.
        
           | bitdivision wrote:
           | Hunter S. Thompson copied The Great Gatsby on his typewriter:
           | https://news.ycombinator.com/item?id=24696790
        
         | 867-5309 wrote:
         | one does not simply type out 120 000 lines of code..
        
           | tap-snap-or-nap wrote:
           | Number of lines does not matter anymore.
        
       | amelius wrote:
       | Waiting for a post where he writes a distributed database.
        
         | MarcelOlsz wrote:
         | You're going to feel awful silly when he does it in like, 3
         | lines.
        
       | cryptonector wrote:
       | Re: copy-on-write (CoW) B-tree vs append-only log + non-CoW
       | B-tree, why not both?
       | 
       | I.e., just write one file (or several) as a B-tree + a log,
       | appending to log, and once in a while merging log entries into
       | the B-tree in a CoW manner. Essentially that's what ZFS does,
       | except it's optional when it really shouldn't be. The whole point
       | of the log is to amortize the cost of the copy-on-write B-tree
       | updates because CoW B-tree updates incur a great deal of write
       | magnification due to having to write all new interior blocks for
       | all leaf node writes. If you wait to accumulate a bunch of
       | transactions then when you finally merge them into the tree you
       | will be able to share many new interior nodes for all those leaf
       | nodes. So just make the log a first-class part of the database.
       | 
       | Also, the log can include small indices of log entries since the
       | last B-tree merge, and then you can accumulate even more
       | transactions in the log before having to merge into the B-tree,
       | thus further amortizing all that write magnification. This
       | approaches an LSM, but with a B-tree at the oldest layer.
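       | 
       | The amortization argument can be sketched with a toy cost model.
       | This is only an illustration under loud assumptions: a complete
       | tree with a fixed fanout, leaves numbered left to right, and
       | cost measured as the number of interior nodes rewritten. It is
       | not how ZFS or any particular database lays out its tree, and
       | all function names are hypothetical.

```python
def interior_nodes_rewritten(leaf_ids, fanout, height):
    """Count the distinct interior nodes on the root paths of the given leaves.

    Leaves are numbered 0 .. fanout**height - 1. The ancestor `level` steps
    above a leaf is identified by (level, leaf_id // fanout**level).
    """
    touched = set()
    for leaf in leaf_ids:
        node = leaf
        for level in range(1, height + 1):
            node //= fanout  # move one level up toward the root
            touched.add((level, node))
    return len(touched)


def cow_cost_unbatched(leaf_ids, fanout, height):
    """Each update is its own CoW commit, so every one rewrites a full path."""
    return sum(interior_nodes_rewritten([leaf], fanout, height) for leaf in leaf_ids)


def cow_cost_batched(leaf_ids, fanout, height):
    """Updates accumulate in the log and merge once, sharing new interior nodes."""
    return interior_nodes_rewritten(leaf_ids, fanout, height)


# Six leaf updates in a tree with fanout 4 and height 3 (64 leaves).
updates = [0, 1, 2, 3, 16, 17]
unbatched = cow_cost_unbatched(updates, fanout=4, height=3)  # 18
batched = cow_cost_batched(updates, fanout=4, height=3)      # 5
```

       | In this toy case, merging the six updates at once rewrites 5
       | interior nodes instead of 18. The gain depends on how many of
       | the batched leaves share ancestors.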
        
       | qianli_cs wrote:
       | I read a similar series by Phil back in 2020: "Writing a SQL
       | database from scratch in Go"
       | https://notes.eatonphil.com/database-basics.html
       | 
       | The code is available on GitHub:
       | https://github.com/eatonphil/gosql (it's specifically a
       | PostgreSQL implementation in Go).
       | 
       | It's cool to build a database in 3000 lines, but for a real
       | production-ready database you'll need testing. Would love to see
       | some coverage on correctness and reliability tests. For example,
       | SQLite has about 590 times more test code than the library
       | itself. (https://www.sqlite.org/testing.html)
        
       | swyx wrote:
       | related: https://dx.tips/oops-database
        
       | vrnvu wrote:
       | A similar resource I recently discovered, which isn't that
       | popular:
       | https://github.com/pingcap/talent-plan/tree/master/courses/r...
       | 
       | Writing a Bitcask-like (KV WAL) DB in Rust. Really cool and simple
       | ideas. The white paper is like 5 pages.
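       | 
       | For a feel of the core idea, here is a minimal sketch (in Python
       | rather than Rust, and not the course's code): an append-only log
       | file plus an in-memory index from key to the value's position.
       | The class name and record layout are assumptions made for
       | illustration. It leaves out checksums, deletes, compaction, and
       | recovery from torn writes.

```python
import os
import struct
import tempfile

# Hypothetical record layout: key length, value length, then key and value.
HEADER = struct.Struct(">II")


class TinyBitcask:
    def __init__(self, path):
        self.index = {}  # key -> (value offset, value length)
        self.f = open(path, "a+b")  # append mode: writes always go to the end
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log from the start; later records override earlier ones."""
        self.f.seek(0)
        pos = 0
        while True:
            header = self.f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            klen, vlen = HEADER.unpack(header)
            key = self.f.read(klen)
            self.index[key] = (pos + HEADER.size + klen, vlen)
            self.f.seek(vlen, os.SEEK_CUR)  # skip the value bytes
            pos += HEADER.size + klen + vlen

    def put(self, key: bytes, value: bytes):
        self.f.seek(0, os.SEEK_END)
        pos = self.f.tell()
        self.f.write(HEADER.pack(len(key), len(value)) + key + value)
        self.f.flush()
        self.index[key] = (pos + HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        self.f.seek(offset)
        return self.f.read(length)

    def close(self):
        self.f.close()


# Usage: overwrite a key, then reopen and rebuild the index from the log.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "data.log")
    db = TinyBitcask(log_path)
    db.put(b"a", b"1")
    db.put(b"b", b"2")
    db.put(b"a", b"3")
    latest_a = db.get(b"a")
    db.close()

    reopened = TinyBitcask(log_path)
    recovered_a = reopened.get(b"a")
    recovered_b = reopened.get(b"b")
    missing = reopened.get(b"zz")
    reopened.close()
```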
        
       | _zoltan_ wrote:
       | I started learning Velox last year, and it's a staggering
       | amount of code. Sure, it has a ton of dependencies because it wants
       | to support so many things, but I feel like the core itself is
       | also very complex.
       | 
       | I'm not sold on complexity being a necessity in software
       | engineering, as I'm sure a lot of you also aren't. Yet we see a
       | lot of behemoth projects.
        
       | wwarren wrote:
       | Man I got the first edition of this book and it was so bad.
       | Hopefully this is better...
        
       | anacrolix wrote:
       | I recently wrote a KV disk cache in Rust that uses the latest
       | syscalls: https://github.com/anacrolix/possum
        
       ___________________________________________________________________
       (page generated 2025-01-19 23:00 UTC)