[HN Gopher] Building a highly-available web service without a da...
       ___________________________________________________________________
        
       Building a highly-available web service without a database
        
       Author : tdrhq
       Score  : 236 points
       Date   : 2024-08-10 02:37 UTC (20 hours ago)
        
 (HTM) web link (blog.screenshotbot.io)
 (TXT) w3m dump (blog.screenshotbot.io)
        
       | nephy wrote:
       | We didn't want to build something complicated, so we implemented
       | our own raft consensus layer. Have you considered just using
       | Redis?
        
         | tdrhq wrote:
          | Haha, I totally hear you. But we didn't really build the
         | raft consensus layer from scratch. We used an existing robust
         | library for that: https://github.com/baidu/braft
        
           | ramon156 wrote:
           | You completely skipped the question though
        
         | ActorNightly wrote:
         | To throw the question back at you: have you considered that
         | this isn't complicated?
        
           | nephy wrote:
            | No, I haven't, because it's quite complicated. Databases are
           | very much a solved problem. Unfortunately, this architecture
           | is going to be nigh impossible to hire for and when it goes
           | absolutely sideways recovery will be difficult.
        
             | ahoka wrote:
             | That's the best part, you don't realize when things go
             | sideways.
        
           | echoangle wrote:
           | Compared to installing, configuring and maintaining an
           | installation of Redis, this absolutely is complicated. Do you
           | think this is less complicated than using Redis?
        
         | 1oooqooq wrote:
          | redis and mongo are the type of things i will yak shave to no
          | end so i don't have to deploy them in production
        
           | nephy wrote:
           | I'm honestly not sure what you are talking about. In my
           | experience, Redis is super easy to run and manage in
           | production.
        
             | ahoka wrote:
             | If you like split brains, yes. :)
        
             | sparrish wrote:
             | I'm with you. We've been using Redis in production for more
             | than a decade and it's one of the easiest distributed DBs
             | we've ever used.
        
         | gunapologist99 wrote:
         | Redis is best as an in-memory cache, not a database. Having
          | used it in production for roughly a decade, I don't trust its
         | on-disk capabilities (AOF/RDB etc) as either solid or reliable
         | (or even performant) in an emergency scenario, especially with
         | DR or DB migration in mind.
        
       | localfirst wrote:
        | I would use Cloudflare R2, but it's not globally distributed, so
        | it's pointless using it on edge.
        | 
        | Otherwise I get the messaging: with edge, the database is the
        | bottleneck.
        | 
        | Just need a one-stop shop to do edge functions + edge db.
        
         | tazu wrote:
         | Cloudflare's durable objects seem similar to this article's
         | "objects in RAM", but I think you still have to do some minimal
         | serialization.
        
           | jeremycarter wrote:
           | The Cloudflare durable object is very much the same as a
           | Virtual Actor
           | 
           | https://www.microsoft.com/en-us/research/project/orleans-
           | vir...
        
       | Zak wrote:
       | Decades ago, PG wrote that he didn't use a database for Viaweb,
       | and that it seemed odd for web apps to be frontends to databases
       | when desktop apps were not[0]. HN also doesn't use a database.
       | 
       | That's no longer true, with modern desktop and mobile apps often
       | using a database (usually SQLite) because relational data storage
       | and queries turn out to be pretty useful in a wide range of
       | applications.
       | 
       | [0] https://www.paulgraham.com/vwfaq.html
        
         | never_inline wrote:
          | I think even SQLite itself wasn't as ubiquitous (edit: it
          | didn't exist) when pg wrote Viaweb. If SQLite wasn't there and
          | my options were basically key-value stores, I could just as
          | well use the filesystem in most cases.
         | 
         | Second, querying the RDBMS has been much simplified in past 20
         | years. We have all kind of ORMs and row mappers to reduce the
         | boilerplate.
         | 
         | We also got advanced features like FTS which are useful for
         | desktop and mobile apps.
         | 
         | Today it's a good choice to use RDBMS for desktop apps.
        
           | knallfrosch wrote:
           | > Today it's a good choice to use RDBMS for desktop apps.
           | 
            | Is there an alternative? I haven't seen a "local filesystem
            | is okay as data storage" software in the 21st century.
        
           | zimpenfish wrote:
           | > If SQLite wasn't there and my options were basically key
           | value stores
           | 
           | Well, there were "options" other than KV stores - MySQL
           | launched a month before Viaweb (but flakey for a good long
           | while.) Oracle was definitely around (but probably $$$$.)
           | mSQL was being used on the web and reasonably popular by 1995
           | (cheap! cheerful! not terrible!)
           | 
           | (definitely understand making your own in-memory DB in 1995
           | though)
        
         | endorphine wrote:
         | HN does not use a database?! Can you expand on that? It's very
         | surprising to me.
        
           | exe34 wrote:
           | probably uses the filesystem as the backing store
        
             | szundi wrote:
             | Filesystems these days are like dbs
        
               | ahoka wrote:
               | Good luck transactionally writing files to a random FS,
               | but especially without access to native OS APIs.
        
           | 1oooqooq wrote:
            | if pg is still stuck in 90s lisp, i'd bet it's just a
            | single process with the site in ram, using make-object-
            | persistent and loading as needed (kinda like python pickle).
            | 
            | that was all the rage for prototypes back then.
        
           | Zak wrote:
           | It just persists its in-memory data structures to disk.
           | Here's the source of an old version; note uses of `diskvar`
           | and `disktable`. A "table" here is just a hashtable.
           | 
           | https://github.com/wting/hackernews/blob/master/news.arc
        
           | tim333 wrote:
           | I think the structure is very simple. It's just a lot of
           | items like your comment is item 41207393 as in
           | https://news.ycombinator.com/item?id=41207393
           | 
           | I think that is just written to disk as something like
           | file41207393 when you click reply.
           | 
            | When the system needs an item, it sees if it's cached in
            | memory and otherwise reads it from disk, and I think that is
            | pretty much the whole memory system. Some other stuff, like
            | user ids, works in the same sort of way.
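The scheme described above can be sketched in a few lines. This is an editor's illustration of file-per-item storage with a read-through memory cache — the file naming and JSON encoding are guesses, not HN's actual code:

```python
import json
import os

class ItemStore:
    """File-per-item storage with a read-through in-memory cache."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}
        os.makedirs(directory, exist_ok=True)

    def _path(self, item_id):
        # One file per item, named after its numeric id.
        return os.path.join(self.directory, "file%d" % item_id)

    def put(self, item_id, item):
        # Write the item to its own file, then cache it in memory.
        with open(self._path(item_id), "w") as f:
            json.dump(item, f)
        self.cache[item_id] = item

    def get(self, item_id):
        # Serve from memory if cached; otherwise fall back to disk.
        if item_id not in self.cache:
            with open(self._path(item_id)) as f:
                self.cache[item_id] = json.load(f)
        return self.cache[item_id]
```

A reply posted through such a system would just `put` a new item file; a page view would `get` items, hitting disk only on a cache miss.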
        
         | tdrhq wrote:
         | I was certainly inspired by PG's writing (after all we do use
         | Common Lisp, and it's hard to avoid PG in this space). But I
          | don't think they used transaction logs the way
          | bknr.datastore does, which makes the development process a lot
         | more seamless.
        
         | chipdart wrote:
         | > Decades ago, PG wrote that he didn't use a database for
         | Viaweb, and that it seemed odd for web apps to be frontends to
         | databases when desktop apps were not[0].
         | 
         | After reading the link, I don't think that database means the
         | same thing for everyone.
         | 
         | The vwfaq still mentions loading data from disk, and also
          | mentions "start up a process to respond to an HTTP request."
         | This suggests that by "database" they meant a separate server
         | dedicated to persist data, and having to communicate with
         | another server to fetch that data.
         | 
         | Obviously, this leaves SQLite out of this definition of
         | database. Also, if you're loading data from disk already,
         | either you're using a database or you're implementing your own
         | ad-hoc persistence layer. Would you still consider you're using
         | a database if you load data from SQLite at app start?
         | 
         | The problem with this sort of mental model is that it ignores
         | the fact that the whole point of a database is to persist and
         | fetch data in a way that is convenient to you without having to
         | bother about low-level details. Storing data in a database does
         | not mean running a postgres instance somewhere and fetching
         | data over the web. If you store all your data in-memory and
         | have a process that saves snapshots to disk using a log-
         | structured data structure... Congratulations, you just
         | developed your own database.
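The pattern the comment describes — in-memory state plus a replayed transaction log — really is a tiny database. A minimal sketch (an editor's illustration, not bknr.datastore's actual implementation):

```python
import json
import os

class MiniStore:
    """In-memory dict whose mutations are appended to a transaction log,
    so state can be rebuilt by replaying the log after a restart."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # Recovery: replay every logged transaction in order.
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["kind"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["kind"] == "delete":
            self.data.pop(op["key"], None)

    def _log(self, op):
        # Append the transaction and fsync before applying, so an
        # acknowledged write survives a crash.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op)

    def set(self, key, value):
        self._log({"kind": "set", "key": key, "value": value})

    def delete(self, key):
        self._log({"kind": "delete", "key": key})
```

After a restart, constructing `MiniStore` with the same log path rebuilds exactly the state that was acknowledged before the crash.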
        
         | cultofmetatron wrote:
          | It was a different time. To my knowledge, Viaweb was a series
          | of Common Lisp instances. All state for a user session was
          | held IN MEMORY on the individual machine. I remember reading
         | somewhere that they would be on a call with a user on
         | production and patch bugs in real time while they were on the
         | phone.
         | 
         | The web has gotten bigger and a lot of these practices simply
          | would not fly today. If I pushed a live fix on our prod
          | machine with the amount of testing that fixing it live while
          | the customer is on the phone entails, a good portion of you
          | would be questioning my sanity.
        
           | Zak wrote:
           | An important reason that practice wasn't as reckless as it
           | sounds is that early Viaweb was just a page builder. The
           | actual web stores its customers were building were _static
           | HTML_ , so updating a customer's instance while talking to
           | them on the phone only affected that one user's backend.
        
       | Sn0wCoder wrote:
       | Not sure I would call that setup simple, but it is interesting. I
       | have honestly never heard of 'Raft' or the Raft Consensus
       | Protocol or bknr.datastore, so always happy to learn something on
       | a Friday night.
        
         | tdrhq wrote:
         | Author here.
         | 
         | I agree, the infrastructure required to make this happen
         | eventually gets quite complicated. But the developer experience
         | is what's super simple. If somebody had to take all our
         | infrastructure and just use it to build their next big app,
         | they can get the simplicity without worrying about the internal
         | plumbing.
        
         | pclmulqdq wrote:
         | Raft is fantastic and most modern systems with more than one
         | node are built on Raft. It is actually proven to be equivalent
         | to Paxos, but the semantics of it are closer to what you would
         | prefer as a software writer and the implementation is much
         | simpler.
        
       | nickpsecurity wrote:
       | What they described early on in the article was basically how
       | NUMA machines worked (eg SGI Altix or UV). Also, their claimed
       | benefit was being able to parallelize things with multithreading
       | in low-latency, huge RAM. Clustering came as a low-cost
       | alternative to $1+ million machines. There's similarities to
       | persistence in AS/400, too, where apps just wrote memory that
       | gets transparently mapped to disk.
       | 
       | Now, with cheap hardware, they're going back in time to the
       | benefits of clustered, NUMA machines. They've improved on it
       | along the way. I did enjoy the article.
       | 
       | Another trick from the past was eliminating TCP/IP stacks from
       | within clusters to knock out their issues. Solutions like Active
       | Messages were a thin layer on top of the hardware. There's also
       | designs for network routers that have strong consistency built
       | into them. Quite a few things they could do.
       | 
       | If they get big, there's hardware opportunities. On CPU side, SGI
        | did two things. Their NUMA machines expanded the number of CPUs
        | and RAM for one system. They also allowed FPGAs to plug directly
        | into the memory bus to do custom accelerators. Finally, some
        | CompSci papers modified processor ISAs, networks on a chip, etc.,
        | to remove or reduce bottlenecks in multithreading. Also, chips
       | like OpenPiton increase core counts (eg 32) with open,
       | customizable cores.
        
       | oconnore wrote:
       | My first thought was, "oh, I used to do this when I wrote Common
       | Lisp, it's funny someone rediscovered that technique in
       | <rust/typescript/java/whatever>".
       | 
       | But no, just more lispers.
        
       | joatmon-snoo wrote:
       | This is cool! I'm always excited by people trying simpler things,
       | as a big fan of using Boring Technology.
       | 
       | But I have some bad news: you haven't built a system without a
       | database, you've just built your own database without
       | transactions and weak durability properties.
       | 
       | > Hold on, what if you've made changes since the last snapshot?
       | And this is the clever bit: you ensure that every time you change
       | parts of RAM, we write a transaction to disk.
       | 
       | This is actually not an easy thing to do. If your shutdowns are
        | always clean SIGTERMs, yes, you can reliably flush writes to
       | disk. But if you get a SIGKILL at the wrong time, or don't handle
       | an io error correctly, you're probably going to lose data.
       | (Postgres' 20-year fsync issue was one of these:
       | https://archive.fosdem.org/2019/schedule/event/postgresql_fs...)
       | 
       | The open secret in database land is that for all we talk about
       | transactional guarantees and durability, the reality is that
       | those properties only start to show up in the very, very, _very_
       | long tail of edge cases, many of which are easily remedied by
       | some combination of humans getting paged and end users developing
       | workarounds (eg double entry bookkeeping). This is why MySQL's
       | default isolation level can lose writes: there are usually enough
       | safeguards in any given system that it doesn't matter.
       | 
        | A lot of what you're describing as "database issues"
       | don't sound to me like DB issues, so much as latency issues
       | caused by not colocating your service with your DB. By hand-
       | rolling a DB implementation using Raft, you've also colocated
       | storage with your service.
       | 
       | > Screenshotbot runs on their CI, so we get API requests 100s of
       | times for every single commit and Pull Request.
       | 
       | I'm sorry, but I don't think this was as persuasive as you meant
        | it to be. This is the type of workload that, to be snarky
        | about it, I could run off my phone. [0]
       | 
       | [0]: https://tailscale.com/blog/new-internet
        
         | tdrhq wrote:
         | > This is actually not an easy thing to do. If your shutdowns
          | are always clean SIGTERMs, yes, you can reliably flush writes
         | to disk. But if you get a SIGKILL at the wrong time, or don't
         | handle an io error correctly, you're probably going to lose
         | data.
         | 
         | Thanks for the comment! This is handled correctly by
         | Raft/Braft. With Raft, before a transaction is considered
         | committed it must be committed by a majority of nodes. So if
         | the transaction log gets corrupted, it will restore and get the
          | latest transaction logs from the other nodes.
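The majority rule referred to here can be stated in a couple of lines. This is a deliberate simplification of Raft's commit rule (it ignores terms and log matching), written as an editor's sketch:

```python
def quorum(cluster_size):
    # A strict majority of nodes: 2 of 3, 3 of 4, 3 of 5, ...
    return cluster_size // 2 + 1

def is_committed(ack_count, cluster_size):
    """A log entry is committed once a majority of nodes
    (leader included) have persisted it."""
    return ack_count >= quorum(cluster_size)
```

This is why a single corrupted or lagging node can't lose a committed transaction: the entry still exists on a majority, and the recovering node catches up from them.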
         | 
         | > I'm sorry, but I don't think this was as persuasive as you
         | meant it to be.
         | 
         | I wasn't trying to be persuasive about this. :) I was trying to
         | drive home the point that you don't need a massively
         | distributed system to make a useful startup. I think some
         | founders go the opposite direction and try to build something
         | that scales to a billion users before they even get their first
         | user.
        
           | joatmon-snoo wrote:
           | Wait, so you're blocking on a Raft round-trip to make forward
           | progress? That's the correct decision wrt durability, but...
           | 
           | I'm now completely lost as to why you believe this was a good
           | idea over using something like MySQL/Postgres/Aurora. As I
           | see it, you've added complexity in three different dimensions
           | (novel DB API, novel infra/maintenance, and novel
           | oncall/incident response) with minimal gain in availability
           | and no gain in performance. What am I missing?
           | 
           | (FWIW, I worked on Bigtable/Megastore/Spanner/Firestore in a
           | previous job. I'm pretty familiar with what goes into
           | consensus, although it's been a few years since I've had to
           | debug Paxos.)
           | 
           | > I was trying to drive home the point that you don't need a
           | massively distributed system to make a useful startup. I
           | think some founders go the opposite direction and try to
           | build something that scales to a billion users before they
           | even get their first user.
           | 
           | This reads to me as exactly the opposite: overengineering for
           | a problem that you don't have.
           | 
           | For exactly the reasons you describe, I would argue the
           | burden of proof is on you to demonstrate why Redis, MySQL,
           | Postgres, SQLite, and other comparable options are
           | insufficient for your use case.
           | 
           | To offer you an example: let's say your Big Customer decides
           | "hey, let's split our repo into N micro repos!" and they now
           | want you to create N copies of their instance so they can
           | split things up. As implemented, you'll now need to implement
           | a ton of custom logic for the necessary data transforms. With
           | Postgres, there's a really good chance you could do all of
           | that by manipulating the backups with a few lines of SQL.
        
             | aeinbu wrote:
             | > As implemented, you'll now need to implement a ton of
             | custom logic for the necessary data transforms. With
             | Postgres, there's a really good chance you could do all of
             | that by manipulating the backups with a few lines of SQL.
             | 
              | Isn't writing "a few lines of SQL" also custom logic? The
             | difference is just the language.
             | 
             | It is also possible that the custom data store is more
             | easily manipulated with other languages than SQL.
             | 
             | SQL really is great for manipulating data, but not all
             | relational databases are easy to work with.
        
       | oefrha wrote:
       | Seems weird to start with "not talking about using something like
       | SQLite where your data is still serialized", then end up with a
       | home grown transaction log that requires serialization and needs
       | to be replicated, which is how databases are replicated anyway.
       | 
       | If your load fits entirely on one server, then just run the
       | database on that damn server and forget about "special
       | architectures to reduce round-trips to your database". If your
       | data fits entirely in RAM, then use a ramdisk for the database if
       | you want, and replicate it to permanent storage with standard
       | tools. Now that's actually simple.
        
         | Groxx wrote:
         | I do feel like this largely summarizes as "we built our own
         | sqlite + raft replication", yeah. But without sqlite's battle-
         | tested reliability or the ability to efficiently offload memory
         | back to disk.
         | 
         | So, basically, https://litestream.io/ . But perhaps faster
         | switching thanks to an explicit Raft setup? I'm not a
         | litestream user so I'm not sure about the subtleties, but it
         | sounds awfully similar.
         | 
         | That overly-simplified summary aside, I quite like the idea and
         | I think the post does a pretty good job of selling the concept.
         | For a lot of systems it'll scale more than well enough to
         | handle most or all of your business even if you become
         | abnormally successful, and the performance will be absurdly
         | good compared to almost anything else.
        
           | kitd wrote:
           | Rqlite would be a better comparison. It is actually SQLite +
           | raft
           | 
           | https://github.com/rqlite/rqlite
        
             | otoolep wrote:
             | rqlite author here, happy to answer any questions.
        
               | lifeisstillgood wrote:
               | So some dumb questions if you don't mind
               | 
               | - In GitHub readme you mention etcd / consul. Is rqlite
               | suitable for transaction processing as well ?
               | 
               | - I am imagining a dirt simple load balancer over two web
               | servers. They are a crud app backed onto a database. What
               | is the disadvantages of putting rqlite on each server
               | compared to say having a third backend database.
        
               | otoolep wrote:
               | It depends on what kind of transaction support you want.
               | If your transactions need to span rqlite API requests
               | then no, rqlite doesn't support that (due to the
               | stateless nature of HTTP requests). That sort of thing
               | could be developed, but it's substantial work. I have
               | some design ideas, it may arrive in the future.
               | 
               | If you need to ensure that a given API request (which can
               | contain multiple SQL statements) is atomically processed
               | (all SQL statements succeed or none do) that _is_
                | supported however [1]. That's why I think of rqlite as
               | closer to the kind of use cases that etcd and Consul
               | support, rather than something like Postgres -- though
               | some people have replaced their use of Postgres with
               | rqlite! [2]
               | 
               | [1] https://rqlite.io/docs/api/api/#transactions
               | 
               | [2] https://www.replicated.com/blog/app-manager-with-
               | rqlite
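Per the linked docs, rqlite exposes this atomic-request behaviour through its HTTP API: a JSON array of statements POSTed to `/db/execute` with the `transaction` flag. A sketch of building such a request (assumes a node on `localhost:4001`; the send is left commented out since it needs a running node):

```python
import json
import urllib.request

def make_execute_request(statements, host="localhost:4001"):
    """Build a request for rqlite's /db/execute endpoint with the
    transaction flag set, so all statements succeed or none do."""
    return urllib.request.Request(
        "http://%s/db/execute?transaction" % host,
        data=json.dumps(statements).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_execute_request([
    "CREATE TABLE IF NOT EXISTS foo (id INTEGER PRIMARY KEY, name TEXT)",
    'INSERT INTO foo(name) VALUES("fiona")',
])
# urllib.request.urlopen(req)  # would submit it to a running rqlite node
```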
        
               | lifeisstillgood wrote:
               | Thank you - so my takeaway is that rqlite is well suited
               | for distributed "publishing" of data ala etcd, but it is
               | possible to use it as a Postgres replacement - thank you
               | I will give it a go
        
               | otoolep wrote:
                | As for your second question, I don't think you'd benefit
                | much from that, for two reasons:
                | 
                | - rqlite is a Raft-based system, with quorum
                | requirements. Running 2-node systems doesn't make much
                | sense. [1]
                | 
                | - Secondly, all writes go to the Raft leader (rqlite
                | makes sure this happens transparently if you don't
                | initially contact the Leader node [2]). A load balancer,
                | in this case, isn't going to allow you to "spread load".
                | What a load balancer is useful for when it comes to
                | rqlite is making life simpler for clients -- they just
                | hit the load balancer, and it will find some rqlite node
                | to handle the request (redirecting to the Leader _if_
                | needed).
               | 
               | [1] https://rqlite.io/docs/clustering/general-
               | guidelines/#cluste...
               | 
               | [2] https://rqlite.io/docs/faq/#can-any-node-execute-a-
               | write-req...
        
             | Groxx wrote:
             | I'll throw in a "ehh... sorta" though rqlite is quite neat
             | and very much worth considering.
             | 
             | The main caveat here is that rqlite is an out-of-process
             | database, which you communicate with over http. That puts
             | it on similar grounds as e.g. postgres, just significantly
             | lighter weight, and somewhat biased in favor of running it
             | locally on every machine that needs the data.
             | 
             | So minimum read latency is likely much lower than postgres,
             | but it's still noticeable when compared to in-process
             | stuff, and you lose other benefits of in-process sqlite,
             | like trivial extensibility.
        
           | oefrha wrote:
           | They basically only save on serialization & deserialization
           | at query time, which I would consider an infinitesimal saving
           | in the vast majority of use cases. They claim to be able to
           | build some magical index that's not possible with existing
           | disk-based databases (I didn't read the linked blog post).
           | They lose access to a nice query language and entire
           | ecosystems of tools and domain knowledge.
           | 
           | I fail to see how this little bit of saving justifies all the
           | complexity for run-of-the-mill web services that fit on one
           | or a few servers as described in the article. The context
           | isn't large scale services where 1ms/request saving
           | translates to $$$, and the proposal doesn't (vertically)
           | scale anyway.
        
             | oefrha wrote:
             | One thing I forgot to mention: if you use a not-in-process
             | RDBMS on the same machine you also incur some socket
             | overhead. But that's also small.
        
             | rrrix1 wrote:
             | You should probably RTFA before making broad assumptions on
             | their solution and how it works. Most of what you wrote is
             | both incorrect and addressed in the article.
        
           | otabdeveloper4 wrote:
            | SQLite doesn't do Raft. There isn't any simple way to do
            | replicated SQLite. (In fact, writing your own database is
            | probably the simplest way currently, if SQLite+Raft is
            | actually what you want.)
        
             | carderne wrote:
             | What about rqlite?
        
         | robertclaus wrote:
         | Agreed. Reinventing the WAL means reinventing (or ignoring) all
         | the headaches that come with it. I got the impression it takes
         | them a long time to recover from the logs, so they likely
         | haven't even gotten as far as log checkpointing.
        
           | chipdart wrote:
           | > Agreed. Reinventing the WAL means reinventing (or ignoring)
           | all the headaches that come with it.
           | 
           | But if the blogger learned SQLite, how would they have a
           | topic to blog about?
           | 
           | Also, no benchmarks. It's quite odd that an argument grounded
           | on performance claims does not bother to put out any hard
           | data comparing the output of this project. I'm talking about
           | basic things like how does this contrived custom ad-hoc setup
           | compare with vanilla, out-of-the-box SQLite deployment? Which
           | one performs worse and by how much? How does the performance
           | difference reflect in request times and infrastructure cost?
           | Does it actually pay off to replace the dozen lines of code
            | of onboarding SQLite with a custom, in-development, ad-hoc
           | setup? I mean, I get the weekend personal project vibe of
           | this blog post, but if this is supposed to be a production-
           | minded project then step zero would have been a performance
           | test on the default solution. Where is it?
        
           | bjornsing wrote:
           | > I got the impression it takes them a long time to recover
           | from the logs, so they likely haven't even gotten as far as
           | log checkpointing.
           | 
           | The OP starts out by talking about periodically dumping
           | everything in RAM to disk. I'd say that's your checkpointing.
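Checkpointing in this style is just "snapshot, then truncate the log": recovery then replays only entries newer than the snapshot instead of the whole history. An editor's sketch of the idea (not the OP's code; assumes the same one-JSON-op-per-line log format as a simple key-value log):

```python
import json
import os

def checkpoint(state, snapshot_path, log_path):
    """Write a full snapshot of in-memory state, then truncate the
    transaction log."""
    tmp = snapshot_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, snapshot_path)  # atomic rename, so no torn snapshot
    open(log_path, "w").close()     # empty the log

def recover(snapshot_path, log_path):
    # Load the last snapshot, then replay any ops logged since.
    state = {}
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            state = json.load(f)
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                op = json.loads(line)
                state[op["key"]] = op["value"]
    return state
```

The atomic rename matters: a crash mid-checkpoint must leave either the old snapshot or the new one intact, never a half-written file.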
        
         | bingo-bongo wrote:
         | You don't even need a ram disk imho, databases already cache
         | everything in memory and only writes reach the disk.
         | 
         | Just try and cold-start your database and run a fairly large
         | select twice.
        
           | piker wrote:
           | Also the OS will cache a lot of the reads even if your
           | database isn't sophisticated enough or tuned correctly. Still
           | could be a fun exercise, as with all things on here.
        
             | LtdJorge wrote:
             | Any half decent DBMS bypasses the page cache, except for
             | LMDB.
        
         | nine_k wrote:
         | Trading systems bluntly keep everything in RAM, in preallocated
         | structures. It all depends on the kind of tradeoffs you're
         | willing to make.
        
           | ahoka wrote:
           | I used to work on a telecom platform (think something that
           | runs 4G services), where every node was just part of an in-
           | memory database that replicated using 2PC and just did
            | periodic snapshots to avoid losing data. Basically processes
           | were colocated with their data in the DB.
        
             | icedchai wrote:
             | I worked on a lottery / casino system that was similar. In
              | memory database (memory-mapped files), with a WAL for
             | transaction replay / recovery. There was also a periodic
             | snapshot capability. It was incredibly low latency on late
             | 90's era hardware.
        
             | phamilton wrote:
             | Very erlang/otp. Joe Armstrong used to rant to anyone who
             | would listen that we used databases too often. If data was
             | important, multiple nodes probably need a copy of it. If
             | multiple nodes need a copy, you probably have plenty of
             | durability.
             | 
             | Even if you weren't using erlang, his influence (and in
             | general, ericsson) permeates the telecom industry.
        
         | ActorNightly wrote:
         | Setting up a single server with database replication and
          | restore functionality is arguably more complex than setting
         | this up.
         | 
         | There are libraries available to wrap your stuff with this
         | algorithm, and the benefit is that you write your server like
         | it would run on a single machine, and then when launching it in
         | prod across multiple, everything just works.
        
         | tdrhq wrote:
         | I think it's important to understand that every startup goes
         | through three phases: Explore, Expand, Extract. What's simple
         | in one phase isn't simple in the other.
         | 
         | A transactional database is simple in Expand and Extract, but
          | adds overhead during the Explore phase, because
         | you're focusing on infrastructure issues rather than product.
         | Data reliability isn't critical in the Explore phase either,
         | because you just don't have customers, so you just don't have
         | data.
         | 
         | Having everything in memory with bknr.datastore (without
         | replication) is simple in the Explore phase, but once you get
         | to Expand phase it adds operational overhead to make sure that
         | data is consistent.
         | 
         | But by the time I've reached the Expand phase, I've already
         | proven my product and I've already written a bunch of code.
         | Rewriting it with a transactional database doesn't make sense,
         | and it's easier to just add replication on top of it with Raft.
        
           | gtirloni wrote:
           | I'd assume in the beginning you do not want to spend time
           | writing a bunch of highly difficult code until you've proven
           | your idea/product. Then when you're big enough and have the
           | money, start replacing things where it makes sense. It seems
           | to be the strategy used by many companies.
           | 
           | Unless, of course, your startup is in the business of selling
           | DBMSes.
        
           | Groxx wrote:
           | Having Explored with a transactional database: I really can't
           | agree. Just change your database, migrations are easy and
           | should be something you're comfortable doing at any time, or
           | you'll get stuck working around it for 100x more effort in
           | the future.
        
             | kasey_junk wrote:
              | That was the biggest disconnect I had as well. SQL DBs
              | have the _best_ data migration tooling and practices of
              | any data system. It's not addressed in the article how
              | migrations are handled here, but I'm assuming it's a
              | hand-rolled set of code for each one.
             | 
              | I think SQL DBs make the most sense during the explore phase
             | and you switch off of them once you know you need an
             | improvement somewhere (like latency or horizontal
             | scalability).
        
         | troupo wrote:
          | > "If your data fits entirely in RAM, then use a ramdisk for
         | the database if you want, and replicate it to permanent storage
         | with standard tools
         | 
         | Then you get used to near-zero latency that in-RAM data gives
         | you, and when it outgrows your RAM, it's a pain in the butt to
         | move it to disk :)
        
       | theideaofcoffee wrote:
       | I get the desire to experiment with interesting things, but it
       | seems like such a huge waste of time to avoid having to learn the
       | most basic aspects of MySQL or postgres. You could "just" build
        | on top of one and be done with it, especially if you're running in a
       | public cloud provider. I don't buy the increased RTT or troubles
       | with concurrency issues, the latter having simple solutions by
       | basic tuning, or breaking out your noisy customers. There's
       | another post on their blog mentioning the possibility of adding
       | 10 million rows per day and the challenges of indexing that.
       | That's... literally nothing and I don't think even 10x that
       | justifies having to engineer a custom solution.
       | 
       | Worse is better until you absolutely need to be less worse, then
       | you'll know for sure. At that point you'll know your pain points
       | and can address them more wisely than building more up front.
        
         | chipdart wrote:
         | > I get the desire to experiment with interesting things, but
         | it seems like such a huge waste of time to avoid having to
         | learn the most basic aspects of MySQL or postgres.
         | 
         | For server-based database engines you can still make an
         | argument on shedding network calls. It's dubious, but you can.
         | 
         | What's baffling is that the blogger tries to justify not
         | picking up SQLite claiming it might have features that they
         | don't need, which is absurd and does not justify anything.
         | 
         | The blog post reads like a desperate attempt to start with a
          | poor solution to a fictitious problem and proceed to come up
         | with far-fetched arguments hoping to reject the obvious
         | solution.
        
           | wongarsu wrote:
           | If you want to shed network calls, the easiest solution would
            | be to just run Postgres or MySQL on the same server and
            | connect to it via a Unix domain socket. So even if SQLite
            | weren't an option, network overhead isn't a good argument.
        
       | ibash wrote:
       | 1. If your entire cluster goes down do you permanently lose
       | state?
       | 
       | 2. Are network requests / other ephemeral things also saved to
       | the snapshot?
        
         | tdrhq wrote:
         | [Author here] The transactions and snapshots are still logged
         | to disk. So if the cluster goes down and comes back up, each
         | one just reloads the state. Until at least two machines are
         | back up, we won't be able to serve requests though.
         | 
         | Not sure what you mean by ephemeral things. If you mean things
         | like file descriptors, they are not stored. Technically the
         | snapshot is not a simple snapshot of RAM, it snapshots through
         | all the objects in memory that are set up to be part of the
         | datastore. (It's a bit more complicated and flexible than this,
         | but that's the general idea.)
        
           | ibash wrote:
           | Ah awesome! Thank you!
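The reload-on-restart scheme the author describes above can be sketched in a few lines. This is an editor's illustration in Python, not bknr.datastore's actual on-disk format: the JSON snapshot/log layout and the `recover` helper are invented for the example.

```python
import json
import os

def recover(snapshot_path, log_path):
    """Rebuild in-memory state: load the latest snapshot, then replay
    every transaction logged after it, in order."""
    state = {}
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            state = json.load(f)
    if os.path.exists(log_path):
        with open(log_path) as f:
            for line in f:
                txn = json.loads(line)
                if txn["op"] == "new":
                    # An object-creation record: materialize an empty object.
                    state.setdefault(txn["obj"], {})
                elif txn["op"] == "set":
                    # A field-mutation record: reapply the write.
                    state.setdefault(txn["obj"], {})[txn["field"]] = txn["value"]
    return state
```

On a clean restart the snapshot does most of the work, and log replay only covers the changes made since the snapshot was taken.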
        
       | wmf wrote:
       | This sounds a lot like Prevayler. https://prevayler.org/
        
         | tdrhq wrote:
         | [Author here] Indeed, bknr.datastore was inspired by Prevayler
         | and similar libraries
        
       | Tehdasi wrote:
       | Hmm, but the problem with having in-memory objects rather than a
        | db is you end up having to replicate a lot of the features of a
       | relational database to get a usable system. And adding all these
       | extra features you want from those dbs end up making a simple
       | solution not very simple at all.
        
         | swiftcoder wrote:
         | To some extent I think this is an "if all you have is a
         | hammer..." situation. Relational DBs are often not a great fit
         | for how contemporary software manages data in memory (hence the
         | proliferation of ORMs, and adapter layers like graphql). I
         | think it's often easier to write out one's relations in the
         | data structures directly, rather than mapping them to queries
         | and joins
        
       | iammrpayments wrote:
       | Isn't this like redis?
        
       | andrewstuart wrote:
       | But why, when you can build things in an ordinary way with
       | ordinary tech like Python/Java/C#/TypeScript and Postgres. Lots
       | of developers know it, lots of answers to your questions online,
       | the AI knows how to write it.
       | 
       | Reading posts like this makes me think the founders/CTO is mixing
       | hobby programming with professional programming.
        
         | nesarkvechnep wrote:
         | Why not, though? Because you only know the languages you
         | listed?
        
           | andrewstuart wrote:
           | A home grown maintenance nightmare. Try logging in and
           | querying and working out what is going on.
           | 
           | There's literally no reason to waste time doing all this.
           | 
           | So many lines of pointless, wasted code.
           | 
           | Which is absolutely fine if you are hobby programming but if
           | you are running a business then this approach is wasteful.
        
       | jhardy54 wrote:
       | > Hold on, what if you've made changes since the last snapshot?
       | And this is the clever bit: you ensure that every time you change
       | parts of RAM, we write a transaction to disk. So if you have a
       | line like foo.setBar(2), this will first write a transaction that
       | says we've changed the bar field of foo to 2, and then actually
       | set the field to 2. An operation like new Foo() writes a
       | transaction to disk to say that a Foo object was created, and
       | then returns the new object.
       | 
       | >
       | 
       | > And so, if your process crashes and restarts, it first reloads
       | the snapshot, and replays the transaction logs to fully recover
       | the state. (Notice that index changes don't need to be part of
       | the transaction log. For instance if there's an index on field
       | bar from Foo, then setBar should just update the index, which
       | will get updated whether it's read from a snapshot, or from a
       | transaction.)
       | 
       | That's a database. You even linked to the specific database
       | you're using [0], which describes itself as:
       | 
       | > [...] in-memory database with transactions [...]
       | 
       | Am I misunderstanding something?
       | 
       | [0]: https://github.com/bknr-datastore/bknr-datastore
        
       | apexkid wrote:
       | > periodically just take a snapshot of everything in RAM.
       | 
        | Sounds similar to `stop the world garbage collection` in Java.
        | Does your entire processing come to a halt when you do this? How
       | frequently do you need to take snapshots? Or do you have a way to
       | do this without halting everything
        
         | tdrhq wrote:
         | Good catch! Snapshotting was certainly a bottleneck that I
         | chose not to write about.
         | 
          | But we aren't really taking a snapshot of RAM; instead, we're
          | running some code asking each object to snapshot itself into a
         | stream. If you do this naively, it will block writes on the
         | server until the snapshot is done (reads will continue to
         | work).
         | 
          | But Raft has a protocol for asynchronous snapshots. So in the
          | first step we quickly take an immutable snapshot of the state
          | we care about; then writes can keep going while in the
          | background we serialize the state to disk.
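A minimal sketch of that two-phase snapshot, assuming the state fits in a dict (names and locking granularity are illustrative, not the author's actual code): writes only pause for the in-memory copy, not for the disk write.

```python
import copy
import json
import threading

class Store:
    """Asynchronous snapshots: briefly freeze an immutable copy of
    the state, then serialize it to disk in the background while
    writes continue."""

    def __init__(self):
        self.state = {}
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:
            self.state[key] = value

    def snapshot(self, path):
        with self.lock:
            # Writes block only for this fast copy.
            frozen = copy.deepcopy(self.state)

        def serialize():
            # Slow part: runs concurrently with new writes.
            with open(path, "w") as f:
                json.dump(frozen, f)

        t = threading.Thread(target=serialize)
        t.start()
        return t
```

The frozen copy is consistent as of the moment the lock was held; writes that land afterwards go into the next snapshot.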
        
       | AdieuToLogic wrote:
       | > Imagine all the wonderful things you could build if you never
       | had to serialize data into SQL queries.
       | 
       | This exists in sufficiently mature Actor model[0]
       | implementations, such as Akka Event Sourcing[1], which also
       | addresses:
       | 
       | > But then comes the important part: how do you recover when your
       | process crashes? It turns out that answer is easy, periodically
       | just take a snapshot of everything in RAM.
       | 
       | Intrinsically and without having to create "a new architecture
       | for web development". There are even open source efforts which
       | explore the RAFT protocol using actors here[2] and here[3].
       | 
       | 0 - https://en.wikipedia.org/wiki/History_of_the_Actor_model
       | 
       | 1 - https://doc.akka.io/docs/akka/current/typed/persistence.html
       | 
       | 2 - https://github.com/Michael-Dratch/RAFT_Implementation
       | 
       | 3 - https://github.com/invkrh/akka-raft
        
         | jeremycarter wrote:
         | I have built some medium sized systems using Microsoft Orleans
         | (Virtual Actors). There was no transactional database involved,
         | but everything was ordered and fully transactional.
         | 
         | If you choose say Cosmos DB, MongoDB or DynamoDB as your
         | persistence provider you can even query the persisted state.
         | 
         | https://learn.microsoft.com/en-us/dotnet/orleans/grains/grai...
         | 
         | https://learn.microsoft.com/en-us/dotnet/orleans/grains/tran...
         | 
         | https://learn.microsoft.com/en-us/dotnet/orleans/grains/even...
        
       | mg wrote:
       | When I start a new project, the data structure usually is a "list
       | of items with attributes". For example right now, I am writing a
       | fitness app. The data consists of a list of exercises and each
       | exercise has a title, a description, a video url and some other
       | attributes.
       | 
       | I usually start by putting those items into YAML files in a
       | "data" directory. Actually a custom YAML dialect without the
       | quirks of the original. Each value is a string. No magic type
       | conversions. Creating a new item is just "vim crunches.yaml" and
       | putting the data in. Editing, deleting etc all is just
       | wonderfully easy with this data structure.
       | 
       | Then when the project grows, I usually create a DB schema and
       | move the items into MariaDB or SQLite.
       | 
       | This time, I think I will move the items (exercises) into a JSON
       | column of an SQLite DB. All attributes of an item will be stored
       | in a single JSON field. And then write a little DB explorer which
       | lets me edit JSON fields as YAML. So I keep the convenience of
       | editing human readable data.
       | 
       | Writing the DB explorer should be rather straight forward. A bit
       | of ncurses to browse through tables, select one, browse through
       | rows, insert and delete rows. And for editing a field, it will
       | fire up Vim. And if the field is a JSON field, it converts it to
       | YAML before it sends it to Vim and back to JSON when the user
       | quits Vim.
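The JSON-in-SQLite layout described above might look like this (an illustrative sketch; the table and field names are made up, and `json_extract` relies on SQLite's built-in JSON functions, which ship with modern builds):

```python
import json
import sqlite3

# One JSON document per row; json_extract() queries into it
# without committing to a fixed column schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE exercises (id INTEGER PRIMARY KEY, data TEXT)")

exercise = {"title": "Crunches", "description": "Basic ab exercise",
            "video_url": "https://example.com/crunches"}
db.execute("INSERT INTO exercises (data) VALUES (?)",
           (json.dumps(exercise),))

(title,) = db.execute(
    "SELECT json_extract(data, '$.title') FROM exercises").fetchone()
```

An editor round-tripping the `data` column through YAML would read the JSON out, convert, and write the converted text back on save.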
        
       | aorloff wrote:
       | This is like an example case of a lambda + kinesis
        
       | nesarkvechnep wrote:
       | It'll be interesting to do something like this in Elixir where
       | clustering is almost a runtime primitive.
        
       | nilirl wrote:
       | I'm baffled at the arguments made in this article. This is
       | supposed to be a simpler and faster way to build stateful
       | applications?
       | 
        | The premises are weak and the claims absurd. The author
        | overstates the difficulties of serialization just to make a
        | weak claim look stronger.
        
         | t0mas88 wrote:
         | And then they implement serialization to write their
         | transactions to a log and replicate them to the other nodes...
        
         | voidfunc wrote:
         | Big vibes of "We are very smart, see how smart we are?" from
         | the blog post.
         | 
         | These kind of people usually suck to work with. I'm glad
         | they've found a startup to sink so I don't have to deal with
         | them.
        
       | golergka wrote:
       | > Imagine all the wonderful things you could build if you never
       | had to serialize data into SQL queries.
       | 
       | No transactions, no WAL, no relational schema to keep data design
       | sane, no query planner doing all kinds of optimisations and
       | memory layout things I don't have to think about?
       | 
       | You could say that transactions, for example, would be redundant
       | if there is no external communication between app server and the
       | database. But it is far from the only thing they're useful for.
       | Transactions are a great way of fulfilling important invariants
       | about the data, just like a good strict database schema. You
       | rollback a transaction if an internal error throws. You make sure
       | that transaction data changes get serialised to disk all at once.
       | You remove a possibility that statements from two simultaneous
       | transactions access the same data in a random order (at least if
       | you pick a proper transaction isolation level, which you usually
       | should).
       | 
       | > You also won't need special architectures to reduce round-trips
       | to your database. In particular, you won't need any of that
       | Async-IO business, because your threads are no longer IO bound.
       | Retrieving data is just a matter of reading RAM. Suddenly
       | debugging code has become a lot easier too.
       | 
       | Database is far from the only other server I have to communicate
       | with when I'm working on user's HTTP request. As a web developer,
       | I don't think I've worked on a single product in the last 4 years
       | that didn't have some kind of server-server communication for
       | integrations with other tools and social media sites.
       | 
       | > You don't need crazy concurrency protocols, because most of
       | your concurrency requirements can be satisfied with simple in-
       | memory mutexes and condition variables.
       | 
        | Ah, mutexes. Something programmers have never shot themselves in
        | the foot with. Also, deadlocks don't exist.
       | 
       | > Hold on, what if you've made changes since the last snapshot?
       | And this is the clever bit: you ensure that every time you change
       | parts of RAM, we write a transaction to disk. So if you have a
       | line like foo.setBar(2), this will first write a transaction that
       | says we've changed the bar field of foo to 2, and then actually
       | set the field to 2. An operation like new Foo() writes a
       | transaction to disk to say that a Foo object was created, and
       | then returns the new object.
       | 
       | A disk write latency is added to every RAM write. It has no
       | performance cost and nobody notices this.
       | 
       | I apologise if this comes off too snarky. Despite all of the
       | above, I really like this idea -- and already think of
       | implementing it in a hobby project, just to see how well it
       | really works. I'm still not sure if it's practical, but I love
        | the creative thinking behind this, and the fact that it actually
       | helped them build a business.
        
         | briHass wrote:
          | I would add that the 'serialization' to an RDBMS schema cited
          | as a negative is actually a huge positive for most systems.
         | Modeling your data relationally, often in 3NF, usually differs
         | from the in-memory/code objects in all but the most simple ORM
         | class=table projects. Thinking deeply about how to persist data
         | in a way that makes it flexible and useful as application needs
         | change (i.e. the database outlives the applications(s)) has
         | value in itself, not just a pointless cost.
         | 
         | I like being able to draw a hard line between application data
         | structures, often ephemeral and/or optimized for particular
         | tasks -- and the persisted, domain data which has meaning
         | beyond a specific application use case.
        
       | antman wrote:
       | As a side question is there a python library for braft or a
       | production grade raft library for python?
        
         | tdrhq wrote:
         | There's a list of libraries here, which include a few Python
         | libraries: https://raft.github.io/
         | 
         | I don't know if they're production grade. I was drawn to Braft
         | because of Baidu's backing.
        
       | leokennis wrote:
       | I'm not from "start up world" but in the end, few things give me
       | more comfort and lack of surprises down the line than just having
       | a relational database with built in redundancy/transaction
       | logs/back up/recovery. Sure there might always be edge cases
       | (lack of money, regulations, specialist software offering) but in
       | the vast majority of cases - just get a database.
        
         | gunapologist99 wrote:
         | It's interesting you say "backup/recovery" as a strong point of
         | relational databases (servers), because backup and recovery on
         | hot databases have always been a challenge.
         | 
         | With many enterprise databases these days, often "incremental"
         | or other seemingly required backup modes are not included in
         | the "community source" versions; perhaps because surely if you
         | want your database to be backed up safely and then come back
         | online safely, you certainly will fall into the "contact us for
         | quote" enterprise customer demographic.
         | 
         | At least, with SQLite, copying even a hot (in-use) db file to a
         | remote server will usually "just work", with the potential loss
         | of a few transactions, but with most other database/servers,
         | you definitely can't just backup the data directory
         | occasionally and call it a day.
        
           | leokennis wrote:
           | Like I mentioned, I don't have experience working in a start
           | up. My real world experience with backup/recovery of a live
           | relational DB has been with Oracle using ZDLRA - and indeed
           | its license probably costs dearly.
           | 
           | For stuff like MariaDB a quick search also finds options to
           | perform snapshots, backups, restores etc.
           | 
           | And if you need to be super high available, set up a
           | distributed DB like Cassandra - you lose the relational and
           | transaction part, but at least you're running a product with
           | known failure modes and known ways to prevent/circumvent
           | them.
           | 
            | I guess my bigger point is that besides "don't roll your own
            | crypto", I'd also advise not rolling your own DB. There's a
           | lot of known stuff in the market, all built by people who
           | made and fixed the mistakes you're going to make a long time
           | ago.
        
       | lpapez wrote:
       | I once saw a project in the wild where the "database" was
       | implemented using filesystem directories as "tables" with JSON
       | files inside as "rows".
       | 
       | When I asked people working on it if they considered Redis or
       | Mongo or Postgres with jsonb columns, they just said they
       | considered all of those things but decided to roll out their own
       | db anyway because "they understood it better".
       | 
       | This article gives off the same energy. I really hope it works
       | out for you, but IMO spending innovation tokens to build a
       | database is nuts.
        
         | ActorNightly wrote:
         | This isn't innovation though. You literally just write your
         | server like you would for a single machine, then wrap it any of
         | the available Raft libraries.
         | 
         | AWS and other cloud providers are money printers because a lot
         | of engineers are insanely tied into established patterns of
         | doing things and can't think through things at a fundamental
         | level. Ive seen company backends where their entire AWS stacks
          | could be replaced by 2 EC2 instances behind a load balancer
         | with a domain name, without affecting business flow.
         | 
         | We did something similar to the work in the OP post at my work,
         | we had a bunch of ECS tasks for a service, where the service
         | did another call to an upstream service to fetch some
         | intermediate results. We wanted to cache results for lower
         | response latency. People were working to set up a Redis
         | cluster. Except the TPS of the service was like 0.1.
         | 
         | Took me one day to code a /sync api endpoint, which was just a
         | replica of the main endpoint. The only difference is that the
          | main endpoint would spin off a thread to call the /sync
         | endpoint, whereas the /sync endpoint didn't. Both endpoints
         | ended with caching the results in memory before returning. Easy
         | as day, no additional infra costs necessary.
         | 
         | But overall, personally, I don't hate the "spending innovation
         | tokens to build a database is nuts" sentiment too much, because
         | it keeps me employed at high salary while doing minimal work,
         | where things that really should be basic CS are considered
         | innovation.
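The cache described above can be as small as a dict behind a lock; a hedged sketch (the endpoint and upstream-call names are invented for illustration, and the real service refreshed the cache on a background thread rather than inline):

```python
import threading

cache = {}
cache_lock = threading.Lock()

def fetch_upstream(key):
    # Stand-in for the call to the upstream service.
    return f"result-for-{key}"

def handle_sync(key):
    """The /sync-style path: compute, cache, return."""
    result = fetch_upstream(key)
    with cache_lock:
        cache[key] = result
    return result

def handle_main(key):
    """The main path: serve from cache when warm, else refresh."""
    with cache_lock:
        if key in cache:
            return cache[key]
    return handle_sync(key)
```

At a fraction of a request per second, this in-process approach avoids standing up any shared cache infrastructure at all.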
        
           | gtirloni wrote:
           | Software Engineering is different than CS though.
        
           | bastawhiz wrote:
           | > then wrap it any of the available Raft libraries.
           | 
           | Raft does consensus. Raft does not do persistence to disk,
           | WAL, crash recovery, indexing, vacuuming (you're using
           | tombstones for your deletes, right?), or any of the other
           | necessary pieces of a database. That's not mentioning how
            | such a system has _no query engine_, so every piece of data
           | you're looking up in every place you need data is traversing
           | your bespoke data structures.
           | 
           | What you described isn't a database. Keeping some disposable
           | values cached isn't a database.
        
         | jmull wrote:
         | I get your point and I don't doubt the project you're talking
         | about was a mess, but the file system _is_ a database, and can
         | be a very good choice, depending on exactly what you're doing.
        
           | rrrix1 wrote:
           | The file system is a database _and_ an API.
           | 
           | Magic!
        
         | philipwhiuk wrote:
         | > I once saw a project in the wild where the "database" was
         | implemented using filesystem directories as "tables" with JSON
         | files inside as "rows".
         | 
         | I did this sort of thing recently. I felt bad doing it, I still
         | objectively hate it, because I do know enough to know that
         | basically I'm re-implementing what years of hardworking O/S
         | developers have done, piecemeal. But at least I'm going in with
         | my eyes open which feels better.
         | 
         | The only real mitigating factor I have is that the application
         | is largely 'never-read' and then when reading is done, it's
         | sequential batches. Which is not normally something databases
         | optimise for and works okay for file-storage.
         | 
         | (If someone _does_ know a lightweight database architecture
         | that performs like this, let me know).
        
       | hankchinaski wrote:
       | textbook example of overengineering for no reason
        
       | annacappa wrote:
       | It's great that people explore new ideas. However this does not
       | seem like a good idea.
       | 
       | It claims to solve a bunch of problems by ignoring them. There
       | are solid reasons why people distribute their applications across
       | multiple machines. After reading this article I feel like we need
       | to state a bunch of them.
       | 
        | Redundancy - what if one machine breaks: a hardware failure, a
        | software failure, or a network failure (a network partition
        | where you can't reach the machine or it can't reach the
        | internet)?
       | 
        | Scaling - what if you can't serve all of your customers from
        | one machine? Perhaps you have many customers and a small app,
        | or perhaps your app uses a lot of resources (maybe it loads
        | gigs of data).
       | 
        | Deployment - what happens when we want to change the code
        | without going down? If you are running multiple copies of your
        | app you get this for cheap.
       | 
        | There are tons of smaller benefits, like right-sizing your
        | architecture. What if the one machine you chose is not big
        | enough and you need to move to a new machine? With multiple
        | machines you just increase the number of machines. You also get
        | to use a variety of machine sizes and can choose ones that fit
        | your needs, so this flexibility allows you to choose cheaper
        | machines.
       | 
       | I feel like the authors don't know why people invented the
       | standard way of doing things.
        
         | annacappa wrote:
         | The more I think about it the worse it gets.
         | 
         | Because we don't want everything to fall over when one machine
         | goes down we need at least 3 machines (for raft). So if our
         | traditional db would have 500 GB of data we now need 3 machines
         | with 500 GB of ram running at all times. That is an epic waste
          | of money. Millions per year to run? And you could store it in
         | a db for a couple of dollars.
        
           | 1oooqooq wrote:
           | their use case is mostly-never-retrieved images!
           | 
           | they store the index of files only in memory. and have the
           | entire build time to fetch build-1 images to get ready for
           | the diff.
           | 
           | it's much easier than most use cases
        
             | annacappa wrote:
             | So all of this ram is being used and is only accessed
             | sporadically if at all. This is not good. Sounds like you
             | could implement the entire thing on a micro db instance
              | (redis or a regular db) with no raft or any other custom
              | implementation or messing around.
        
       | LAC-Tech wrote:
       | Really got a kick out of this article. RAM is big, and cheap. And
       | as we all know the database is the log, and everything else is
       | just the cache. A few questions, comments!
       | 
       | 1. I take it you've seen the LMAX talk [0], and were similarly
       | inspired? :)
       | 
       | 2. Are you familiar with the event sourcing approach? It's
       | basically what you describe, except you don't flush to disk after
       | editing every field, you batch your updates into a single
       | "event". (you've come at it from the exact opposite end, but it
       | looks like roughly the same thing).
       | 
       | [0] https://www.infoq.com/presentations/LMAX/
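The contrast drawn above can be made concrete with a small sketch (event and field names invented for illustration): per-field logging emits one record per mutation, while event sourcing folds the same logical change into a single event.

```python
import io
import json

log = io.StringIO()

def record(event_type, **payload):
    # Append one JSON record to the log.
    log.write(json.dumps({"type": event_type, **payload}) + "\n")

# Per-field logging (the article's approach) -> three log entries:
for field, value in [("name", "Ada"),
                     ("email", "ada@example.com"),
                     ("plan", "pro")]:
    record("field_set", obj="user-1", field=field, value=value)

# Event sourcing -> one entry capturing the same logical change:
record("user_registered", obj="user-1",
       name="Ada", email="ada@example.com", plan="pro")
```

Batching at the event level also means replay operates on domain-meaningful units rather than raw field writes.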
        
       | qprofyeh wrote:
       | We used Redis with persistence to build our first prototype. It
       | performed amazingly and development speed was awesome. We were a
       | full year beyond break-even before adding MySQL to the stack for
       | the few times we missed the ability to run SQL queries, for
       | finance.
        
       | _the_inflator wrote:
        | This reminds me of the heated discussions around jQuery by some
        | so-called performance-driven devs, which culminated in this
        | website:
       | 
       | https://youmightnotneedjquery.com/
       | 
        | The overwhelming majority underestimates the beauty, effort,
        | and experience that go into abstractions. There are some true
        | geniuses doing fantastic work to deliver syntactic sugar, while
        | the critics mock the somewhat larger bundle size as "a couple
        | of lines frequently used." That's why.
       | 
       | In the end, a good framework is more than just an abstraction. It
       | guarantees consistency and accessibility.
       | 
       | My advice: try to understand the source code, if possible,
       | before reinventing the wheel.
       | 
       | What may start out as fun quickly becomes a burden. If there
       | weren't any edge cases or differing conditions, you wouldn't
       | need an abstraction. Been there, done that.
        
       | k__ wrote:
       | Well, at least they gave an example of what not to do.
        
       | gchamonlive wrote:
       | The problem is, you only know what you know.
       | 
       | Sure you reduce deployment complexity, but what about maintaining
       | your algorithm that implements data persistence and replication?
       | 
       | To assume that will never spectacularly bite you is naive.
       | Tests only go as far as knowing what to test for, and while
       | you don't know whether your product will ever be used, you
       | also don't know whether it will explode in success and leave
       | you hostage to your own decisions and technical debt.
       | 
       | These are HARD decisions. Hard decisions require solid solutions.
       | You can surely try that with toy projects, but if I were in a
       | position to build a software architecture for something that
       | had a remote possibility of being used in production, I would
       | oppose such designs adamantly.
        
       | paxys wrote:
       | There is so much wrong with this I don't know where to even
       | start. You want to "keep things simple" and not stand up a
       | separate instance of MySQL/Postgres/Redis/MongoDB/whatever else.
       | So, you:
       | 
       | 1. Create your own in-memory database.
       | 
       | 2. Make sure every transaction in this DB can be serialized and
       | is simultaneously written to disk.
       | 
       | 3. Use some orchestration platform to make all web servers aware
       | of each other.
       | 
       | 4. Synchronize transaction logs between your web servers (by
       | implementing the Raft protocol) and update the in-memory DB.
       | 
       | 5. Write some kind of conflict resolution algorithm, because
       | there's no way to implement locking or enforce
       | consistency/isolation in your DB.
       | 
       | 6. Shard your web servers by tenant and write another load
       | balancing layer to make sure that requests are getting to the
       | server their data is on.
       | 
       | Simple indeed.
        
         | ozim wrote:
         | I don't want to go ad personam on the blog author - but
         | checking his socials he is not really experienced person.
         | 
         | I don't think we have anything to discuss here. He seems just
         | to want to do cool stuff and his drop of databases seems to be
         | because he just doesn't know a lot of stuff there is to know.
         | 
         | I applaud the attempt, and it may be that his needs will be
         | covered by what he is doing.
         | 
         | But for everyone else: yes, pick boring technology if you
         | want to do a startup, because technology shouldn't be the
         | hard part or something you worry about if you are making
         | web applications.
        
           | tdrhq wrote:
           | > but checking his socials he is not really experienced
           | person.
           | 
           | I'm not sure what qualifies as experience if Meta/Google
           | doesn't. ;)
        
             | ozim wrote:
             | Well he is not Kent Beck or Jon Skeet, Martin Fowler -
             | that is what I call experienced enough to take a blog
             | post seriously.
             | 
             | Just working at Meta/Google doesn't impress me much, as
             | Shania Twain would sing.
        
               | rrrix1 wrote:
               | > he is not Kent Beck or Jon Skeet, Martin Fowler
               | 
               | Just FYI, you are (perhaps unintentionally) showing your
               | lack of experience.
               | 
               | There are many thousands of brilliant engineers for
               | every brilliant engineer who is also an
               | author/speaker/publisher. These are very different
               | skills.
               | 
               | Also, perhaps the author _is_ the next Martin Fowler? You
               | never know...
        
           | mtlynch wrote:
           | > _I don't want to go ad personam on the blog author - but
           | checking his socials he is not really experienced person._
           | 
           | According to LinkedIn:
           | 
           | - Masters in CS from UPenn
           | 
           | - 1 year as SWE at Google
           | 
           | - 6 years as SWE at FB/Meta
           | 
           | - 6 years running his own company
           | 
           | When I hear "not really experienced," I think recent college
           | grad, not someone with a Master's and 15 years of industry
           | experience.
        
             | williamdclt wrote:
             | Well, that's only 7 years of working with people to
             | learn from. It's not nothing, but it's not enough
             | credential to make me go from "it's a horrible idea" to
             | "I must be missing something".
        
         | karmakaze wrote:
         | I played with making an in-memory database too, but I wouldn't
         | recommend anyone use one in production unless they have strict
         | latency requirements.
         | 
         | Simple is what people are already using. And beware 'good for
         | startups' tech. If you're successful you'll have legacy 'bad
         | for scale' tech.
        
         | SoftTalker wrote:
         | Yeah and good luck when the CEO starts asking for reports and
         | metrics (or anything else that databases have been optimized
         | over the last 50 years to do very well).
         | 
         | Surely this is a parody article of some sort?
        
       | 0x74696d wrote:
       | This architecture is roughly how HashiCorp's Nomad, Consul, and
       | Vault are built (I'm one of the maintainers of Nomad). While it's
       | definitely a "weird" architecture, the developer experience is
       | really nice once you get the hang of it.
       | 
       | The in-memory state can be whatever you want, which means you can
       | build up your own application-specific indexing and querying
       | functions. You _could_ just use sqlite with :memory: for the Raft
       | FSM, but if you can build/find an in-memory transaction store
       | (we use our own go-memdb), then reading from the state is just
       | function calls. Protecting yourself from stale reads or write
       | skew is trivial; every object you write has a Raft index so you
       | can write APIs like "query a follower for object foo and wait
       | till it's at least at index 123". It sweeps away a lot of "magic"
       | that you'd normally shove into an RDBMS or other external store.
       | 
       | That being said, I'd be hesitant to pick this kind of
       | architecture for a new startup outside of the "infrastructure"
       | space... you are effectively building your own database here
       | though. You need to pick (or write) good primitives for things
       | like your inter-node RPC, on-disk persistence, in-memory
       | transactional state store, etc. Upgrades are especially
       | challenging, because the new code can try to write entities to
       | the Raft log that nodes still on the previous version don't
       | understand (or worse, misunderstand because the way they're
       | handled has changed!). There's no free lunch.
        
         | jstrong wrote:
         | Like you, I'm more open to the idea of keeping data in
         | memory than most of the responders here. When I got to the
         | part of the article about how they are using Common Lisp
         | with hot reloading, I was thinking: well, you guys can do
         | whatever you want, but not everybody is working on that
         | team, ha.
        
       | donatj wrote:
       | I've got a handful of small Go applications where I just have a
       | "go generate" command that generates the entire dataset as Go, so
       | the data set ends up compiled into the binary. Works great.
       | 
       | https://emoji.boats/ is the most public facing of these.
       | 
       | I also have built a whole class of micro-services that pull their
       | entire dataset from an API on start up, hold it resident and
       | update on occasion. These have been amazing for speeding up
       | certain classes of lookup for us where we don't always need
       | entirely up to date data.
        
       | tofflos wrote:
       | Check out https://eclipsestore.io (previously named Microstream)
       | if you're into Java and interested in some of the ideas presented
       | in this article. You use regular objects, such as Records, and
       | regular code, such as java.util.stream, for processing, and the
       | library does snapshotting to disk.
       | 
       | I haven't tried it out, but just thinking of how many fewer
       | organizational hoops I would have to jump through makes me
       | want to try it:
       | 
       | - No ordering a database from database operations.
       | 
       | - No ordering a port opening from network operations.
       | 
       | - No ordering of certificates.
       | 
       | - The above times 3 for development, test and production.
       | 
       | - Not having to run database containers during development.
       | 
       | I think the sweet spot for me would be services that I don't
       | expect to grow beyond a single node, where a small amount of
       | downtime during service windows is acceptable.
        
       | ksec wrote:
       | >RAM is super cheap
       | 
       | I think this has to be the number one misunderstanding for
       | developers.
       | 
       | Yes, SSD in terms of throughput or IOPs has gone up by 100 to
       | 10000x. vCPU performance per dollar has gone up by 20 - 50x. We
       | went from 45/32nm to now 5nm/3nm, and much higher IPC.
       | 
       | But the price of RAM hasn't fallen anywhere near as far as
       | CPU or SSD. It may have gotten a lot faster, and you may even
       | be able to fit much more of it, with higher-density chips and
       | channels going from dual to 8 or 12. But if you look at the
       | DRAM spot price from 2008 to 2022, the lowest DRAM price
       | bottomed out at around $2.8/GB three separate times, with
       | cyclical peaks of $6-$8/GB in between. I.e., had you bought
       | DRAM at its lowest point (or its highest point) at any time
       | in the past ~15 years, it would have cost roughly the same,
       | plus or minus 10-20%, ignoring inflation.
       | 
       | It was only in mid-2022 that the price finally broke through
       | the $2.8/GB barrier and collapsed to close to $1/GB, before
       | settling at ~$2/GB for DDR5.
       | 
       | Yes, you can now get 4TB of RAM in a server, but that doesn't
       | mean DRAM is super cheap. Developers on average, and
       | especially those in big tech, are now earning way more than
       | they were in 2010, which makes them feel RAM has gotten a lot
       | more affordable. In reality, even at its lowest point over
       | the past 15 years, you got at best slightly more than a 2x
       | reduction in DRAM price. And we will likely see DRAM prices
       | shoot up again in a year or two.
        
         | klysm wrote:
         | Simultaneously, many developers reach for distributed
         | systems too quickly when they could just buy more RAM.
         | Perhaps that's what the writer means.
        
         | rrrix1 wrote:
         | An alternative interpretation is that the maximum RAM capacity
         | for an individual node has drastically increased over the last
         | couple of decades.
         | 
         | A simplistic example, if a given node was limited to 16GB of
         | RAM 20 years ago, I would need 256 nodes to have 4TB of RAM for
         | my system (not including overhead for each OS).
         | 
         | Compared to today, where a single node can have that entire 4TB
         | all in one chassis.
         | 
         | The total cost of RAM chips themselves may not have changed,
         | but the actual cost of using that RAM in a physical system has
         | dropped dramatically.
        
       | jb3689 wrote:
       | We wanted to simplify our architecture and not use a database, so
       | instead we created our own version of everything databases
       | already do for us. Super risky for a company. Hopefully you don't
       | spend all of your time maintaining, optimizing, and scaling this
       | custom architecture.
        
       | bastawhiz wrote:
       | Please, someone explain how building your own in-memory database
       | and snapshotting on top of Raft is simpler than just installing
       | Postgres or SQLite with one of the modern durability tools.
       | Seriously, if you genuinely believe writing concurrency code with
       | mutexes and other primitives and hoping that's all correct is
       | _easier_ than just writing a little SQL, you've tragically lost
       | your way.
        
       | samarabbas wrote:
       | Notice how the complexity of this grows suddenly once you
       | start thinking about infrastructure failures and restarts due
       | to deployments. I have seen this play out dozens of times in
       | my professional career: these systems start out very simple
       | but eventually become a huge maintenance burden. This is
       | where a high-level abstraction like Durable Execution is much
       | more powerful for developers, since it can abstract away this
       | level of detail. Basically, code up your application as if
       | infrastructure failures did not exist, and let an underlying
       | Durable Execution platform like Temporal or something similar
       | handle resiliency for you.
        
       ___________________________________________________________________
       (page generated 2024-08-10 23:01 UTC)