[HN Gopher] Rediscovering Transaction Processing from History an...
       ___________________________________________________________________
        
       Rediscovering Transaction Processing from History and First
       Principles
        
       Author : todsacerdoti
       Score  : 40 points
        Date   : 2024-07-23 10:52 UTC (1 day ago)
        
 (HTM) web link (tigerbeetle.com)
 (TXT) w3m dump (tigerbeetle.com)
        
       | simonz05 wrote:
        | In this X thread Joran comments on how it all started with an
        | HN comment back in 2019.
       | https://x.com/jorandirkgreef/status/1815702485190774957
        
         | jorangreef wrote:
         | It was pretty surreal to sit next to someone at a dinner in NYC
         | two months ago, be introduced, and realize that they're someone
         | you had an HN exchange with 5 years ago.
         | 
         | Here's the HN thread:
         | https://news.ycombinator.com/item?id=20352439
         | 
         | And the backstory (on how this led to TigerBeetle):
         | https://x.com/jorandirkgreef/status/1788930243077853234
        
       | mihaic wrote:
       | I honestly don't understand why this isn't a Postgres extension.
       | In what case is it better to have two databases?
       | 
        | Realistically, I can't see a scenario where you need something
        | like this but at the same time are sure that you don't need
        | your database and financial ledger operations to be atomic
        | together.
        
         | twic wrote:
         | > We saw that there were greater gains to be had than settling
         | for a Postgres extension or stored procedures.
         | 
         | > Today, as we announce our Series A of $24 million
         | 
          | Can't raise a $24 million Series A for a Postgres extension!
        
           | mihaic wrote:
           | That's true, but who are the executives that buy this and
           | then force developers to create a monstrous architecture that
           | has all sorts of race conditions outside of the ledger?
        
             | jorangreef wrote:
             | To be clear, TB moves the code to the data, rather than the
             | data to the code, and precisely so that you don't have
             | "race conditions outside the ledger".
             | 
             | Instead, all kinds of complicated debit/credit contracts
             | (up to 8k financial transactions at a time, linked together
             | atomically) can be expressed in a single request to the
             | database, composed in terms of a rich set of debit/credit
             | primitives (e.g. two-phase debit/credit with rollback after
             | a timeout), to enforce financial consistency directly in
             | the database.
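              | 
              | For instance, a two-phase debit/credit might look roughly
              | like this (a minimal sketch, assuming the Go client at
              | github.com/tigerbeetle/tigerbeetle-go; IDs, amounts and
              | the client setup are illustrative--check the docs for
              | your client version):
              | 
              |   package main
              | 
              |   import (
              |       tb "github.com/tigerbeetle/tigerbeetle-go"
              |       "github.com/tigerbeetle/tigerbeetle-go/pkg/types"
              |   )
              | 
              |   func main() {
              |       // Assumed setup: cluster 0, one replica on :3000.
              |       client, err := tb.NewClient(
              |           types.ToUint128(0), []string{"3000"})
              |       if err != nil {
              |           panic(err)
              |       }
              |       defer client.Close()
              | 
              |       // Phase 1: reserve funds. If not posted before
              |       // the timeout (seconds, in recent versions), the
              |       // reservation is rolled back automatically.
              |       pending := types.Transfer{
              |           ID:              types.ToUint128(1),
              |           DebitAccountID:  types.ToUint128(100),
              |           CreditAccountID: types.ToUint128(200),
              |           Amount:          types.ToUint128(500),
              |           Ledger:          1,
              |           Code:            1,
              |           Timeout:         30,
              |           Flags: types.TransferFlags{
              |               Pending: true}.ToUint16(),
              |       }
              |       // Phase 2: post to commit (sent in the same batch
              |       // here for brevity; normally sent later, once the
              |       // external leg clears).
              |       post := types.Transfer{
              |           ID:        types.ToUint128(2),
              |           PendingID: types.ToUint128(1),
              |           Amount:    types.ToUint128(500),
              |           Ledger:    1,
              |           Code:      1,
              |           Flags: types.TransferFlags{
              |               PostPendingTransfer: true}.ToUint16(),
              |       }
              |       if _, err := client.CreateTransfers(
              |           []types.Transfer{pending, post}); err != nil {
              |           panic(err)
              |       }
              |   }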
             | 
              | On the other hand, moving the data to the code, to make
              | decisions outside the OLTP database, was exactly the
              | anti-pattern we wanted to fix in the central bank switch,
              | which tried to implement debit/credit primitives over a
              | general-purpose DBMS. It's really hard to get these
              | things right on top of Postgres.
             | 
              | And even if you get the primitives right, the performance
              | is fundamentally limited by row locks interacting with
              | RTTs and contention. Again, these row locks are not only
              | external but also internal (i.e. how I/O interacts with
              | CPU inside the DBMS), which is why stored procedures or
              | extensions aren't enough to fix the performance.
        
         | jorangreef wrote:
         | Hey mihaic! Thanks for the question.
         | 
         | > I honestly don't understand why this isn't a Postgres
         | extension.
         | 
         | We considered a Postgres extension at the time (as well as
         | stored procedures or even an embedded in-process DBMS).
         | 
          | However, this wouldn't have moved the needle to where we
          | needed it to be. Our internal design requirements (TB started
          | as an internal project at Coil, contracting on a central bank
          | switch) were literally a three-order-of-magnitude increase in
          | performance--to keep up with where transaction workloads were
          | going.
         | 
         | While an extension or stored procedures would reduce external
         | locking, the general-purpose DBMS design implementing them
          | still tends to do far too much internal locking, interleaving
          | disk I/O with CPU work and coupling resources. In contrast,
         | TigerBeetle explicitly decouples disk I/O and CPU to amortize
         | internal locking and so "pipeline in bulk" for mechanical
         | sympathy. Think SIMD vectorization but applied to state machine
         | execution.
         | 
         | For example, before TB's state machine executes 1 request of 8k
         | transactions, all data dependencies are prefetched in advance
         | (typically from L1/2/3 cache) so that the CPU becomes like a
          | sprinter running the 100 meters. This suits extreme OLTP
          | workloads where a few million debit/credit transactions need
          | to be pushed through fewer than 10 accounts/rows (e.g. for a
          | small central bank switch with 10 banks around the table).
          | This is pathological for a general-purpose DBMS design, but
          | easy for TB because hot accounts are hot in cache, and all
          | locking (whether external or internal) is amortized across
          | 8k transactions.
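          | 
          | In pseudocode, the shape of this is something like the
          | following (illustrative only, not TigerBeetle's actual
          | internals; all names here are made up):
          | 
          |   package main
          | 
          |   import "fmt"
          | 
          |   type Account struct{ Debits, Credits uint64 }
          |   type Transfer struct{ From, To, Amount uint64 }
          | 
          |   // Stand-in for storage: really concurrent disk I/O.
          |   var disk = map[uint64]*Account{1: {}, 2: {}}
          | 
          |   // Phase 1: gather every data dependency up front, so
          |   // that execution never stalls on I/O.
          |   func prefetch(batch []Transfer) map[uint64]*Account {
          |       cache := map[uint64]*Account{}
          |       for _, t := range batch {
          |           cache[t.From] = disk[t.From]
          |           cache[t.To] = disk[t.To]
          |       }
          |       return cache
          |   }
          | 
          |   func main() {
          |       batch := make([]Transfer, 8190)
          |       for i := range batch {
          |           batch[i] = Transfer{From: 1, To: 2, Amount: 1}
          |       }
          |       cache := prefetch(batch)
          |       // Phase 2: pure CPU, back-to-back; the two hot
          |       // accounts stay hot in L1/L2/L3.
          |       for _, t := range batch {
          |           cache[t.From].Debits += t.Amount
          |           cache[t.To].Credits += t.Amount
          |       }
          |       fmt.Println(cache[1].Debits, cache[2].Credits)
          |   }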
         | 
         | I spoke at QCon SF on this
         | (https://www.youtube.com/watch?v=32LMicc0gRA) and matklad did
         | two IronBeetle episodes walking through the code
         | (https://www.youtube.com/watch?v=v5ThOoK3OFw&list=PL9eL-
         | xg48O...).
         | 
         | But the big problem with extensions or stored procedures is
         | that they still tend to have a "one transaction at a time"
         | mindset at the network layer. In other words, they don't
         | typically amortize network requests beyond a 1:1 ratio of
         | logical transaction to physical SQL transaction; they're not
         | ergonomic if you want to pack a few thousand logical
         | transactions in one physical query.
         | 
         | On the other hand, TB's design is like "stored procedures meets
         | group commit on steroids", packing up to 8k logical
         | transactions in 1 physical query, and amortizing the costs not
         | only of state machine execution (as described above) but also
          | syscalls, networking and fsync (it's something roughly like 4
          | syscalls, 4 memory copies and 4 network messages to execute
          | 8k transactions--really hard for Postgres to match that).
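          | 
          | Concretely, the client-side ergonomics of that look something
          | like this (a sketch with the same assumed Go client as in the
          | sibling comment; 8190 is the max batch size with the default
          | 1 MiB message size):
          | 
          |   // One physical query carrying ~8k logical transactions;
          |   // syscalls, networking and fsync are amortized across
          |   // the whole batch.
          |   func payBatch(client tb.Client) error {
          |       transfers := make([]types.Transfer, 8190)
          |       for i := range transfers {
          |           transfers[i] = types.Transfer{
          |               ID:              types.ID(), // assumed helper
          |               DebitAccountID:  types.ToUint128(1),
          |               CreditAccountID: types.ToUint128(2),
          |               Amount:          types.ToUint128(1),
          |               Ledger:          1,
          |               Code:            1,
          |           }
          |       }
          |       _, err := client.CreateTransfers(transfers)
          |       return err
          |   }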
         | 
          | Postgres is also nearly 30 years old. It's an awesome
          | database, but hardware, software, and research into how you
          | would design a transaction processing database today have
          | advanced significantly since then. For example, we wanted
          | more safety
         | around things like Fsyncgate by having an explicit storage
         | fault model. We also wanted deterministic simulation testing
         | and static memory allocation, and to follow NASA's Power of Ten
         | Rules for Safety-Critical code.
         | 
         | A Postgres extension would have been a showstopper for these
         | things, but these were the technical contributions that needed
         | to be made.
         | 
         | I also think that some of the most interesting performance
         | innovations (static memory allocation, zero-deserialization,
         | zero-context switches, zero-syscalls etc.) are coming out of
         | HFT these days. For example, Martin Thompson's Evolution of
         | Financial Exchange Architectures:
         | https://www.youtube.com/watch?v=qDhTjE0XmkE
         | 
         | HFT is a great precursor to see where OLTP is going, because
         | the major contention problem of OLTP is mostly solved by HFT
         | architectures, and because the arbitrage and volume of HFT is
         | now moving into other sectors--as the world becomes more
         | transactional.
         | 
         | > In what case is it better to have two databases?
         | 
          | Finally, regarding two databases, this was something we
          | wanted to be explicit in the architecture: not to "mix cash
          | and customer records" in one general-purpose mutable filing
          | cabinet, but
         | rather to have "separation of concerns", the variable-length
         | customer records in the general-purpose DBMS (or filing
         | cabinet) in the control plane, and the cash in the immutable
         | financial transactions database (or bank vault) in the data
         | plane.
         | 
          | See also:
          | https://docs.tigerbeetle.com/coding/system-architecture
         | 
         | It's the same reason you would want Postgres + S3, or Postgres
         | + Redpanda. Postgres is perfect as a general-purpose or OLGP
         | database, but it's not specialized for OLAP like DuckDB, or
         | specialized for OLTP like TigerBeetle.
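          | 
          | In code, the split looks roughly like this (illustrative
          | schema and names only, with the same assumed Go client as
          | above):
          | 
          |   // Postgres (control plane) holds who the customer is:
          |   //   CREATE TABLE customers (
          |   //     id         bigint PRIMARY KEY,
          |   //     name       text,
          |   //     tb_account numeric(39) -- 128-bit TB account ID
          |   //   );
          |   // TigerBeetle (data plane) holds the money movements,
          |   // as immutable accounts and transfers:
          |   func openAccount(client tb.Client) error {
          |       account := types.Account{
          |           // Referenced from customers.tb_account:
          |           ID:     types.ToUint128(42),
          |           Ledger: 1,
          |           Code:   1,
          |       }
          |       _, err := client.CreateAccounts(
          |           []types.Account{account})
          |       return err
          |   }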
         | 
         | Again, appreciate the question and happy to answer more!
        
           | mihaic wrote:
            | Thanks for taking the time for the explanation and the
            | rundown on the architecture. Sounds a bit like an LMAX
            | Disruptor for a DB, which honestly is quite a natural
            | design for performance. Kudos for the Zig implementation
            | as well; I've never seen a project this serious written
            | in it.
           | 
            | Personally, I still see challenges in developing on top of
            | a system with data in two places unless there's a nice way
            | to sync between them, and I would have framed the
            | mutable/immutable classification as more of an unlogged vs.
            | fully-logged distinction within the DB, but I'm just doing
            | armchair analysis here.
        
             | jorangreef wrote:
             | Huge pleasure! :)
             | 
              | Exactly, the Martin Thompson talk I linked above is about
              | the LMAX architecture. He gave it at QCon London, I think
              | in May 2020, and we were designing TigerBeetle in July
              | 2020, pretty much lapping it up (I'd been a fan of
              | Thompson's Mechanical Sympathy blog for a few years
              | already by that point).
             | 
             | I think the way to see this is not as "two places for the
             | same type of data" but rather as "separation of concerns
             | for radically different types of data" with different
             | compliance/retention/mutability/access/performance/scale
             | characteristics.
             | 
              | It's also a natural architecture, and nothing new: it's
              | how you would probably want to architect the "core" of a
              | core banking system. We literally lifted the design for
              | TigerBeetle directly out of the central bank switch's
              | internal core, so that it would be dead simple to "heart
              | transplant" it back in later.
             | 
              | The surprising thing, though, was when small fintech
              | startups, energy companies, and gaming companies started
              | reaching out.
             | The primitives are easy to build with and unlock
             | significantly more scale. Again, like using object storage
             | in addition to Postgres is probably a good idea.
        
       | jorangreef wrote:
       | Joran from TigerBeetle here! Really stoked to see a bit of Jim
       | Gray history on the front page and happy to dive into how TB's
        | consensus and storage engine implement these ideas (starting
       | from main!
       | https://github.com/tigerbeetle/tigerbeetle/blob/main/src/tig...).
        
       ___________________________________________________________________
       (page generated 2024-07-24 23:03 UTC)