       [HN Gopher] Jepsen: Amazon RDS for PostgreSQL 17.4
       ___________________________________________________________________
        
       Jepsen: Amazon RDS for PostgreSQL 17.4
        
       Author : aphyr
       Score  : 175 points
       Date   : 2025-04-29 14:30 UTC (8 hours ago)
        
 (HTM) web link (jepsen.io)
 (TXT) w3m dump (jepsen.io)
        
       | henning wrote:
       | I thought this kind of bullshit was only supposed to happen in
       | MongoDB!
        
         | kabes wrote:
         | Then you haven't read enough Jepsen reports. Distributed system
         | guarantees generally can't be trusted.
        
           | __alexs wrote:
           | Postgres is not a distributed system in this configuration
           | usually, though, is it?
        
             | semiquaver wrote:
             | The result is for "Amazon RDS for PostgreSQL multi-AZ
             | clusters" which are certainly a distributed system.
             | 
             | I'm not well versed in RDS but I believe that clustered is
             | the only way to use it.
        
               | NewJazz wrote:
               | No, you can have single instances
        
               | reissbaker wrote:
               | This writeup tested multi-AZ RDS for Postgres -- which is
               | always distributed behind the scenes (otherwise, it
               | couldn't exist in multiple AZs).
        
               | dragonwriter wrote:
               | An RDS cluster can have a single instance (but it can't
               | be multi-AZ with a single instance.)
        
             | dragonwriter wrote:
             | A multi-AZ cluster is necessarily a distributed system.
        
         | colesantiago wrote:
         | Do people still use MongoDB in production?
         | 
         | I was quite surprised to read that Stripe used MongoDB in its
         | early days and still does today, and I can't imagine the sheer
         | nightmares they must have faced using it for all these years.
        
           | colechristensen wrote:
           | MongoDB is a public company with a market cap of 14.2 billion
           | dollars, so yes, people still use it in production.
        
             | djfivyvusn wrote:
             | I've been looking for a job the last few weeks.
             | 
             | Literally the only job ad I've seen talking about MongoDB
             | was a job ad for MongoDB itself.
        
           | senderista wrote:
           | MongoDB has come a long way. They acquired a world-class
           | storage engine (WiredTiger) and then they hired some world-
           | class distsys people (e.g. Murat Demirbas). They might still
           | be hamstrung by early design and API choices but from what I
           | can tell (never used it in anger) the implementation is
           | pretty solid.
        
           | computerfan494 wrote:
           | MongoDB is a very good database, and these days at scale I am
           | significantly more confident in its correctness guarantees
           | than any of the half-baked Postgres horizontal scaling
           | solutions. I have run both databases at seven figure a month
           | spend scale, and I would not choose off-the-shelf Postgres
           | for this task again.
        
         | bananapub wrote:
         | I think zookeeper is still the only distributed system that got
         | through jepsen without dataloss bugs, though at high cost:
         | https://aphyr.com/posts/291-jepsen-zookeeper
        
           | robterrell wrote:
           | Didn't FoundationDB get a clean bill of health?
        
             | MarkMarine wrote:
             | wasn't tested because: "haven't tested foundation in part
             | because their testing appears to be waaaay more rigorous
             | than mine."
             | 
             | https://web.archive.org/web/20150312112552/http://blog.foun
             | d...
        
             | bananapub wrote:
             | apparently wasn't tested because Kyle thought the internal
             | testing was better than jepsen itself:
             | https://abdullin.com/foundationdb-is-back/
        
             | necubi wrote:
             | Aphyr didn't test foundation himself, but the foundation
             | team did their own Jepsen testing which they reported
             | passing. All of this was a long time ago, before Foundation
             | was bought by Apple and open sourced.
             | 
             | Now members of the original Foundation team have started
             | Antithesis (https://antithesis.com/) to make it easier for
             | other systems to adopt this sort of testing.
        
         | Thaxll wrote:
         | Those memes are 10 years old. You know that some very big tech
         | companies use MongoDB, right? We're talking billions a year.
        
           | djfivyvusn wrote:
           | What is your point?
        
       | tibbar wrote:
       | The submitted title buries the lede: RDS for PostgreSQL 17.4 does
       | not properly implement snapshot isolation.
        
         | belter wrote:
         | And your comment also...In Multi-AZ clusters.
         | 
         | Well, this is from Kyle Kingsbury, the Chuck Norris of
         | transactional guarantees. AWS has to reply or clarify, even if
         | it only seems to apply to Multi-AZ clusters. Those are one of
         | the two possibilities for RDS with Postgres: Multi-AZ
         | deployments can have one standby or two standby DB instances,
         | and this is for the two-standby option. [1]
         | 
         | They make no such promises in their documentation. Their 5494
         | pages manual on RDS hardly mentions isolation or serializable
         | except in documentation of parameters for the different
         | engines.
         | 
         | Nothing on global read consistency for Multi-AZ clusters
         | because why should they.... :-) They talk about semi-
         | synchronous replication so the writer waits for one standby to
         | confirm log record, but the two readers can be on different
         | snapshots?
         | 
         | [1] - "New Amazon RDS for MySQL & PostgreSQL Multi-AZ
         | Deployment Option: Improved Write Performance & Faster
         | Failover" - https://aws.amazon.com/blogs/aws/amazon-rds-multi-
         | az-db-clus...
         | 
         | [2] - "Amazon RDS Multi-AZ with two readable standbys: Under
         | the hood" - https://aws.amazon.com/blogs/database/amazon-rds-
         | multi-az-wi...
        
           | n2d4 wrote:
           | > They make no such promises in their documentation. Their
           | 5494 pages manual on RDS hardly mentions isolation or
           | serializable
           | 
           | Well, as a user, I wish they would mention it though, because
           | if I migrate to RDS with multi-AZ after coming from plain
           | Postgres, I would probably want to know how the two differ.
           | If I have code that relies on snapshot isolation for
           | repeatable reads (which normal pg has & clearly documents as
           | such), I would want to know that this does not hold here.
        
         | gymbeaux wrote:
         | Par for the course
        
         | altairprime wrote:
         | I emailed the mods and asked them to change it to this phrase
         | copy-pasted from the linked article:
         | 
         | > Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot
         | Isolation
        
       | cr3ative wrote:
       | This is in such a thick academic style that it is difficult to
       | follow what the problem actually might be and how it would impact
       | someone. This style of writing serves mostly to remind me that I
       | am not a part of the world that writes like this, which makes me
       | a little sad.
        
         | glutamate wrote:
         | In the beginning, when you read papers like this, it can be
         | hard work. You can either give up or put some effort in to try
         | to understand it. Maybe look at some of the other Jepsen
         | reports, some may be easier. Or perhaps an introductory CS
         | textbook. With practice and patience it will become easier to
         | read and eventually write like this.
         | 
         | You may not be part of that world now, but you can be some day.
         | 
         | EDIT: forgot to say, i had to read 6 or 7 books on Bayesian
         | statistics before i understood the most basic concepts. A few
         | years later i wrote a compiler for a statistical programming
         | language.
        
           | cr3ative wrote:
           | I'll look to do so, and appreciate your pointers. Thank you
           | for being kind!
        
           | concerndc1tizen wrote:
           | The state of the art is always advancing, which greatly
           | increases the burden of starting from first principles.
           | 
           | I somewhat feel that there was a generation that had it
           | easier, because they were pioneers in a new field, allowing
           | them to become experts quickly, while improving year-on-year,
           | being paid well in the process, and having great network and
           | exposure.
           | 
           | Of course, it can be done, but we should at least acknowledge
           | that sometimes the industry is unforgiving and simply doesn't
           | have on-ramps except for the privileged few.
        
             | _AzMoo wrote:
             | > I somewhat feel that there was a generation that had it
             | easier
             | 
             | I don't think so. I've been doing this for nearly 35 years
             | now, and there's always been a lot to learn. Each layer of
             | abstraction developed makes it easier to quickly iterate
             | towards a new outcome faster or with more confidence, but
             | hides away complexity that you might eventually need to
             | know. In a lot of ways it's easier these days, because
             | there's so much information available at your fingertips
             | when you need it, presented in a multitude of different
             | formats. I learned my first programming language by reading
             | a QBasic textbook trying to debug a text-based adventure
             | game that crashed at a critical moment. I had no Internet,
             | no BBS, nobody to help, except my Dad who was a solo RPG
             | programmer who had learned on the job after being promoted
             | from sweeping floors in a warehouse.
        
         | jorams wrote:
         | It uses a lot of very specific terminology, but the linked
         | pages like the one on "G-nonadjacent" do a lot to clear up what
         | it all means. It _is_ a lot of reading.
         | 
         | Essentially: The configuration claims "Snapshot Isolation",
         | which means every transaction looks like it operates on a
         | consistent snapshot of the entire database at its starting
         | timestamp. All transactions starting after a transaction
         | commits will see the changes made by the transaction. Jepsen
         | finds that the snapshot a transaction sees doesn't always
         | contain everything that was committed before its starting
         | timestamp. Transactions A and B can both commit their changes,
         | then transactions C and D can start with C only seeing the
         | change made by A and D only seeing the change made by B.
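
The A/B/C/D scenario described above can be sketched as a small check. This is an illustration only, not Jepsen's checker: the function names and the representation of a snapshot as a set of committed writer names are assumptions made for this sketch. It relies on the reading of snapshot isolation given in the comment, under which any two snapshots must be ordered by inclusion (one is a subset of the other).

```python
from itertools import combinations


def is_long_fork(snapshot_x, snapshot_y):
    """Two snapshots fork if each contains a commit the other lacks."""
    return bool(snapshot_x - snapshot_y) and bool(snapshot_y - snapshot_x)


def find_long_forks(snapshots):
    """Return pairs of reader names whose snapshots are incomparable.

    `snapshots` maps a reader name to the set of writers whose
    committed changes that reader observed (hypothetical model).
    """
    return [
        (a, b)
        for (a, seen_a), (b, seen_b) in combinations(sorted(snapshots.items()), 2)
        if is_long_fork(seen_a, seen_b)
    ]


# Writers A and B have both committed before readers C, D and E start.
snapshots = {
    "C": {"A"},       # C sees only A's change
    "D": {"B"},       # D sees only B's change
    "E": {"A", "B"},  # E sees both, which is compatible with either C or D
}

forks = find_long_forks(snapshots)
print(forks)
```

Here only the pair C and D is reported: neither snapshot contains the other, so no single commit order for A and B can explain both reads.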
        
         | renewiltord wrote:
         | It's maximal information communication. Use LLM to distill to
         | your own knowledge level. It is trivial with modern LLM. Very
         | good output in general.
        
           | benatkin wrote:
           | It addresses the reader no matter how knowledgeable they are.
           | It's a very good use of hypertext, making it so that a
           | knowledgeable reader won't need to skip over much.
        
         | ZYbCRq22HbJ2y7 wrote:
         | > such a thick academic style
         | 
         | Why? Because it has variables and a graph?
         | 
         | What sort of education background do you have?
        
         | vlovich123 wrote:
         | Have you tried using an LLM? I've found good results getting at
         | the underlying concepts and building a mental model that works
         | for me that way. It makes domain expertise - that often has
         | unique terminology for concepts you already know or at least
         | know without a specific name - more easily accessible after a
         | little bit of a QA round.
        
       | nijave wrote:
       | It's not entirely clear but this isn't an issue in multi instance
       | upstream Postgres clusters?
       | 
       | Am I correct in understanding either AWS is doing something with
       | the cluster configuration or has added some patches that
       | introduce this behavior?
        
         | belter wrote:
         | Yes, it's different. This is a deeper overview of what they did:
         | https://youtu.be/fLqJXTOhUg4
         | 
         | Especially here: https://youtu.be/fLqJXTOhUg4?t=434
        
       | ezekiel68 wrote:
       | In my reading of this, it looks like the practical implication
       | could be that reads happening quickly after writes to the same
       | row(s) might return stale data. The write transaction gets marked
       | as complete before all of the distributed layers of a multi-AZ
       | RDS instance have been fully updated, such that immediate reads
       | from the same rows might return nothing (if the row does not
       | exist yet) or older values if the columns have not been fully
       | updated.
       | 
       | Due to the way PostgreSQL does snapshotting, I don't believe this
       | implies such a read might obtain a nonsense value due to only a
       | portion of the bytes in a multi-byte column type having been
       | updated yet.
       | 
       | It seems like a race condition that becomes eventually
       | consistent. Or did anyone read this as if the later
       | transaction(s) of a "long fork" might never complete under normal
       | circumstances?
        
         | aphyr wrote:
         | This isn't just stale data, in the sense of "a point-in-time
         | consistent snapshot which does not reflect some recent
         | transactions". I think what's going on here is that a read-only
         | transaction against a secondary can observe some transaction T,
         | but also _miss_ transactions which must have logically executed
         | before T.
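
The distinction drawn above, between a snapshot that is merely stale and one that observes T while missing something that logically preceded T, can be sketched as follows. This is a simplified model, not how Jepsen infers dependencies: the `depends_on` map, the transaction names, and the function name are all hypothetical.

```python
def missing_predecessors(observed, depends_on):
    """Return transactions that must logically precede something in
    `observed` but are themselves absent from it.

    `depends_on` maps a transaction to the transactions it directly
    depends on; dependencies are followed transitively.
    """
    missing = set()
    seen = set(observed)
    stack = list(observed)
    while stack:
        txn = stack.pop()
        for pred in depends_on.get(txn, ()):
            if pred not in observed:
                missing.add(pred)
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return missing


# Hypothetical history: T0 built on T_init, and T built on T0.
depends_on = {"T0": {"T_init"}, "T": {"T0"}}

stale_but_consistent = {"T_init"}   # old, yet closed under dependencies
inconsistent = {"T_init", "T"}      # observes T but not T0

print(missing_predecessors(stale_but_consistent, depends_on))
print(missing_predecessors(inconsistent, depends_on))
```

The first read is stale but self-consistent, so nothing is missing. The second observes T without T0, which is the kind of gap the comment describes.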
        
       | mushufasa wrote:
       | > These phenomena occurred in every version tested, from 13.15 to
       | 17.4.
       | 
       | I was worried I had made the wrong move upgrading major versions,
       | but it looks like this is not that. This is not a regression, but
       | just a feature request or longstanding bug.
        
       | skywhopper wrote:
       | This is an unfortunate report in a lot of ways. First, the title
       | is incomplete. Second, there's no context as to the purpose of
       | the test and very little about the parameters of the test. It
       | makes no comparison to other PostgreSQL architectures except one
       | reference at the end to a standalone system. Third, it
       | characterizes the transaction isolation of this system as if it
       | were a failure (see comments in this thread assuming this is a
       | bug or a missing feature of Postgres). Finally, it never compares
       | the promises made by the product vendors to the reality. Does AWS
       | or Postgres promise perfect snapshot isolation?
       | 
       | I understand the mission of the Jepsen project but presenting
       | results in this format is misleading and will only sow confusion.
       | 
       | Transaction isolation involves a ton of tradeoffs, and the
       | tradeoffs chosen here may be fine for most use cases. The issues
       | can be easily avoided by doing any critical transactional work
       | against the primary read-write node only, which would be the only
       | typical way in which transactional work would be done against a
       | Postgres cluster of this sort.
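
The routing rule suggested above could look roughly like this. It is a sketch of the commenter's suggestion, not AWS guidance; the endpoint hostnames, the `Endpoints` class, and the `choose_endpoint` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoints:
    writer: str  # hypothetical primary (read-write) endpoint
    reader: str  # hypothetical read-replica endpoint


def choose_endpoint(endpoints, *, read_only, needs_snapshot_isolation):
    """Send a transaction to a replica only when it is read-only and
    can tolerate weaker isolation; everything else goes to the primary."""
    if read_only and not needs_snapshot_isolation:
        return endpoints.reader
    return endpoints.writer


eps = Endpoints(
    writer="writer.example.internal",  # placeholder hostname
    reader="reader.example.internal",  # placeholder hostname
)

# A dashboard query that tolerates weaker guarantees may use a replica.
print(choose_endpoint(eps, read_only=True, needs_snapshot_isolation=False))
# A read that must see a consistent snapshot goes to the primary.
print(choose_endpoint(eps, read_only=True, needs_snapshot_isolation=True))
```

Defaulting to the writer means a transaction that is misclassified costs extra load on the primary rather than a weaker read.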
        
         | Sesse__ wrote:
         | Postgres does indeed promise perfect snapshot isolation, and
         | Amazon does not (to the best of my knowledge) document that
         | their managed Postgres service weakens Postgres' promises.
        
       | billiam wrote:
       | New headline: AWS RDS is not CockroachDB or Spanner. And it's not
       | trying to be.
        
       | film42 wrote:
       | I think AWS will need to update their documentation to
       | communicate this. Will a snapshot isolation fix introduce a
       | performance regression in latency or throughput? Or, maybe they
       | stand by what they have as being strong enough. Either way,
       | they'll need to say something.
        
         | kevincox wrote:
         | I think the ideal solution from AWS would be fixing the bug and
         | actually providing the guarantees that the docs say that they
         | do.
        
       | oblio wrote:
       | I wonder how Aurora fares on this?
        
       ___________________________________________________________________
       (page generated 2025-04-29 23:00 UTC)