[HN Gopher] Jepsen: Amazon RDS for PostgreSQL 17.4
___________________________________________________________________
Jepsen: Amazon RDS for PostgreSQL 17.4
Author : aphyr
Score : 175 points
Date : 2025-04-29 14:30 UTC (8 hours ago)
(HTM) web link (jepsen.io)
(TXT) w3m dump (jepsen.io)
| henning wrote:
| I thought this kind of bullshit was only supposed to happen in
| MongoDB!
| kabes wrote:
| Then you haven't read enough jepsen reports. Distributed system
| guarantees generally can't be trusted
| __alexs wrote:
| Postgres is not usually a distributed system in this
| configuration though, is it?
| semiquaver wrote:
| The result is for "Amazon RDS for PostgreSQL multi-AZ
| clusters" which are certainly a distributed system.
|
| I'm not well versed in RDS but I believe that clustered is
| the only way to use it.
| NewJazz wrote:
| No, you can have single instances
| reissbaker wrote:
| This writeup tested multi-AZ RDS for Postgres -- which is
| always distributed behind the scenes (otherwise, it
| couldn't exist in multiple AZs).
| dragonwriter wrote:
| An RDS cluster can have a single instance (but it can't
| be multi-AZ with a single instance.)
| dragonwriter wrote:
| A multi-AZ cluster is necessarily a distributed system.
| colesantiago wrote:
| Do people still use MongoDB in production?
|
| I was quite surprised to read that Stripe used MongoDB in the
| early days and still does today, and I can't imagine the sheer
| nightmares they must have faced using it for all these years.
| colechristensen wrote:
| MongoDB is a public company with a market cap of 14.2 billion
| dollars. So yes, people still use it in production.
| djfivyvusn wrote:
| I've been looking for a job the last few weeks.
|
| Literally the only job ad I've seen talking about MongoDB
| was a job ad for MongoDB itself.
| senderista wrote:
| MongoDB has come a long way. They acquired a world-class
| storage engine (WiredTiger) and then they hired some world-
| class distsys people (e.g. Murat Demirbas). They might still
| be hamstrung by early design and API choices but from what I
| can tell (never used it in anger) the implementation is
| pretty solid.
| computerfan494 wrote:
| MongoDB is a very good database, and these days at scale I am
| significantly more confident in its correctness guarantees
| than any of the half-baked Postgres horizontal scaling
| solutions. I have run both databases at seven figure a month
| spend scale, and I would not choose off-the-shelf Postgres
| for this task again.
| bananapub wrote:
| I think zookeeper is still the only distributed system that got
| through jepsen without dataloss bugs, though at high cost:
| https://aphyr.com/posts/291-jepsen-zookeeper
| robterrell wrote:
| Didn't FoundationDB get a clean bill of health?
| MarkMarine wrote:
| wasn't tested because: "haven't tested foundation in part
| because their testing appears to be waaaay more rigorous
| than mine."
|
| https://web.archive.org/web/20150312112552/http://blog.foun
| d...
| bananapub wrote:
| apparently wasn't tested because Kyle thought the internal
| testing was better than jepsen itself:
| https://abdullin.com/foundationdb-is-back/
| necubi wrote:
| Aphyr didn't test foundation himself, but the foundation
| team did their own Jepsen testing which they reported
| passing. All of this was a long time ago, before Foundation
| was bought by Apple and open sourced.
|
| Now members of the original Foundation team have started
| Antithesis (https://antithesis.com/) to make it easier for
| other systems to adopt this sort of testing.
| Thaxll wrote:
| Those memes are 10 years old. You know that some very large tech
| companies use MongoDB, right? We're talking billions a year.
| djfivyvusn wrote:
| What is your point?
| tibbar wrote:
| The submitted title buries the lede: RDS for PostgreSQL 17.4 does
| not properly implement snapshot isolation.
| belter wrote:
| And your comment also...In Multi-AZ clusters.
|
| Well this is from Kyle Kingsbury, the Chuck Norris of
| transactional guarantees. AWS has to reply or clarify, even if it
| only seems to apply to Multi-AZ clusters. Those are one of the
| two possibilities for RDS with Postgres. Multi-AZ deployments
| can have one standby or two standby DB instances and this is
| for the two standby DB instances. [1]
|
| They make no such promises in their documentation. Their 5494
| pages manual on RDS hardly mentions isolation or serializable
| except in documentation of parameters for the different
| engines.
|
| Nothing on global read consistency for Multi-AZ clusters
| because why should they.... :-) They talk about semi-
| synchronous replication so the writer waits for one standby to
| confirm log record, but the two readers can be on different
| snapshots?
|
| [1] - "New Amazon RDS for MySQL & PostgreSQL Multi-AZ
| Deployment Option: Improved Write Performance & Faster
| Failover" - https://aws.amazon.com/blogs/aws/amazon-rds-multi-
| az-db-clus...
|
| [2] - "Amazon RDS Multi-AZ with two readable standbys: Under
| the hood" - https://aws.amazon.com/blogs/database/amazon-rds-
| multi-az-wi...
| n2d4 wrote:
| > They make no such promises in their documentation. Their
| 5494 pages manual on RDS hardly mentions isolation or
| serializable
|
| Well, as a user, I wish they would mention it though, because
| if I migrate to RDS with multi-AZ after coming from plain
| Postgres, I would probably want to know how the two differ.
| If I have code that relies on snapshot isolation for
| repeatable reads (which normal pg has & clearly documents as
| such), I would want to know that this does not hold here.
| gymbeaux wrote:
| Par for the course
| altairprime wrote:
| I emailed the mods and asked them to change it to this phrase
| copy-pasted from the linked article:
|
| > Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot
| Isolation
| cr3ative wrote:
| This is in such a thick academic style that it is difficult to
| follow what the problem actually might be and how it would impact
| someone. This style of writing serves mostly to remind me that I
| am not a part of the world that writes like this, which makes me
| a little sad.
| glutamate wrote:
| In the beginning, when you read papers like this, it can be
| hard work. You can either give up or put some effort in to try
| to understand it. Maybe look at some of the other Jepsen
| reports, some may be easier. Or perhaps an introductory CS
| textbook. With practice and patience it will become easier to
| read and eventually write like this.
|
| You may not be part of that world now, but you can be some day.
|
| EDIT: forgot to say, i had to read 6 or 7 books on Bayesian
| statistics before i understood the most basic concepts. A few
| years later i wrote a compiler for a statistical programming
| language.
| cr3ative wrote:
| I'll look to do so, and appreciate your pointers. Thank you
| for being kind!
| concerndc1tizen wrote:
| The state of the art is always advancing, which greatly
| increases the burden of starting from first principles.
|
| I somewhat feel that there was a generation that had it
| easier, because they were pioneers in a new field, allowing
| them to become experts quickly, while improving year-on-year,
| being paid well in the process, and having great network and
| exposure.
|
| Of course, it can be done, but we should at least acknowledge
| that sometimes the industry is unforgiving and simply doesn't
| have on-ramps except for the privileged few.
| _AzMoo wrote:
| > I somewhat feel that there was a generation that had it
| easier
|
| I don't think so. I've been doing this for nearly 35 years
| now, and there's always been a lot to learn. Each layer of
| abstraction developed makes it easier to quickly iterate
| towards a new outcome faster or with more confidence, but
| hides away complexity that you might eventually need to
| know. In a lot of ways it's easier these days, because
| there's so much information available at your fingertips
| when you need it, presented in a multitude of different
| formats. I learned my first programming language by reading
| a QBasic textbook trying to debug a text-based adventure
| game that crashed at a critical moment. I had no Internet,
| no BBS, nobody to help, except my Dad who was a solo RPG
| programmer who had learned on the job after being promoted
| from sweeping floors in a warehouse.
| jorams wrote:
| It uses a lot of very specific terminology, but the linked
| pages like the one on "G-nonadjacent" do a lot to clear up what
| it all means. It _is_ a lot of reading.
|
| Essentially: The configuration claims "Snapshot Isolation",
| which means every transaction looks like it operates on a
| consistent snapshot of the entire database at its starting
| timestamp. All transactions starting after a transaction
| commits will see the changes made by the transaction. Jepsen
| finds that the snapshot a transaction sees doesn't always
| contain everything that was committed before its starting
| timestamp. Transactions A and B can both commit their changes,
| then transactions C and D can start with C only seeing the
| change made by A and D only seeing the change made by B.
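|
| The A/B/C/D scenario above can be sketched as a small check. This
| is only an illustration, not how Jepsen detects the anomaly: it
| assumes each reader's view is given as the set of committed writer
| transactions it observed, and the function name find_long_forks is
| made up for this example. If every snapshot were a prefix of one
| shared commit order, any two views would be comparable by set
| inclusion; a pair where neither contains the other is a fork.
|
```python
from itertools import combinations


def find_long_forks(views):
    """Return pairs of readers whose views cannot both be snapshots
    of one shared commit order.

    views: dict mapping a reader name to the set of writer
    transactions whose effects that reader observed (hypothetical
    input format chosen for this sketch).
    """
    forks = []
    for (r1, seen1), (r2, seen2) in combinations(views.items(), 2):
        # Two prefixes of the same order are always nested, so
        # incomparable views indicate a fork.
        if not (seen1 <= seen2 or seen2 <= seen1):
            forks.append((r1, r2))
    return forks


# C saw only A's write, D saw only B's write.
print(find_long_forks({"C": {"A"}, "D": {"B"}}))
# C saw A, D saw A and B: consistent with the order A, B.
print(find_long_forks({"C": {"A"}, "D": {"A", "B"}}))
```
|
| The first call reports the pair (C, D); the second reports nothing,
| since D's view simply extends C's.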
| renewiltord wrote:
| It's maximal information communication. Use LLM to distill to
| your own knowledge level. It is trivial with modern LLM. Very
| good output in general.
| benatkin wrote:
| It addresses the reader no matter how knowledgeable they are.
| It's a very good use of hypertext, making it so that a
| knowledgeable reader won't need to skip over much.
| ZYbCRq22HbJ2y7 wrote:
| > such a thick academic style
|
| Why? Because it has variables and a graph?
|
| What sort of education background do you have?
| vlovich123 wrote:
| Have you tried using an LLM? I've found good results getting at
| the underlying concepts and building a mental model that works
| for me that way. It makes domain expertise - that often has
| unique terminology for concepts you already know or at least
| know without a specific name - more easily accessible after a
| little bit of a QA round.
| nijave wrote:
| It's not entirely clear, but this isn't an issue in multi-instance
| upstream Postgres clusters?
|
| Am I correct in understanding either AWS is doing something with
| the cluster configuration or has added some patches that
| introduce this behavior?
| belter wrote:
| Yes, it's different. This is a deeper overview of what they did:
| https://youtu.be/fLqJXTOhUg4
|
| Especially here: https://youtu.be/fLqJXTOhUg4?t=434
| ezekiel68 wrote:
| In my reading of this, it looks like the practical implication
| could be that reads happening quickly after writes to the same
| row(s) might return stale data. The write transaction gets marked
| as complete before all of the distributed layers of a multi-AZ
| RDS instance have been fully updated, such that immediate reads
| from the same rows might return nothing (if the row does not
| exist yet) or older values if the columns have not been fully
| updated.
|
| Due to the way PostgreSQL does snapshotting, I don't believe this
| implies such a read might obtain a nonsense value due to only a
| portion of the bytes in a multi-byte column type having been
| updated yet.
|
| It seems like a race condition that becomes eventually
| consistent. Or did anyone read this as if the later
| transaction(s) of a "long fork" might never complete under normal
| circumstances?
| aphyr wrote:
| This isn't just stale data, in the sense of "a point-in-time
| consistent snapshot which does not reflect some recent
| transactions". I think what's going on here is that a read-only
| transaction against a secondary can observe some transaction T,
| but also _miss_ transactions which must have logically executed
| before T.
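|
| The difference from plain staleness can be sketched as a closure
| check on a single read. This is a rough illustration under stated
| assumptions: must_precede is a hypothetical map from each
| transaction to all transactions that must have logically executed
| before it, and missing_predecessors is a name invented here. A
| merely stale snapshot omits recent transactions but still contains
| every predecessor of whatever it does show.
|
```python
def missing_predecessors(view, must_precede):
    """Return, for each observed transaction, the predecessors the
    view failed to include.

    view: set of transactions a read-only transaction observed.
    must_precede: dict mapping a transaction to the full set of
    transactions that must come before it (assumed to be known and
    already transitively closed for this sketch).
    """
    gaps = {}
    for txn in view:
        missing = must_precede.get(txn, set()) - view
        if missing:
            gaps[txn] = missing
    return gaps


must_precede = {"T": {"T0"}}

# Stale but consistent: sees T0 only, nothing is missing.
print(missing_predecessors({"T0"}, must_precede))
# Sees T without T0: the kind of read described above.
print(missing_predecessors({"T"}, must_precede))
```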
| mushufasa wrote:
| > These phenomena occurred in every version tested, from 13.15 to
| 17.4.
|
| I was worried I had made the wrong move upgrading major versions,
| but it looks like this is not that. This is not a regression, but
| just a feature request or longstanding bug.
| skywhopper wrote:
| This is an unfortunate report in a lot of ways. First, the title
| is incomplete. Second, there's no context as to the purpose of
| the test and very little about the parameters of the test. It
| makes no comparison to other PostgreSQL architectures except one
| reference at the end to a standalone system. Third, it
| characterizes the transaction isolation of this system as if it
| were a failure (see comments in this thread assuming this is a
| bug or a missing feature of Postgres). Finally, it never compares
| the promises made by the product vendors to the reality. Does AWS
| or Postgres promise perfect snapshot isolation?
|
| I understand the mission of the Jepsen project but presenting
| results in this format is misleading and will only sow confusion.
|
| Transaction isolation involves a ton of tradeoffs, and the
| tradeoffs chosen here may be fine for most use cases. The issues
| can be easily avoided by doing any critical transactional work
| against the primary read-write node only, which would be the only
| typical way in which transactional work would be done against a
| Postgres cluster of this sort.
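|
| The workaround suggested above might look like the following
| routing helper. It is a minimal sketch, not an AWS-recommended
| pattern: the hostnames are placeholders, and pick_endpoint and its
| parameters are hypothetical names for this example. The only idea
| it encodes is that anything relying on snapshot isolation goes to
| the read-write endpoint, and only tolerant reads go to the readers.
|
```python
# Placeholder hostnames; substitute the real writer and reader
# endpoints of your own cluster.
WRITER_ENDPOINT = "writer.db.example.internal"
READER_ENDPOINT = "reader.db.example.internal"


def pick_endpoint(read_only, needs_snapshot_isolation):
    """Choose which endpoint a transaction should connect to.

    Writes always go to the writer. Read-only work goes to the
    reader endpoint only when the caller does not depend on
    snapshot isolation holding across nodes.
    """
    if not read_only or needs_snapshot_isolation:
        return WRITER_ENDPOINT
    return READER_ENDPOINT


# A report that tolerates anomalies can use a reader.
print(pick_endpoint(read_only=True, needs_snapshot_isolation=False))
# A consistency-sensitive read stays on the primary.
print(pick_endpoint(read_only=True, needs_snapshot_isolation=True))
```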
| Sesse__ wrote:
| Postgres does indeed promise perfect snapshot isolation, and
| Amazon does not (to the best of my knowledge) document that
| their managed Postgres service weakens Postgres' promises.
| billiam wrote:
| New headline: AWS RDS is not CockroachDB or Spanner. And it's not
| trying to be.
| film42 wrote:
| I think AWS will need to update their documentation to
| communicate this. Will a snapshot isolation fix introduce a
| performance regression in latency or throughput? Or, maybe they
| stand by what they have as being strong enough. Either way,
| they'll need to say something.
| kevincox wrote:
| I think the ideal solution from AWS would be fixing the bug and
| actually providing the guarantees that the docs say that they
| do.
| oblio wrote:
| I wonder how Aurora fares on this?
___________________________________________________________________
(page generated 2025-04-29 23:00 UTC)