[HN Gopher] Pgactive: Postgres active-active replication extension
___________________________________________________________________
Pgactive: Postgres active-active replication extension
Author : ForHackernews
Score : 243 points
Date : 2025-07-16 09:15 UTC (13 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| zknill wrote:
| Looks like it uses Postgres logical replication to share changes
| made on one postgres instance with another. Conflict resolution is
| last-write-wins based on timestamp. Conflicting transactions are
| logged to a special table (pgactive_conflict_history), so you can
| see the history, resolve, etc.
|
| https://github.com/aws/pgactive/tree/main/docs
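|
| A minimal sketch for inspecting it (the table name comes from
| the docs; the exact schema and columns may differ by version):
|
|     -- list conflicts that were auto-resolved by last-write-wins
|     SELECT * FROM pgactive_conflict_history;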
| kosolam wrote:
| Sounds interesting. So how soon does one know whether a write
| has been accepted or rejected? Is it immediate or eventual?
| ForHackernews wrote:
| It's eventual consistency: latest write wins after the dust
| settles.
|
| As I understand it, this is a wrapper on top of Postgres'
| native logical replication features. Writes are committed
| locally and then published via a replication slot to
| subscriber nodes. You have ACID guarantees locally, but not
| across the entire distributed system.
|
| https://www.postgresql.org/docs/current/logical-
| replication....
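|
| Roughly, the native building blocks look like this (this is
| plain logical replication, not pgactive's own API; the names
| are made up):
|
|     -- on the origin node
|     CREATE PUBLICATION app_pub FOR ALL TABLES;
|
|     -- on a peer node; this creates a replication slot on the
|     -- origin and streams committed changes from it
|     CREATE SUBSCRIPTION app_sub
|         CONNECTION 'host=origin dbname=app user=repl'
|         PUBLICATION app_pub;
|
|     -- back on the origin: the slot backing that subscription
|     SELECT slot_name, plugin, active FROM pg_replication_slots;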
| gritzko wrote:
| So the outcomes are essentially random?
|
| It all feels like they expect developers to sift through
| the conflict log to resolve things manually or something.
| If a transaction did not go through on some of the nodes,
| what are the others doing then? What if they cannot roll
| it back safely?
|
| Such a rabbit hole.
| zknill wrote:
| Typically applications will have some kind of logical
| separation of the data.
|
| Given this is targeted at replication of postgres nodes,
| perhaps the nodes are deployed across different regions
| of the globe.
|
| By using active-active replication, all the participating
| nodes are capable of accepting writes, which simplifies
| the deployment and querying of postgres (you can read and
| write to your region-local postgres node).
|
| Now that doesn't mean that all the reads and writes will
| be on conflicting data. Take the regional example:
| perhaps the majority of the writes affecting one region's
| data are made _in that region_. In this case, the region
| local postgres would be performing all the conflict
| resolution locally, and sharing the updates with the
| other nodes.
|
| The reason this simplifies things is that you can treat
| all your postgres connections as if they were just a
| single postgres. Writes are fast, because they are
| accepted in the local region, and reads are replicated
| without you having to have a dedicated read-replica.
|
| Ofc you're still going to have to design around the
| conflict resolution (i.e. writes for the same data issued
| against different instances), and the possibility of
| stale reads as the data is replicated cross-node. But for
| some applications, this design might be a significant
| benefit, even with the extra things you need to do.
| gritzko wrote:
| I think I understand the use case. Like, we have in fact
| several regional Postgreses, but we want them to act as one
| logical database for the sake of simplicity. Probably
| this should be in the motivational part of the README.
| ForHackernews wrote:
| In our case, we're designing around INSERT-only tables
| with a composite primary key that includes the site id,
| so (in theory) there will never be any conflicts that
| need resolution.
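|
| Roughly along these lines (a sketch; the table and column
| names are illustrative, not our actual schema):
|
|     CREATE TABLE readings (
|         site_id     text        NOT NULL,   -- originating site
|         reading_id  bigint      NOT NULL,   -- locally assigned id
|         recorded_at timestamptz NOT NULL DEFAULT now(),
|         payload     jsonb,
|         -- keys cannot collide across sites as long as each site
|         -- only ever generates ids under its own site_id
|         PRIMARY KEY (site_id, reading_id)
|     );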
| zozbot234 wrote:
| > with a composite primary key that includes the site id
|
| It doesn't look like you'd need multi-master replication
| in that case? You could simply partition tables by site
| and rely on logical replication.
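|
| e.g. something like this (illustrative names; each site
| would then subscribe to the other sites' publications):
|
|     CREATE TABLE readings (
|         site_id    text   NOT NULL,
|         reading_id bigint NOT NULL,
|         payload    jsonb,
|         PRIMARY KEY (site_id, reading_id)
|     ) PARTITION BY LIST (site_id);
|
|     CREATE TABLE readings_site_a PARTITION OF readings
|         FOR VALUES IN ('site-a');
|
|     -- site A only publishes the partition it owns
|     CREATE PUBLICATION site_a_pub FOR TABLE readings_site_a;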
| ForHackernews wrote:
| I think that's absolutely true in the happy scenario when
| the internet is up.
|
| There's a requirement that during outages each site
| continue operating independently and might need to make
| writes to data "outside" its normal partition. By having
| active-active replication the hope is that the whole
| thing recovers "automatically" (famous last words) to a
| consistent state once the network comes back.
| teraflop wrote:
| But if you drop the assumption that each site only writes
| rows prefixed with its site ID, then you're right back to
| the original situation where writes can be silently
| overwritten.
|
| Do you consider that acceptable, or don't you?
| LudwigNagasena wrote:
| Sounds like a recipe for a split brain that requires
| manual recovery and reconciliation.
| zozbot234 wrote:
| You could implement a CRDT and partially automate that
| "recovery and reconciliation" workflow.
| ForHackernews wrote:
| That's correct: when the network comes back up we'll
| present users with a diff view and they can reconcile
| manually or decide to drop the revision they don't care
| about.
|
| We're expecting this to be a rare occurrence (during
| partition, user at site A needs to modify data sourced
| from B). It doesn't have to be trivially easy for us to
| recover from, only possible.
| shermantanktop wrote:
| There's no free lunch. The rabbit hole is only worth
| going down if the benefits are worth the operational
| pain. I view this as a building block, not a checkbox
| feature that magically just works all the time.
|
| For someone who has these requirements out of the gate,
| another datastore might be better. But if someone is
| already deeply tied to Postgres and perhaps doing their
| own half-assed version of this, this option could be
| great.
| ForHackernews wrote:
| What are good off-the-shelf distributed databases? We
| looked at MongoDB but it wasn't worth giving up SQL. To
| reiterate the no free lunch point, no one has figured out
| how to outsmart the CAP theorem yet, so all you can do is
| design around it.
| perfmode wrote:
| Spanner
| whizzter wrote:
| My guess is that you want to change your entire design
| philosophy a little with regard to table design: move some
| entities to use a composite GUID+timestamp as their PK and
| replace most updates with inserts to avoid conflicts,
| instead resolving things at query time (basically a CRDT
| modelling philosophy contained within a relational schema).
|
| Ideal? Not entirely, but it should still give most of the
| query benefits of regular SQL and lets one benefit from
| good indexes (the proper indexes of an SQL database will
| also help contain the costs of an updated data model).
|
| I think this is more interesting for someone building
| something social media like perhaps rather than anything
| involving accounting.
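|
| A rough sketch of the query-time resolution idea (names are
| purely illustrative):
|
|     CREATE TABLE item_versions (
|         item_id    uuid        NOT NULL,
|         changed_at timestamptz NOT NULL,
|         node_id    text        NOT NULL,  -- tie-breaker
|         state      jsonb       NOT NULL,
|         PRIMARY KEY (item_id, changed_at, node_id)
|     );
|
|     -- "current" state = latest version per item, decided at
|     -- read time instead of at replication time
|     SELECT DISTINCT ON (item_id) *
|     FROM item_versions
|     ORDER BY item_id, changed_at DESC, node_id DESC;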
| rjbwork wrote:
| Are there any Datomic-like query layers on top of
| Postgres for approaches like this where you're recording
| immutable occurrences rather than updating mutable
| records?
| zozbot234 wrote:
| > So the outcomes are essentially random?
|
| In principle you could use CRDTs to end up with a "not
| quite random" outcome that simply takes the conflict into
| account - it doesn't really attempt to "resolve" it.
| That's quite good for some cases.
| dboreham wrote:
| This is a kind of CRDT. CRDT is just some papers defining
| reasonably clear terminology to cover the kind of
| eventually consistent replication that has been done for
| decades, including this kind (timestamp-based last-writer
| wins).
| okigan wrote:
| It took 20 years to acknowledge that pushing eventual
| consistency to the application layer is not worth it for
| most applications.
|
| Seems the same is playing out in Postgres with this
| extension; maybe it will take another 20 years.
| rubiquity wrote:
| The idea of active-active is too seductive compared to how
| hard it is to learn distributed systems.
| okigan wrote:
| It is so seductive that people don't read the footnotes
| that explain that active-active does not do what they
| think it does.
| m11a wrote:
| I'd agree. There are so many footguns involved in multi-
| master setups that most organisations should avoid this
| until they're big enough to hire distributed systems
| engineers to design a proper solution for the company. I
| personally don't love any of the Postgres multi-master
| solutions.
|
| You can scale surprisingly far on a single-master
| Postgres with read replicas.
| tinkertamper wrote:
| I'm curious about what you mean here. It sounds like you're
| saying that applications shouldn't concern themselves with
| consistency. Can you elaborate?
| zozbot234 wrote:
| Is this multi-master replication? It will be interesting if it
| can be accepted into Postgres proper.
| stephenr wrote:
| Sounds like "yes, with an if" where the "if" is "if you don't
| really care about data consistency".
|
| "Last write wins" sounds like a recipe for disaster IMO.
|
| This is still one of those things that keeps people on MySQL
| - there are not one, but two open-source solutions available
| that provide synchronous cluster replication, allowing for
| "safe" writes against multiple primaries.
| nyrikki wrote:
| It's all tradeoffs: MySQL's multi-master and multi-source
| models have their own issues, and pg also has other
| options with their own tradeoffs.
|
| ACID + distributed == tradeoffs that will always keep this
| a horses-for-courses problem.
| wfn wrote:
| Out of curiosity, what conflict resolution options exist in
| MySQL and/or MySQL Cluster (never checked; my experience is
| in PG)? You always have to address conflicts eventually -
| we come back to CAP / PACELC. From [1][2] it looks like
| they support more strategies, but none of them are magical:
| timestamp-comparison methods make up most of the offered
| strategy set, and "latest timestamp wins" at least used to
| be the default (I did not read thoroughly, was just
| curious).
|
| But I could be totally wrong - (1) curious if someone could
| link to things / explain, and (2) FYI ('stephenr) last-
| write-wins based on timestamp is a thing in the MySQL world
| as well (though maybe the set of conflict resolution
| methods available is larger in MySQL?)
|
| [1]: https://dev.mysql.com/doc/refman/8.4/en/mysql-cluster-
| replic...
|
| [2]: https://dev.mysql.com/blog-archive/enhanced-conflict-
| resolut... (nice writeup, maybe outdated idk?)
| stephenr wrote:
| For reference those two pages are both about NDB cluster.
|
| The two "options" I was referring to are MySQL group
| replication and the Galera replication plugin for MySQL.
| Both provide synchronous replication, so a write is either
| accepted by a majority of the cluster or rejected.
| dehrmann wrote:
| Did Postgres ever get a built-in, blessed replication
| offering? It's been a while since I set it up, but I remember
| this was always a big missing feature compared to MySQL.
| philipwhiuk wrote:
| Streaming and logical replication are built in:
| https://www.postgresql.org/docs/current/runtime-config-
| repli...
| wrs wrote:
| The basic replication mechanisms have been built-in for
| quite a while. What's not there is cluster management
| (replica launching, leader election, load balancing, that
| sort of thing) that makes it practical in a nontrivial
| situation. There are several 3rd party solutions to that.
| [0]
|
| Same situation as, e.g., backups. You can just use pg_dump,
| but to be serious you need a 3rd party solution that does
| log shipping and so on.
|
| [0] https://www.postgresql.org/download/products/3-clusteri
| ngrep...
| everfrustrated wrote:
| I'm scratching my head trying to think why AWS would have worked
| on this? I can't think of it being used in any of their products.
|
| RDS uses block replication. Aurora uses its own SAN replication
| layer.
|
| DMS maybe?
| gotimo wrote:
| probably [Aurora DSQL](https://aws.amazon.com/rds/aurora/dsql/)
| which was released a couple of weeks ago
| fizx wrote:
| I thought dsql was distributed transactions. :thinking:
| mjb wrote:
| DSQL does use distributed transactions to offer active-
| active alongside strong consistency and isolation
| guarantees. Here's how:
| https://brooker.co.za/blog/2024/12/05/inside-dsql-
| writes.htm...
| loginatnine wrote:
| It's definitely DSQL with the multi-region active active
| feature[1].
|
| [1]https://aws.amazon.com/rds/aurora/dsql/features/#topic-1
| riknos314 wrote:
| DSQL only uses Postgres for the query processor layer, so
| it doesn't require a replication library within postgres
| itself. Definitely NOT from DSQL.
|
| > We're not using any of the storage or transaction
| processing parts of PostgreSQL, but are using the SQL
| engine, an adapted version of the planner and optimizer,
| and the client protocol implementation. [1]
|
| Rather, DSQL seems to do its region replication using the
| distributed journal abstraction [2].
|
| [1] https://brooker.co.za/blog/2024/12/04/inside-dsql.html
| [2] https://brooker.co.za/blog/2024/12/06/inside-dsql-
| cap.html
| tneely wrote:
| DSQL uses an internal service, Journal[1], for its multi-
| region writes.
|
| [1]: https://www.allthingsdistributed.com/2025/05/just-make-
| it-sc...
| the_precipitate wrote:
| Yeah, and this doesn't seem to be that useful. At least I
| don't understand why one should do this on a strongly ACID
| relational database.
| hobs wrote:
| In my experience multi-writer is about latency or HA/DR -
| having all your data in all regions at the same time. But
| the method here (via the tlog) seems to defeat what those
| sorts of systems have historically been able to do: write
| to multiple places from the app at the same time so as to
| have the lowest possible chance of data loss.
| shermantanktop wrote:
| Yes, I call it spoke-and-hub. The sharded spokes accept the
| writes and replicate back to the hub, where all shards
| coexist.
|
| Useful for metric ingestion. Not useful for bank ledgers or
| whatever.
| hobs wrote:
| Yeah, in some setups all the shards maintain all the state
| and just accept writes for their own partition of values;
| merge replication in SQL Server works like this.
| ForHackernews wrote:
| Apparently they've offered it as a feature in RDS Postgres for
| a couple years now https://aws.amazon.com/about-aws/whats-
| new/2023/10/pgactive-...
|
| But only last month did they officially release it as open
| source to the community https://aws-
| news.com/article/2025-06-09-announcing-open-sour...
| everfrustrated wrote:
| Thanks, that seems to match.
| 0xFF0123 wrote:
| A lot of theorising above, but this seems like the earliest
| reference
| thayne wrote:
| > Aurora uses its own SAN replication layer
|
| I don't think that is used for cross region replication
| cfycom wrote:
| From the readme of the repo: "Use cases for this include
| running a Multi-Region high availability database cluster"
| prdpx7 wrote:
| I think AWS is using pgactive to sell "Aurora Postgres Global"
| https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
| ltbarcly3 wrote:
| Don't use this unless you know exactly what you are doing.
|
| This is not a way to get better performance or scalability in
| general.
| dangoodmanUT wrote:
| It seems async? That's a major problem for transaction isolation
| dboreham wrote:
| Pick your poison.
| nico wrote:
| Tangential, but related. Is there a way to have a "locally
| writable" read replica, ie. a secondary db that reads from a
| primary, but that can also hold local changes that doesn't send
| back to the primary?
|
| One of the use cases is to have a development db that can get
| data from production or staging (and doesn't send local changes
| back)
|
| What I've done usually is have some script/cron/worker run
| periodically to get data, either via dump or running some
| queries, create a snapshot, store it in S3, then have a script on
| the local dev code that gets the snapshot and inserts/restores
| the data in the local db. This works for many cases, but index
| building can be a pain (it can take a long time), depending on
| the data.
| AYBABTME wrote:
| Just FYI, most people would recommend against doing this for
| legal reasons. PII and the like is not usually allowed to land
| in a staging or dev environment. Doing this, or allowing it,
| is a huge liability.
| nico wrote:
| Agreed, and there are a few ways to deal with that, like not
| including certain tables or excluding the data via queries
| when creating the snapshot
|
| Having said that, legal exposure and risk will highly depend
| on what you are working on. Probably for most projects this
| isn't a big deal. IANAL, this is not legal advice
| xinu2020 wrote:
| Curious about this - how would local writes conflicting with
| remote updates be handled? I can't think of a merge strategy
| that works in all scenarios (or even most of the time).
| nico wrote:
| Great question, I don't know. However, at least in my case, I
| wouldn't mind the source data always overwriting the local
| data. In fact, that's the way it works now when loading a
| newer snapshot: the local db is dropped and then re-built
| from the snapshot.
|
| Thinking about the developer experience though, when loading
| a snapshot manually, the dev knows they are overwriting their
| local db. However, if replication happened
| automatically/continuously in the background, it could lead
| to some really confusing/annoying behaviors.
| mdavidn wrote:
| Load the snapshot to a "pristine" local database that you never
| modify. Whenever you need a "reset" of your dev database, drop
| it, then copy the pristine database using `createdb
| --template`. This copies prebuilt indexes rather than
| rebuilding them, which is much faster.
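|
| For example (database names made up; this is the SQL
| equivalent of `createdb --template`):
|
|     DROP DATABASE IF EXISTS devdb;
|     CREATE DATABASE devdb TEMPLATE devdb_pristine;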
| nico wrote:
| But when loading that pristine local db from the original
| source, it would still create the indices and thus take a
| long time?
|
| The goal is not necessarily having an easy way to reset, but
| rather an easy/quick way to load real data
| ForHackernews wrote:
| AFAIK that is the standard behavior with a Postgres logical
| replication setup. There is nothing preventing you from doing
| writes on the replica; they just won't get sent back anywhere
| else.
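|
| i.e. on the dev side, something like this (illustrative
| names; the subscriber stays fully writable, and nothing
| flows back to the publisher):
|
|     CREATE SUBSCRIPTION dev_copy
|         CONNECTION 'host=prod-replica dbname=app user=repl'
|         PUBLICATION app_pub
|         WITH (copy_data = true);  -- initial sync, then stream
|
|     -- local-only writes are fine; they never leave this node
|     INSERT INTO scratch_notes (id, body)
|     VALUES (1, 'only exists on the dev copy');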
| eatonphil wrote:
| A bit of the history as I've been told by 2nd Quadrant/EDB people
| (my teammates):
|
| BDR1 [0] came first and was, and is, open source. pgactive is
| based on BDR1. BDR2 was a closed-source rewrite of BDR1 that was
| later abandoned.
|
| pglogical v1 and v2 (PGL1, PGL2) were, and are, open-source [1].
|
| pglogical v1, after heavy modification, was eventually merged
| into Postgres 10.
|
| Based on learnings from this logical replication in Postgres 10,
| 2nd Quadrant started pglogical v2.
|
| pgEdge is based on pglogical v2.
|
| Then later 2nd Quadrant started pglogical v3 (closed source) and
| BDR v3 (closed source). They were merged into just BDR v4. At
| some point the BDR product was renamed to Postgres Distributed
| (PGD) [2].
|
| 2ndQuadrant was acquired by EDB. We (EDB) just released PGD v6.
|
| [0] https://github.com/2ndQuadrant/bdr/tree/bdr-
| plugin/REL1_0_ST...
|
| [1] https://github.com/2ndquadrant/pglogical
|
| [2] https://www.enterprisedb.com/docs/pgd/latest/
| cbsmith wrote:
| PGDv6 is still closed source, yeah?
| eatonphil wrote:
| That's right.
| x4m wrote:
| And there's also
| https://postgrespro.com/docs/enterprise/current/multimaster
| with a history behind it.
| eatonphil wrote:
| Right, I'm not so familiar with it but from that page:
|
| > The replication mechanism is based on logical decoding and
| an earlier version of the pglogical extension provided for
| community by the 2ndQuadrant team.
| dang wrote:
| Related. Others?
|
| _Pgactive: Active-Active Replication Extension for PostgreSQL on
| Amazon RDS_ - https://news.ycombinator.com/item?id=37838223 - Oct
| 2023 (1 comment)
| iotapi322 wrote:
| After setting up numerous clusters with repmgr and patroni, and
| running them in zero-downtime production... this is the very
| last plugin I would ever install. I like to sleep at night.
| ahachete wrote:
| I never tire of reminding everyone that "conflict resolution" is
| no more than a euphemism for "breaking durability by dropping
| already committed and acknowledged data".
|
| Either architect for no data overlap on writes across all the
| "actives" (in which case software like pgactive could be a good
| deal) or use a purely distributed database (like Yugabyte).
___________________________________________________________________
(page generated 2025-07-16 23:00 UTC)