[HN Gopher] Pgactive: Postgres active-active replication extension
___________________________________________________________________
Pgactive: Postgres active-active replication extension
Author : ForHackernews
Score : 243 points
Date : 2025-07-16 09:15 UTC (13 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| zknill wrote:
| Looks like it uses Postgres logical replication to share changes
| made on one postgres instance with another. Conflict resolution is
| last-write-wins based on timestamp. Conflicting transactions are
| logged to a special table (pgactive_conflict_history), so you can
| see the history, resolve, etc.
|
| https://github.com/aws/pgactive/tree/main/docs
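|
| A minimal sketch for inspecting it (the table name comes from
| the docs; the exact schema and columns may differ by version):
|
|     -- list conflicts that were auto-resolved by last-write-wins
|     SELECT * FROM pgactive_conflict_history;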
| kosolam wrote:
| Sounds interesting. So how soon does one know whether a write
| has been accepted or rejected? Is it immediate or eventual?
| ForHackernews wrote:
| It's eventual consistency: latest write wins after the dust
| settles.
|
| As I understand it, this is a wrapper on top of Postgres'
| native logical replication features. Writes are committed
| locally and then published via a replication slot to
| subscriber nodes. You have ACID guarantees locally, but not
| across the entire distributed system.
|
| https://www.postgresql.org/docs/current/logical-
| replication....
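|
| Roughly, the native building blocks look like this (this is
| plain logical replication, not pgactive's own API; the names
| are made up):
|
|     -- on the origin node
|     CREATE PUBLICATION app_pub FOR ALL TABLES;
|
|     -- on a peer node; this creates a replication slot on the
|     -- origin and streams committed changes from it
|     CREATE SUBSCRIPTION app_sub
|         CONNECTION 'host=origin dbname=app user=repl'
|         PUBLICATION app_pub;
|
|     -- back on the origin: the slot backing that subscription
|     SELECT slot_name, plugin, active FROM pg_replication_slots;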
| gritzko wrote:
| So the outcomes are essentially random?
|
| It all feels like they expect developers to sift through
| the conflict log to resolve things manually or something.
| If a transaction did not go through on some of the nodes,
| what are the others doing then? What if they cannot roll
| it back safely?
|
| Such a rabbit hole.
| zknill wrote:
| Typically applications will have some kind of logical
| separation of the data.
|
| Given this is targeted at replication of postgres nodes,
| perhaps the nodes are deployed across different regions
| of the globe.
|
| By using active-active replication, all the participating
| nodes are capable of accepting writes, which simplifies
| the deployment and querying of postgres (you can read and
| write to your region-local postgres node).
|
| Now that doesn't mean that all the reads and writes will
| be on conflicting data. Take the regional example:
| perhaps the majority of the writes affecting one region's
| data are made _in that region_. In this case, the region
| local postgres would be performing all the conflict
| resolution locally, and sharing the updates with the
| other nodes.
|
| The reason this simplifies things is that you can treat
| all your postgres connections as if they were just a
| single postgres. Writes are fast, because they are
| accepted in the local region, and reads are replicated
| without you having to have a dedicated read-replica.
|
| Ofc you're still going to have to design around the
| conflict resolution (i.e. writes for the same data issued
| against different instances), and the possibility of
| stale reads as the data is replicated cross-node. But for
| some applications, this design might be a significant
| benefit, even with the extra things you need to do.
| gritzko wrote:
| I think I understand the use case. Like, we have in fact
| several regional Postgreses, but we want them to act as one
| logical database for the sake of simplicity. Probably
| this should be in the motivational part of the README.
| ForHackernews wrote:
| In our case, we're designing around INSERT-only tables
| with a composite primary key that includes the site id,
| so (in theory) there will never be any conflicts that
| need resolution.
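|
| Roughly along these lines (a sketch; the table and column
| names are illustrative, not our actual schema):
|
|     CREATE TABLE readings (
|         site_id     text        NOT NULL,   -- originating site
|         reading_id  bigint      NOT NULL,   -- locally assigned id
|         recorded_at timestamptz NOT NULL DEFAULT now(),
|         payload     jsonb,
|         -- keys cannot collide across sites as long as each site
|         -- only ever generates ids under its own site_id
|         PRIMARY KEY (site_id, reading_id)
|     );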
| zozbot234 wrote:
| > with a composite primary key that includes the site id
|
| It doesn't look like you'd need multi-master replication
| in that case? You could simply partition tables by site
| and rely on logical replication.
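|
| e.g. something like this (illustrative names; each site
| would then subscribe to the other sites' publications):
|
|     CREATE TABLE readings (
|         site_id    text   NOT NULL,
|         reading_id bigint NOT NULL,
|         payload    jsonb,
|         PRIMARY KEY (site_id, reading_id)
|     ) PARTITION BY LIST (site_id);
|
|     CREATE TABLE readings_site_a PARTITION OF readings
|         FOR VALUES IN ('site-a');
|
|     -- site A only publishes the partition it owns
|     CREATE PUBLICATION site_a_pub FOR TABLE readings_site_a;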
| ForHackernews wrote:
| I think that's absolutely true in the happy scenario when
| the internet is up.
|
| There's a requirement that during outages each site
| continue operating independently and might need to make
| writes to data "outside" its normal partition. By having
| active-active replication the hope is that the whole
| thing recovers "automatically" (famous last words) to a
| consistent state once the network comes back.
| teraflop wrote:
| But if you drop the assumption that each site only writes
| rows prefixed with its site ID, then you're right back to
| the original situation where writes can be silently
| overwritten.
|
| Do you consider that acceptable, or don't you?
| LudwigNagasena wrote:
| Sounds like a recipe for a split brain that requires
| manual recovery and reconciliation.
| zozbot234 wrote:
| You could implement a CRDT and partially automate that
| "recovery and reconciliation" workflow.
| ForHackernews wrote:
| That's correct: when the network comes back up we'll
| present users with a diff view and they can reconcile
| manually or decide to drop the revision they don't care
| about.
|
| We're expecting this to be a rare occurrence (during
| partition, user at site A needs to modify data sourced
| from B). It doesn't have to be trivially easy for us to
| recover from, only possible.
| shermantanktop wrote:
| There's no free lunch. The rabbit hole is only worth
| going down if the benefits are worth the operational
| pain. I view this as a building block, not a checkbox
| feature that magically just works all the time.
|
| For someone who has these requirements out of the gate,
| another datastore might be better. But if someone is
| already deeply tied to Postgres and perhaps doing their
| own half-assed version of this, this option could be
| great.
| ForHackernews wrote:
| What are good off-the-shelf distributed databases? We
| looked at MongoDB but it wasn't worth giving up SQL. To
| reiterate the no free lunch point, no one has figured out
| how to outsmart the CAP theorem yet, so all you can do is
| design around it.
| perfmode wrote:
| Spanner
| whizzter wrote:
| My guess is that you want to change your entire design
| philosophy a little with regard to table design: move some
| entities to use a composite GUID+timestamp as their PK and
| replace most updates with inserts to avoid conflicts,
| instead resolving things at query time (basically a CRDT
| modelling philosophy contained within a relational schema).
|
| Ideal? Not entirely, but it should still give most of the
| query benefits of regular SQL and lets one benefit from
| good indexes (the proper indexes of an SQL database will
| also help contain the costs of an updated data model).
|
| I think this is more interesting for someone building
| something social media like perhaps rather than anything
| involving accounting.
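|
| A rough sketch of the query-time resolution idea (names are
| purely illustrative):
|
|     CREATE TABLE item_versions (
|         item_id    uuid        NOT NULL,
|         changed_at timestamptz NOT NULL,
|         node_id    text        NOT NULL,  -- tie-breaker
|         state      jsonb       NOT NULL,
|         PRIMARY KEY (item_id, changed_at, node_id)
|     );
|
|     -- "current" state = latest version per item, decided at
|     -- read time instead of at replication time
|     SELECT DISTINCT ON (item_id) *
|     FROM item_versions
|     ORDER BY item_id, changed_at DESC, node_id DESC;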
| rjbwork wrote:
| Are there any Datomic-like query layers on top of
| Postgres for approaches like this where you're recording
| immutable occurrences rather than updating mutable
| records?
| zozbot234 wrote:
| > So the outcomes are essentially random?
|
| In principle you could use CRDTs to end up with a "not
| quite random" outcome that simply takes the conflict into
| account - it doesn't really attempt to "resolve" it.
| That's quite good for some cases.
| dboreham wrote:
| This is a kind of CRDT. CRDT is just some papers defining
| reasonably clear terminology to cover the kind of
| eventually consistent replication that has been done for
| decades, including this kind (timestamp-based last-writer
| wins).
| okigan wrote:
| It took 20 years to acknowledge that pushing eventual
| consistency to the application layer is not worth it for
| most applications.
|
| Seems the same is playing out in Postgres with this
| extension; maybe it will take another 20 years.
| rubiquity wrote:
| The idea of active-active is too seductive compared to how
| hard it is to learn distributed systems.
| okigan wrote:
| It is so seductive that people don't read the footnotes
| that explain that active-active does not do what they
| think it does.
| m11a wrote:
| I'd agree. There are so many footguns involved in multi-
| master setups that most organisations should avoid this
| until they're big enough to hire distributed systems
| engineers to design a proper solution for the company. I
| personally don't love any of the Postgres multi-master
| solutions.
|
| You can scale surprisingly far on a single-master
| Postgres with read replicas.
| tinkertamper wrote:
| I'm curious about what you mean here. It sounds like you're
| saying that applications shouldn't concern themselves with
| consistency. Can you elaborate?
| zozbot234 wrote:
| Is this multi-master replication? It will be interesting if it
| can be accepted into Postgres proper.
| stephenr wrote:
| Sounds like "yes, with an if" where the "if" is "if you don't
| really care about data consistency".
|
| "Last write wins" sounds like a recipe for disaster IMO.
|
| This is still one of those things that keeps people on MySQL
| - there are not one, but two open-source solutions available
| that provide synchronous cluster replication, allowing for
| "safe" writes against multiple primaries.
| nyrikki wrote:
| It's all tradeoffs: MySQL's multi-master and multi-source
| models have their own issues, and pg also has other
| options with their own tradeoffs.
|
| ACID + distributed == tradeoffs that will always keep this
| a horses-for-courses problem.
| wfn wrote:
| Out of curiosity, what conflict resolution options exist in
| MySQL and/or MySQL Cluster (never checked; my experience is
| in PG)? You always have to address conflicts eventually -
| we come back to CAP / PACELC. From [1][2] it looks like
| they support more strategies, but none of them are magical:
| timestamp-comparison methods make up most of the offered
| strategy set, and "latest timestamp wins" at least used to
| be the default (I did not read thoroughly, was just
| curious).
|
| But I could be totally wrong - (1) curious if someone could
| link to things / explain, and (2) FYI ('stephenr) last-
| write-wins based on timestamp is a thing in the MySQL world
| as well (though maybe the set of conflict resolution
| methods available is larger in MySQL?)
|
| [1]: https://dev.mysql.com/doc/refman/8.4/en/mysql-cluster-
| replic...
|
| [2]: https://dev.mysql.com/blog-archive/enhanced-conflict-
| resolut... (nice writeup, maybe outdated idk?)
| stephenr wrote:
| For reference those two pages are both about NDB cluster.
|
| The two "options" I was referring to are MySQL group
| replication and the Galera replication plugin for MySQL.
| Both provide synchronous replication, so a write is either
| accepted by a majority of the cluster or rejected.
| dehrmann wrote:
| Did Postgres ever get a built-in, blessed replication
| offering? It's been a while since I set it up, but I remember
| this was always a big missing feature compared to MySQL.
| philipwhiuk wrote:
| Streaming and logical replication are built in:
| https://www.postgresql.org/docs/current/runtime-config-
| repli...
| wrs wrote:
| The basic replication mechanisms have been built-in for
| quite a while. What's not there is cluster management
| (replica launching, leader election, load balancing, that
| sort of thing) that makes it practical in a nontrivial
| situation. There are several 3rd party solutions to that.
| [0]
|
| Same situation as, e.g., backups. You can just use pg_dump,
| but to be serious you need a 3rd party solution that does
| log shipping and so on.
|
| [0] https://www.postgresql.org/download/products/3-clusteri
| ngrep...
| everfrustrated wrote:
| I'm scratching my head trying to think why AWS would have worked
| on this? I can't think of it being used in any of their products.
|
| RDS uses block replication. Aurora uses its own SAN replication
| layer.
|
| DMS maybe?
| gotimo wrote:
| probably [Aurora DSQL](https://aws.amazon.com/rds/aurora/dsql/)
| which was released a couple of weeks ago
| fizx wrote:
| I thought dsql was distributed transactions. :thinking:
| mjb wrote:
| DSQL does use distributed transactions to offer active-
| active alongside strong consistency and isolation
| guarantees. Here's how:
| https://brooker.co.za/blog/2024/12/05/inside-dsql-
| writes.htm...
| loginatnine wrote:
| It's definitely DSQL with the multi-region active active
| feature[1].
|
| [1]https://aws.amazon.com/rds/aurora/dsql/features/#topic-1
| riknos314 wrote:
| DSQL only uses Postgres for the query processor layer, so
| it doesn't require a replication library within postgres
| itself. Definitely NOT from DSQL.
|
| > We're not using any of the storage or transaction
| processing parts of PostgreSQL, but are using the SQL
| engine, an adapted version of the planner and optimizer,
| and the client protocol implementation. [1]
|
| Rather, DSQL seems to do its region replication using the
| distributed journal abstraction [2].
|
| [1] https://brooker.co.za/blog/2024/12/04/inside-dsql.html
| [2] https://brooker.co.za/blog/2024/12/06/inside-dsql-
| cap.html
| tneely wrote:
| DSQL uses an internal service, Journal[1], for its multi-
| region writes.
|
| [1]: https://www.allthingsdistributed.com/2025/05/just-make-
| it-sc...
| the_precipitate wrote:
| Yeah, and this doesn't seem to be that useful. At least I
| don't understand why one should do this on a strongly ACID
| relational database.
| hobs wrote:
| In my experience multi-writer is about latency or HA/DR -
| having all your data in all regions at the same time. But
| the method here (via the tlog) seems to defeat what those
| sorts of systems have historically been able to do: write
| to multiple places from the app at the same time so as to
| have the lowest possible chance of data loss.
| shermantanktop wrote:
| Yes, I call it spoke-and-hub. The sharded spokes accept the
| writes and replicate back to the hub, where all shards
| coexist.
|
| Useful for metric ingestion. Not useful for bank ledgers or
| whatever.
| hobs wrote:
| Yeah, in some setups all the shards maintain all the state
| and just accept writes for their own partition of values;
| merge replication in SQL Server works like this.
| ForHackernews wrote:
| Apparently they've offered it as a feature in RDS Postgres for
| a couple years now https://aws.amazon.com/about-aws/whats-
| new/2023/10/pgactive-...
|
| But only last month did they officially release it as open
| source to the community https://aws-
| news.com/article/2025-06-09-announcing-open-sour...
| everfrustrated wrote:
| Thanks, that seems to match.
| 0xFF0123 wrote:
| A lot of theorising above, but this seems like the earliest
| reference
| thayne wrote:
| > Aurora uses its own SAN replication layer
|
| I don't think that is used for cross region replication
| cfycom wrote:
| From the readme of the repo: "Use cases for this include
| running a Multi-Region high availability database cluster"
| prdpx7 wrote:
| I think AWS is using pgactive to sell "Aurora Postgres Global"
| https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
| ltbarcly3 wrote:
| Don't use this unless you know exactly what you are doing.
|
| This is not a way to get better performance or scalability in
| general.
| dangoodmanUT wrote:
| It seems async? That's a major problem for transaction isolation
| dboreham wrote:
| Pick your poison.
| nico wrote:
| Tangential, but related. Is there a way to have a "locally
| writable" read replica, ie. a secondary db that reads from a
| primary, but that can also hold local changes that doesn't send
| back to the primary?
|
| One of the use cases is to have a development db that can get
| data from production or staging (and doesn't send local changes
| back)
|
| What I've done usually is have some script/cron/worker run
| periodically to get data, either via dump or running some
| queries, create a snapshot, store it in S3, then have a script on
| the local dev code that gets the snapshot and inserts/restores
| the data in the local db. This works for many cases, but index
| building can be a pain (it can take a long time), depending on
| the data.
| AYBABTME wrote:
| Just FYI, most people would recommend against doing this for
| legal reasons. PII and the like is not usually allowed to land
| in a staging or dev environment. Doing this, or allowing it,
| is a huge liability.
| nico wrote:
| Agreed, and there are a few ways to deal with that, like not
| including certain tables or excluding the data via queries
| when creating the snapshot
|
| Having said that, legal exposure and risk will highly depend
| on what you are working on. Probably for most projects this
| isn't a big deal. IANAL, this is not legal advice
| xinu2020 wrote:
| Curious about this - how would local writes conflicting with
| remote updates be handled? I can't think of a merge strategy
| that works in all scenarios (or even most of the time).
| nico wrote:
| Great question, I don't know. However, at least in my case, I
| wouldn't mind the source data always overwriting the local
| data. In fact, that's the way it works now when loading a
| newer snapshot: the local db is dropped and then re-built
| from the snapshot.
|
| Thinking about the developer experience though, when loading
| a snapshot manually, the dev knows they are overwriting their
| local db. However, if replication happened
| automatically/continuously in the background, it could lead
| to some really confusing/annoying behaviors.
| mdavidn wrote:
| Load the snapshot to a "pristine" local database that you never
| modify. Whenever you need a "reset" of your dev database, drop
| it, then copy the pristine database using `createdb
| --template`. This copies prebuilt indexes rather than
| rebuilding them, which is much faster.
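|
| For example (database names made up; this is the SQL
| equivalent of `createdb --template`):
|
|     DROP DATABASE IF EXISTS devdb;
|     CREATE DATABASE devdb TEMPLATE devdb_pristine;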
| nico wrote:
| But when loading that pristine local db from the original
| source, it would still create the indices and thus take a
| long time?
|
| The goal is not necessarily having an easy way to reset, but
| rather an easy/quick way to load real data
| ForHackernews wrote:
| AFAIK that is the standard behavior with a Postgres logical
| replication setup. There is nothing preventing you from doing
| writes on the replica; they just won't get sent back anywhere
| else.
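|
| i.e. on the dev side, something like this (illustrative
| names; the subscriber stays fully writable, and nothing
| flows back to the publisher):
|
|     CREATE SUBSCRIPTION dev_copy
|         CONNECTION 'host=prod-replica dbname=app user=repl'
|         PUBLICATION app_pub
|         WITH (copy_data = true);  -- initial sync, then stream
|
|     -- local-only writes are fine; they never leave this node
|     INSERT INTO scratch_notes (id, body)
|     VALUES (1, 'only exists on the dev copy');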
| eatonphil wrote:
| A bit of the history as I've been told by 2nd Quadrant/EDB people
| (my teammates):
|
| BDR1 [0] came first and was, and is, open source. pgactive is
| based on BDR1. BDR2 was a closed-source rewrite of BDR1 that was
| later abandoned.
|
| pglogical v1 and v2 (PGL1, PGL2) were, and are, open-source [1].
|
| pglogical v1, after heavy modification, was eventually merged
| into Postgres 10.
|
| Based on learnings from this logical replication in Postgres 10,
| 2nd Quadrant started pglogical v2.
|
| pgEdge is based on pglogical v2.
|
| Then later 2nd Quadrant started pglogical v3 (closed source) and
| BDR v3 (closed source). They were merged into just BDR v4. At
| some point the BDR product was renamed to Postgres Distributed
| (PGD) [2].
|
| 2ndQuadrant was acquired by EDB. We (EDB) just released PGD v6.
|
| [0] https://github.com/2ndQuadrant/bdr/tree/bdr-
| plugin/REL1_0_ST...
|
| [1] https://github.com/2ndquadrant/pglogical
|
| [2] https://www.enterprisedb.com/docs/pgd/latest/
| cbsmith wrote:
| PGDv6 is still closed source, yeah?
| eatonphil wrote:
| That's right.
| x4m wrote:
| And there's also
| https://postgrespro.com/docs/enterprise/current/multimaster
| with a history behind it.
| eatonphil wrote:
| Right, I'm not so familiar with it but from that page:
|
| > The replication mechanism is based on logical decoding and
| an earlier version of the pglogical extension provided for
| community by the 2ndQuadrant team.
| dang wrote:
| Related. Others?
|
| _Pgactive: Active-Active Replication Extension for PostgreSQL on
| Amazon RDS_ - https://news.ycombinator.com/item?id=37838223 - Oct
| 2023 (1 comment)
| iotapi322 wrote:
| After setting up numerous clusters with repmgr and patroni, and
| running them in zero-downtime production... this is the very
| last plugin I would ever install. I like to sleep at night.
| ahachete wrote:
| I never tire of reminding everyone that "conflict resolution" is
| no more than a euphemism for "breaking durability by dropping
| already committed and acknowledged data".
|
| Either architect for no data overlap on writes across all the
| "actives" (in which case software like pgactive could be a good
| deal) or use a purely distributed database (like Yugabyte).
___________________________________________________________________
(page generated 2025-07-16 23:00 UTC)