[HN Gopher] PostgreSQL Streaming Replication (WAL); What It Is a...
       ___________________________________________________________________
        
       PostgreSQL Streaming Replication (WAL); What It Is and How to
       Configure One
        
       Author : thunderbong
       Score  : 147 points
       Date   : 2024-10-12 11:52 UTC (11 hours ago)
        
 (HTM) web link (mindhub365.com)
 (TXT) w3m dump (mindhub365.com)
        
       | pqdbr wrote:
        | It's a great article, but I've always felt these articles are
        | missing critical real-world guidance from the perspective of a
        | full-stack dev who also wants to manage their own databases.
       | 
        | - How do I check how many seconds the replica is lagging behind
        | the master?
       | 
       | - How would I monitor the replica? A simple cron task that pings
       | a health check if everything is OK (lag is < x) would be a nice
       | start.
       | 
       | And then things get complicated quickly:
       | 
       | - How do I failover to the replica if the primary goes down?
       | There's pgBouncer, repmgr, patroni...
       | 
        | - Should failover happen automatically or manually?
       | 
       | - Do I need 2 replicas to avoid the split brain scenario? my
       | brain hurts already.
       | 
       | - After a failover occurs (either automatically or manually), how
       | in the world am I going to configure the primary to be the
       | primary again, and the replica to act as the replica again, going
       | back to the original scenario?
       | 
       | I'd pay to learn this with confidence.
        
         | forinti wrote:
          | Lag is one little detail that I find lacking in Streaming
          | Replication. If there has been no transaction in the last n
          | seconds, it's going to tell you that the lag is n seconds.
          | 
          | Oracle's Data Guard will tell you there's no lag, because the
          | databases are identical.
         | 
         | In a development database, lag can get quite high, but it makes
         | no sense to set off an alarm if lag gets too high because it
         | might just be that there was nothing going on.
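          | 
          | The usual check people run on the replica is something like
          | this (a rough sketch, assuming PostgreSQL 10+ function names):
          | 
          |     -- time since the last replayed transaction; this keeps
          |     -- growing while the primary is simply idle
          |     SELECT now() - pg_last_xact_replay_timestamp() AS lag;
          | 
          |     -- sanity check: if everything received has already been
          |     -- replayed, the apparent lag is probably just idleness
          |     SELECT pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
          |            AS fully_replayed;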
         | 
          | But I much prefer the simplicity and reliability of PostgreSQL
          | over Oracle, any day of the week.
        
           | WJW wrote:
           | If it's really a problem, you can always use the pt-heartbeat
           | tool from the percona toolkit:
           | https://docs.percona.com/percona-toolkit/pt-heartbeat.html
           | 
            | Conceptually it is very straightforward: it just updates the
            | only row in a special table to the current time every
            | --interval seconds on the main database, then reads that
            | value on any followers. Subtracting the two timestamps gives
            | you the replication lag.
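            | 
            | The same idea in plain SQL, if you wanted to roll it yourself
            | (a sketch; the table and column names are made up here, not
            | pt-heartbeat's actual schema):
            | 
            |     -- once, on the primary
            |     CREATE TABLE heartbeat (id int PRIMARY KEY, ts timestamptz);
            |     INSERT INTO heartbeat VALUES (1, now());
            | 
            |     -- every --interval seconds, on the primary
            |     UPDATE heartbeat SET ts = now() WHERE id = 1;
            | 
            |     -- on a replica: lag is simply how stale that row is
            |     SELECT now() - ts AS replication_lag
            |     FROM heartbeat WHERE id = 1;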
        
             | forinti wrote:
             | I'll check it out. Thanks.
        
         | meow_catrix wrote:
         | The modern way is to sidestep the issue altogether and use
         | Kubernetes with a database designed to run on Kubernetes. You
         | can get sharding, replication and leader election essentially
         | for free - you can concentrate on using the database instead of
         | running the database.
         | 
         | Compute is really cheap compared to engineering man-hours.
        
           | cpucycling7 wrote:
           | > The modern way is to sidestep the issue altogether and use
           | Kubernetes
           | 
            | Kubernetes does require quite some time to learn and master,
            | so you could say you're replacing one time-consuming issue
            | with another.
        
           | remram wrote:
           | What's "a database designed to run on Kubernetes"? Cassandra?
        
             | fastest963 wrote:
             | Something like Yugabyte or Cockroach
        
             | linuxdude314 wrote:
             | Pretty sure they were referring to the operators you can
             | install that will stand up production ready replicated
             | clusters.
             | 
             | They work quite well.
        
               | remram wrote:
               | They replied to a comment mentioning "pgBouncer, repmgr,
               | patroni" so supposedly not one of those.
        
           | dionian wrote:
           | it was a minor pain finding and setting up a postgres
           | operator in k8s, but once i got it going it wasn't too
           | horrible. are these other solutions that are more built for
           | it significantly easier to manage?
        
             | slig wrote:
             | Which one did you end up choosing?
        
           | kachapopopow wrote:
           | https://artifacthub.io/packages/helm/bitnami/postgresql,
           | postgres is fortunately one of them. Downside is that you
           | cannot scale writes.
        
           | yrro wrote:
           | Does the Cloud Native PG operator count?
        
             | cheboygan wrote:
             | +1 on checking out cloud native PostgreSQL operator, or
             | other PG operators like crunchy or zalando or ongres
        
         | napsterbr wrote:
         | > Do I need 2 replicas to avoid the split brain scenario? my
         | brain hurts already.
         | 
         | It will hurt even more.
         | 
         | The recommended way is to set up a witness server. Yet another
         | thing to manage in a properly designed Postgres cluster.
         | Certainly not an easy/trivial thing to do, ops-wise.
         | 
         | From [0]:
         | 
         | > By creating a witness server in the same location (data
         | centre) as the primary, if the primary becomes unavailable it's
         | possible for the standby to decide whether it can promote
         | itself without risking a "split brain" scenario: if it can't
         | see either the witness or the primary server, it's likely
         | there's a network-level interruption and it should not promote
         | itself. If it can see the witness but not the primary, this
         | proves there is no network interruption and the primary itself
         | is unavailable, and it can therefore promote itself (and
         | ideally take action to fence the former primary).
         | 
          | An interesting acronym you'll hear is STONITH ("shoot the other
          | node in the head", i.e. fencing the former primary).
         | 
         | [0] - https://www.repmgr.org/docs/current/repmgrd-witness-
         | server.h...
        
           | nolist_policy wrote:
           | You want STONITH anyways. It's all necessary complexity in a
           | HA cluster.
        
         | candiddevmike wrote:
         | > I'd pay to learn this with confidence.
         | 
          | Great, there's a whole industry of PostgreSQL
          | consultants/platforms ready to take your money. You could put
          | on your tinfoil hat and say PostgreSQL's lack of
          | out-of-the-box, integrated, easy-to-use HA is by design to
          | make money.
         | 
         | I think most folks use Patroni with some kind of service
         | discovery solution like Kubernetes or Consul and have it
         | abstract all of this for them.
        
           | lbriner wrote:
           | You could also say that since the maintainers are doing this
           | for free, it is OK that some of them work for businesses that
           | provide (optional) technical support.
           | 
            | However, your mileage may vary. We found someone listed as a
            | postgres consultant to help us, but aside from a load of
           | (alleged) personal problems that made the job take ages, I
           | realised that the optimisation he sent me was basically a
           | rehashed version of something I found on the internet, which
           | he was obviously misrepresenting as his work. He got really
           | arsey with me when I told him that I was really disappointed
           | with his service!
        
         | egnehots wrote:
         | > How do I check how many seconds the replica is lagging behind
         | the master?
         | 
         | Use PostgreSQL administrative functions, specifically:
         | pg_last_xact_replay_timestamp.
         | (https://www.postgresql.org/docs/current/functions-
         | admin.html...)
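          | 
          | For example (a sketch, assuming PostgreSQL 10+):
          | 
          |     -- on the replica: time since the last replayed transaction
          |     SELECT now() - pg_last_xact_replay_timestamp();
          | 
          |     -- on the primary: per-standby lag in bytes and in time
          |     SELECT application_name,
          |            pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
          |              AS byte_lag,
          |            replay_lag
          |     FROM pg_stat_replication;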
         | 
         | > How would I monitor the replica? A simple cron task that
         | pings a health check if everything is OK (lag is < x) would be
         | a good start.
         | 
         | There are many solutions, highly dependent on your context and
         | the scale of your business. Options range from simple cron jobs
         | with email alerts to more sophisticated setups like ELK/EFK, or
         | managed services such as Datadog.
         | 
         | > How do I failover to the replica if the primary goes down?
         | 
         | > Should I handle failover automatically or manually?
         | 
         | > Do I need two replicas to avoid a split-brain scenario? My
         | head hurts already.
         | 
         | While it may be tempting to automate failover with a tool, I
         | strongly recommend manual failover if your business can
         | tolerate some downtime.
         | 
          | This approach allows you to understand why the primary went
          | down, preventing the same issue from affecting the replica.
          | It's often not trivial to restore the primary or convert it to
          | a replica. YOU become the consensus algorithm, the observer,
          | deciding which instance becomes the primary.
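          | 
          | If you do go manual, the promotion itself is small (a sketch;
          | pg_promote() needs PostgreSQL 12+, older versions use "pg_ctl
          | promote" instead):
          | 
          |     -- on any node: true if it is a replica (still in recovery)
          |     SELECT pg_is_in_recovery();
          | 
          |     -- on the replica you have chosen to take over
          |     SELECT pg_promote();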
         | 
         | Two scenarios to avoid:
         | 
          | * Failing over to a replica only for it to fail as well (e.g.,
          | due to a full disk).
          | 
          | * Switching over so transparently that you don't notice you're
          | now running without a replica.
         | 
         | > After a failover (whether automatic or manual), how do I
         | reconfigure the primary to be the primary again, and the
         | replica to be the replica?
         | 
         | It's easier to switch roles and configure the former primary as
         | the new replica. It will then automatically synchronize with
         | the current primary.
         | 
         | You might also want to use the replica for:
         | 
          | * Some read-only queries. However, for long-running queries,
          | you will need to tune the standby conflict settings so they
          | don't get cancelled (see the sketch after this list).
         | 
         | * Backups or point-in-time recovery.
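          | 
          | For the long-running read queries, the knobs involved are
          | usually the standby conflict settings (a sketch; whether you
          | want these depends on your workload):
          | 
          |     -- how long WAL replay may wait for conflicting queries
          |     -- on the standby before cancelling them
          |     SHOW max_standby_streaming_delay;
          | 
          |     -- ask the primary not to vacuum away rows that standby
          |     -- queries still need (at the cost of bloat on the primary)
          |     SHOW hot_standby_feedback;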
         | 
          | If you manage a database yourself, I strongly recommend first
          | gaining confidence in your backups and your ability to restore
          | them quickly. Then you can play with replication; there are
          | tons of little settings to configure (async for perf, a large
          | enough WAL size to restore quickly, ...).
         | 
         | It's not that hard, but you want to have the confidence and the
         | procedure written down before you have to do it in a production
         | incident.
        
         | andix wrote:
         | You're completely right. You need a full management solution
         | around postgres to make it work, and I wouldn't recommend
         | building it yourself.
         | 
          | One solution is Kubernetes and one of the many postgres
          | operators. Still not as easy as pie, though.
        
         | klaussilveira wrote:
          | I am managing my own PostgreSQL cluster with Patroni and, so
          | far, the experience has been a breeze. That said, the initial
          | learning curve is steep: Patroni docs are not the best, and the
          | community support around it is minimal. Naive questions are
         | usually torpedoed by the maintainer or other experienced devs,
         | which does not foster a good environment for people learning on
         | their own.
         | 
          | Luckily, the Percona Distribution for Postgres includes amazing
          | documentation on how to set up and run Patroni, even if you
          | choose not to use their distribution. I would highly recommend
          | following their step-by-step guide:
         | https://docs.percona.com/postgresql/17/solutions/ha-setup-ap...
         | 
         | I have OpenTofu scripts for setting this cluster up, although
         | they might not be useful to you since I am using CloudStack
         | instead of other clouds.
        
         | himinlomax wrote:
          | > - How do I check how many seconds the replica is lagging
          | behind the master?
         | 
         | > - How would I monitor the replica? A simple cron task that
         | pings a health check if everything is OK (lag is < x) would be
         | a nice start.
         | 
         | No, you should use Patroni. It configures and monitors
         | Postgres' native replication.
         | 
         | https://patroni.readthedocs.io/en/latest/
        
         | lbriner wrote:
          | > How do I check replica lag? I use the Prometheus exporter
          | for postgres
         | 
          | > How would I monitor the replica? Same. You can also use
          | something like HAProxy calling a postgres CLI command to
          | connect to the instance
         | 
         | > How do I failover? Mostly, you probably want to do this
         | manually because there can be data loss and you want to make
         | sure the risk is worth it. I simply use repmgr for this.
         | 
         | > Do I need 2 replicas? It's usually good to have at least 3 (1
         | master and 2 slaves) but mostly so that if one fails, you still
         | have 2 remaining i.e. time to get a 3rd back online
         | 
          | > How do I failback? Again, very easy with repmgr: you just
          | tell the primary to be the primary again. The failed-over
          | primary gets stopped, the original primary gets fast-forwarded
          | and promoted back to primary, and everything else gets told to
          | follow it.
         | 
          | I do agree that this space for postgres is very fragmented and
          | some tools appear abandoned, but it's pretty straightforward
          | with just postgres + barman + repmgr. I have a series of videos
          | on YouTube if you are interested, but I am not a Postgres
          | expert, so please no hating :-) https://youtu.be/YM41mLZQxzE
        
           | cheald wrote:
           | +1 to all of this. The thing I'd add is that we use barman
           | for our additional replicas; WAL streaming is very easy to do
           | with Barman, and we stream to two backups (one onsite, one
           | offsite). The only real costs are bandwidth and disk space,
           | both of which are cheap. Compared to running a full replica
           | (with its RAM costs), it's a very economical way to have a
           | robust disaster recovery plan.
           | 
           | If you're doing manual failover, you don't need an odd number
           | of nodes in the cluster (since you aren't looking for quorum
           | to automatically resolve split-brain like you would be with
            | tools like Elasticsearch or redis-sentinel), so for us it's a
           | question of "how long does it take to get back online if we
           | lose the primary" (answer: as long as it takes to determine
           | that we need to do a switch and invoke repmgr switchover),
           | and "how robust are we against catastrophic failure" (answer:
           | we can recover our DB from a very-close-to-live barman backup
           | from the same DC, or from an offsite DC if the primary DC got
           | hit by an airplane or something).
        
       | kachapopopow wrote:
       | I see this as one of the reasons to use kubernetes (& helm).
       | 
       | https://artifacthub.io/packages/helm/bitnami/postgresql
       | 
        | It configures all of this for you with near-zero additional
        | configuration required. There's also postgres-ha, which handles
        | zero-downtime failover by spawning a proxy that handles failures
        | in a specialized way, versus just directly forwarding to a psql
        | server.
        
       | andix wrote:
        | The only real-world, easy-to-use solutions for postgres
        | replication I've found are the Kubernetes operators, for example
        | CloudNativePG.
        | 
        | It's not just replication that you need: it's failover, recovery,
        | monitoring, self-healing, backups, and so on.
       | 
       | Are there any other free/open implementations outside of
       | kubernetes?
        
         | xtracto wrote:
          | I used the WAL feature with a bash script that compressed the
          | WAL into .xz files (-9 compression) every 10,000 lines and
          | stored those files in an s3fs folder, as a type of "online"
          | backup solution.
        
           | andix wrote:
           | Sure, but you probably want some automated snapshot recovery
           | on a node that has a quite recent replica. Restoring a
            | complete backup might take hours or days, and in a disaster
            | scenario you want to be back online within a few minutes.
           | Probably even without manual intervention.
           | 
            | And at the same time you want to spin up a new replica to
            | replace the lost one.
        
       | klaussilveira wrote:
       | For the K8S folks: https://stackgres.io
        
       ___________________________________________________________________
       (page generated 2024-10-12 23:01 UTC)