[HN Gopher] Minimal downtime major PostgreSQL version upgrades w...
       ___________________________________________________________________
        
       Minimal downtime major PostgreSQL version upgrades with
       pg_easy_replicate
        
       Author : shayonj
       Score  : 222 points
       Date   : 2023-06-20 15:46 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jensenbox wrote:
       | I know I am going to get fairly polarized responses on this
       | one...
       | 
       | Too bad it is written in Ruby.
        
         | remram wrote:
         | If you had included any content or justification in your
         | comment, it might have gone differently. "I don't like software
         | written in Ruby" is not interesting to anyone. No polarization.
        
         | revskill wrote:
        | What if it's written in Assembly (which is hard to read and
        | maintain...)?
        
           | mike256 wrote:
            | I like code written in Assembly. No problem with that. If it's
           | written in Assembly in the first place (not disassembled from
           | another language) it's usually not that hard to read...
        
         | mike_hock wrote:
         | Based.
        
       | jreed91 wrote:
       | Be careful if you have sequences as those are not replicated.
       | https://www.jacobreed.dev/blog/migrate-postgres-instances
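        | 
        | For anyone doing this by hand, a rough sketch of carrying the
        | sequence values over once the data has caught up (hosts and
        | database name here are made up):
        | 
        |   psql -h old-host -d appdb -Atc \
        |     "SELECT format('SELECT setval(%L, %s);',
        |             schemaname || '.' || sequencename, last_value)
        |      FROM pg_sequences WHERE last_value IS NOT NULL" \
        |     | psql -h new-host -d appdb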
        
         | shayonj wrote:
         | I will update the readme to mention this - but the lib takes
          | care of updating sequences with the data catch-up:
         | https://github.com/shayonj/pg_easy_replicate/blob/13ad6af51f...
        
       | gfosco wrote:
       | Have done minimal downtime major version upgrades using standard
       | replication, switching between two "master" servers. In the five
       | minute range.
        
         | hinkley wrote:
         | This is also a spot where having failover strategies and
         | circuit breakers can help. Microservices make you think about
         | these things but you don't have to wait for that forcing
         | function to think about them.
         | 
         | When the database goes down, you have to do something else.
         | Could be the server crashed or could just be a Postgres
         | upgrade. Five minutes is just about the right amount of time
         | for an open circuit to do its job.
        
       | tudorg wrote:
       | Looks very interesting!
       | 
       | > The switch is made by putting the user on the source database
       | in READ ONLY mode, so that it is not accepting any more writes
        | and waits for the flush lag to be 0. It is up to the user to
        | kick off a rolling restart of your application containers or
        | failover DNS (more on these below in strategies) after the
        | switchover is complete, so that your application isn't sending
        | any read/write requests to the old/source database.
       | 
       | This does mean that there will be a "downtime" during which the
       | DB is in read-only mode, right? I understand this period can be
       | really small, but that depends on how long the app deployment
       | takes.
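        | 
        | For reference, a rough sketch of what that read-only step and
        | the flush-lag check amount to (role/host names here are just
        | illustrative, not the lib's API):
        | 
        |   # stop the app role from writing; only affects new sessions
        |   psql -h source-host -d appdb -c \
        |     "ALTER ROLE app_user SET default_transaction_read_only = on;"
        | 
        |   # then wait until the subscriber has flushed everything
        |   psql -h source-host -d appdb -c \
        |     "SELECT application_name,
        |             pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)
        |               AS flush_lag_bytes
        |      FROM pg_stat_replication;"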
        
         | shayonj wrote:
         | Thanks for taking a look!
         | 
          | That's right, it was a typo - meant zero data loss and minimal
          | downtime. There are some strategies in the readme, like using
          | weighted DNS failover, to reduce the downtime even further
          | without requiring an application deployment.
        
       | MuffinFlavored wrote:
       | The title says "zero downtime" but the GitHub tagline says
       | "minimal downtime"
       | 
       | Zero... data loss maybe?
        
         | tycoon177 wrote:
         | Was just going to comment this. It's specifically:
         | 
         | > This ensures zero data loss and minimal downtime for the
         | application.
        
       | cmdrriker wrote:
        | Interesting - how does this compare to something like
       | pg_auto_failover in a two or three-node PostgreSQL setup?
        
       | pqdbr wrote:
       | Let's say I have a Rails app and I want to do a major postgres
       | version upgrade using this tool.
       | 
       | I see one of the requirements is that 'Both databases should have
       | the same schema'. How should I manually load the schema to the
       | new database (running on port 5433 for instance)?
        
         | shayonj wrote:
          | That's a great question: internally, when preparing the new
          | empty database, we also run a "bundle exec rake db:schema:load"
          | against it. Depending on your setup, that
         | can be hard to do.
         | 
         | I am exploring some options like loading schema from a
         | schema.sql file or something along the lines of what pg_dump
         | does for schema export.
         | 
         | Would love to hear any ideas.
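          | 
          | For a framework-agnostic route, something along these lines
          | should work against the empty target (connection details are
          | illustrative):
          | 
          |   # copy only the DDL (no data) from the old cluster into
          |   # the new, still-empty one
          |   pg_dump -h localhost -p 5432 -d appdb --schema-only \
          |     | psql -h localhost -p 5433 -d appdb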
        
         | jrochkind1 wrote:
          | The easy answer would be: on some non-production copy of the
          | Rails app, point it at the new db in database.yml and simply
          | run `rails db:create; rails db:schema:load`, or a variant
          | rails command like `rails db:prepare`.
         | 
         | Rather than this ad-hoc thing, you could also try to create a
         | new rails "environment" for this use, I guess.
        
         | jxcl wrote:
         | Yeah I think the documentation is a bit ambiguous here.
         | Postgres "schemas" have a very specific meaning, different from
         | what most people think of when they hear the word. It's more
         | like a table namespace.
         | 
         | https://www.postgresql.org/docs/current/ddl-schemas.html
        
           | hinkley wrote:
           | I have historically had a "drop-create script" that could
           | handle building me empty tables. Typically split up so that I
           | can create tables, import (test) data from a snapshot or
           | synthetic source, then enable constraints and indexes.
           | 
           | So even if a tool like that needed a little boost at the
           | beginning to get going, I'd have something that already took
           | care of that.
        
       | ekiauhce wrote:
       | Is there an alternative for MySQL? We want to upgrade from 5.7 to
       | 8 pretty soon, so it would be nice to have the appropriate
       | toolkit.
        
       | hans_castorp wrote:
       | There is also a new (OpenSource) Postgres extension that claims
       | to provide multi-master replication based on logical replication,
       | BDR and pglogical
       | 
       | https://pgedge.github.io/spock/
       | 
       | which can also be used for online upgrades.
       | 
        | However, it only supports Postgres 15 and the (not yet
        | released) 16.
        
       | srslack wrote:
       | I did this the "hard way" a year ago. This blog post came in
       | handy:
       | 
       | https://jstaf.github.io/posts/pglogical/
       | 
       | pglogical is magic. Keeping an eye on this project for the
       | future, looks great.
        
         | sgt wrote:
         | > Binary replication only works with DBs of the same major
         | version. If you want to replicate to a different version,
         | well... you can't.
         | 
         | Is this true? I thought they fixed that at some point after
         | PG11.
        
         | shayonj wrote:
         | Love pglogical. I was thinking, it'd be nice to orchestrate
         | this setup using pglogical too. Would attempt it if there is
          | interest, especially with a bi-directional replication setup.
        
       | jrochkind1 wrote:
       | I guess this is pretty similar to what heroku does for pg
       | updates?
       | 
       | If using this tool for upgrades with minimal downtime, if I
       | understand right, you need to orchestrate your db callers to
       | switch over to the new primary at the right point, when it's
       | ready? Tips for getting that to happen at just the right point to
       | minimize downtime?
        
         | shayonj wrote:
          | Great question: you can watch the stats command. It returns
          | JSON every second with a switchover_completed_at field, which
          | is updated as soon as the switchover is complete.
         | 
         | UPDATE: Some more reading material here:
         | https://github.com/shayonj/pg_easy_replicate#switchover-stra...
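          | 
          | A rough way to script the hand-off against that output (the
          | exact flags are whatever the readme documents; the jq check
          | and rollout step here are just illustrative):
          | 
          |   # block until the stats JSON reports a switchover time,
          |   # then kick off the application rollout
          |   until pg_easy_replicate stats |
          |       jq -e '.switchover_completed_at != null' > /dev/null
          |   do
          |     sleep 1
          |   done
          |   ./redeploy-app.sh   # hypothetical rollout step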
        
       | remram wrote:
        | The fastest way to switch over is probably adding an iptables
        | rule on the old server to DNAT to the new server. Unless you
        | have a load-balancer or other network middleware like
        | Kubernetes' clusterIP services (which use iptables or IPVS
        | internally).
       | 
       | Waiting for a rolling restart or DNS cache expiration could take
       | a while, during which the app is talking to a read-only database,
       | if I understand correctly.
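        | 
        | Concretely, something like this (addresses are placeholders;
        | the masquerade rule makes return traffic flow back through the
        | old host):
        | 
        |   # on the old DB server: forward incoming 5432 to the new one
        |   iptables -t nat -A PREROUTING -p tcp --dport 5432 \
        |     -j DNAT --to-destination 10.0.0.42:5432
        |   # rewrite the source so replies route back via this box
        |   iptables -t nat -A POSTROUTING -p tcp -d 10.0.0.42 \
        |     --dport 5432 -j MASQUERADE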
        
         | aidos wrote:
         | I could definitely have done with that idea a couple of weeks
         | back!
        
         | shayonj wrote:
         | Cool idea!
        
         | bbarnett wrote:
          | This is why you always give two IP addresses to such servers.
         | One for the server, and one for the service.
        
       | take-five wrote:
       | Nice UX!
       | 
       | I checked the code and noticed some things that you might want to
       | address in the future (I've done major version upgrades via
       | logical replication myself several times using a script similar
       | to this).
       | 
       | The "default" way of bringing the target DB in sync with the
       | source one (CREATE PUBLICATION on the source then CREATE
       | SUBSCRIPTION on the target) only works for moderately sized
       | databases. Even on ~50GB I noticed that it may take hours to
       | catch up with the origin. There are a couple of solutions:
       | 
        | 1. Drop all indices except for the primary keys before performing
        | the initial copy, and then restore the indices after all tables
        | have been copied and the replication stream has started.
       | 
       | 2. Instead of copying into a blank database, you could instead
       | create a logical dump using pg_dump with snapshot isolation,
       | restore it on the target DB using pg_restore, create a
       | subscription in the disabled state, advance the logical
       | replication slot to the LSN with which you created the logical
       | dump, and then enable the subscriptions. This way your target
       | database would only have to process the delta between the time
       | you created a dump and the current time. I have a snippet
       | illustrating how we did it: https://gist.github.com/take-
       | five/1dab3a99c8636a93fc69f36ff9....
       | 
       | 3. We found out that pg_dump + pg_restore is still very slow on
       | larger databases (~1TB) and it's untenable to wait several hours.
        | Remember that while you're waiting for the dump to be restored,
        | the source database is accumulating WALs that it will have to
        | send to the target DB later. If left unchecked for 12-18
       | hours, it could lead to running out of disk space. This was
       | unacceptable for us, so instead of creating a target DB from a
        | logical dump, we created a copy of the source database from an
        | AWS RDS snapshot, upgraded it using pg_upgrade and then set up
        | logical replication (similar to how it's described in the Percona
       | blog: https://www.percona.com/blog/postgresql-logical-
       | replication-...). Something like this can probably be achieved
       | with ZFS snapshots.
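        | 
        | For reference, the "default" flow that those options improve on
        | boils down to something like this (hosts and names are
        | illustrative):
        | 
        |   # on the source
        |   psql -h source-host -d appdb -c \
        |     "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"
        | 
        |   # on the target: this creates the slot on the source and
        |   # kicks off the initial table copy
        |   psql -h target-host -d appdb -c \
        |     "CREATE SUBSCRIPTION upgrade_sub
        |        CONNECTION 'host=source-host dbname=appdb user=repl'
        |        PUBLICATION upgrade_pub;"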
       | 
       | --
       | 
       | Otherwise, it was very similar to how you programmed your
       | library.
       | 
       | Recently, I saw an article here on HN on the same topic of
       | migrating Postgres databases using logical replication. There was
       | a very nice idea to set up _reverse_ logical replication after
       | switchover so that it's possible to roll back quickly in case
       | something breaks.
       | 
       | EDIT: formatting
        
         | shayonj wrote:
          | These are great shouts! Thank you so much for sharing. One of
          | the operations I ran with this was ~500GB and yeah, it takes a
          | few hours to catch up. For us it wasn't an issue, especially
          | since we were on AWS Aurora. That said, I can totally see it
          | not being feasible on other database engines/systems. I
          | considered the idea of dropping all the indices and re-creating
          | them once the catch-up is complete. I'd like pg_easy_replicate
          | to support that. It should make the initial COPY a lot more
          | swift. On a 200GB DB it cut the total time by 90%.
         | 
         | Re: bi-directional replication, you read my mind :).
         | https://github.com/shayonj/pg_easy_replicate/blob/8df94aa93f...
         | 
         | These two features are top of my list to support in
          | pg_easy_replicate.
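          | 
          | Roughly what I have in mind, as a manual sketch for now
          | (illustrative commands, not something the lib does yet):
          | 
          |   # save the target's index definitions so they can be
          |   # replayed after the initial copy; in practice you'd also
          |   # filter out primary-key indexes, since those back constraints
          |   psql -h target-host -d appdb -Atc \
          |     "SELECT indexdef || ';' FROM pg_indexes
          |      WHERE schemaname = 'public'" > recreate_indexes.sql
          | 
          |   # drop those indexes, run the sync, and once caught up:
          |   psql -h target-host -d appdb -f recreate_indexes.sql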
        
       | pelagicAustral wrote:
       | Much appreciated. I was actually looking into a few options for
        | this exact same thing. I will give this a shot right away.
        
         | shayonj wrote:
         | Thanks for giving it a try. Please don't hesitate to open
         | bugs/feature requests.
        
       ___________________________________________________________________
       (page generated 2023-06-20 23:00 UTC)