[HN Gopher] A PostgreSQL Docker container that automatically upg...
___________________________________________________________________
A PostgreSQL Docker container that automatically upgrades
PostgreSQL
Author : justinclift
Score : 149 points
Date : 2023-07-16 12:35 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mbreese wrote:
| This is a very confusing title and project at first glance. It
| took me a minute to figure out what you're doing...
|
| But just to be clear -- the point of this is to migrate an
| existing Postgres DB from the original version to a newer
| version? This is about the data volume itself, not automatically
| upgrading the Postgres binary when a new version is released?
|
| So, you'd basically pass a flag that says --migrate-db when
| starting the container to kick start changing the data on disk?
| So when you start a new Postgres 15 container, you could pass it
| a volume with Postgres 13 data and it would auto update the on
| disk DB data.
| justinclift wrote:
| It's pretty much answered here:
| https://news.ycombinator.com/item?id=36747498
|
| The upgrade-of-your-data happens automatically when the
| container starts, prior to starting the database server.
| mbreese wrote:
| Right, but you're not actually automatically migrating a
| Postgres 13 _program_ to Postgres 15 in a running container.
| That was what I first thought when I read the title.
|
| You're using the word "upgrade" differently than how I
| normally think about it when talking about version changes.
| You're talking about data-on-disk, not just the program
| version.
|
| (I realize that when talking about Postgres, the two are
| linked, but that's not the case for most programs)
| justinclift wrote:
| Is there better wording that's concise?
|
| I'm just going with what I thought of, but am happy to
| adjust things as makes sense. :)
| rcfox wrote:
| "A PostgreSQL Docker container that automatically
| upgrades your database"
|
| Still uses "upgrade", but highlights the fact that you're
| changing the artifact and not just the software.
| [deleted]
| Izkata wrote:
| Postgres docs do use the word upgrade, for example:
|
| > pg_upgrade (formerly called pg_migrator) allows data stored
| in PostgreSQL data files to be upgraded to a later PostgreSQL
| major version without the data dump/restore typically required
| for major version upgrades
|
| https://www.postgresql.org/docs/current/pgupgrade.html
| VWWHFSfQ wrote:
| Just to be clear, this isn't for actually running postgres to
| serve your database, right? This is just more or less a tool to
| upgrade your data files to a new postgres version?
| justinclift wrote:
| It does both. As part of the startup process it first checks if
| the data files are for an old version, and upgrades them first
| (if needed).
|
| Then it starts PostgreSQL 15.3 as per normal.
|
| It's intended as being a drop-in replacement for existing
| (alpine based) PostgreSQL containers.
|
| That being said, it's still pretty new so don't use it on
| production data until you've tested it first (etc).
|
| --
|
| Btw, it uses the "--link" option when it runs pg_upgrade so
| should be reasonably suitable even for databases of a fairly
| large size. That option means it processes the database files
| "in place" to avoid needing to make a 2nd copy.
|
| At the moment it only processes a single data directory, so
| if you have multiple data directories it's not yet suitable.
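| The detect-then-upgrade flow described above can be sketched in
| shell. This is an illustrative sketch, not the project's actual
| entrypoint: the TARGET_VERSION value and the /usr/local-pg<ver>/bin
| layout for the old binaries are assumptions.

```shell
#!/bin/sh
# Sketch of an entrypoint check (assumed layout, not the project's real
# script). PostgreSQL records its major version in the PG_VERSION file
# at the top of the data directory, which can be inspected before the
# server starts.
TARGET_VERSION=15

needs_upgrade() {
    # $1 = data directory; prints the old version when an upgrade is needed
    data_version=$(cat "$1/PG_VERSION" 2>/dev/null) || return 1
    [ "$data_version" != "$TARGET_VERSION" ] && echo "$data_version"
}

upgrade_cmd() {
    # Builds (but does not run) the pg_upgrade call; --link hard-links
    # unchanged files so no second copy of the data is made.
    # $1 = old version, $2 = data directory
    echo "pg_upgrade --link -d \"$2/old\" -D \"$2/new\"" \
         "-b /usr/local-pg$1/bin -B /usr/local/bin"
}
```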
| minus10eng wrote:
| [dead]
| WirelessGigabit wrote:
| I pin Postgres to vMajor.
|
| But I was bitten in the rear by Wallabag the other day. Didn't
| work anymore. Ended up having to do a doctrine upgrade. Then it
| all worked. Same concept I guess. No automatic upgrade does allow
| you to pause and evaluate what you're doing, and take backups.
|
| Now I say it allows you. Whether you (or I) do so... after all,
| that is what automatic backups are for...
|
| Now when did I last validate those?
| user6723 wrote:
| Mutable containers, what could go wrong?
| ndsipa_pomu wrote:
| This looks useful.
|
| I've had issues in the past with PostgreSQL as a backend for
| NextCloud (all running in docker) and blindly pulling the latest
| PostgreSQL and then wondering why it didn't work when the major
| version jumped. (It's easy enough to fix once you figure out why
| it's not running - just do a manual export from the previous
| version and import the data into the newer version).
|
| However, does this container automatically backup the data before
| upgrading in case you discover that the newer version isn't
| compatible with whatever is using it?
| justinclift wrote:
| > However, does this container automatically backup the data
| before upgrading ...
|
| Nope. This container is the official Docker Postgres
| 15.3-alpine3.18 image + the older versions of PostgreSQL
| compiled into it and some pg_upgrade scripting added to the
| docker entrypoint script to run the upgrade before starting
| PostgreSQL.
|
| It goes out of its way to use the "--link" option when running
| pg_upgrade, to upgrade in-place and therefore avoid making an
| additional copy of the data.
|
| That being said, this is a pretty new project (about a week old
| on GitHub), and having some support for making an (optional)
| backup isn't a bad idea.
|
| I'll have to think on a good way to make that work. Probably
| needs to check for some environment variable as a toggle or
| something, for the people who want it... (unsure yet).
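| One way that toggle could look, sketched as shell. The
| PGAUTO_BACKUP variable name is purely hypothetical (not an option
| the project provides), and the pg_dumpall call is printed rather
| than executed.

```shell
#!/bin/sh
# Hypothetical opt-in backup step for the entrypoint; PGAUTO_BACKUP is
# an illustrative name, and the dump command is echoed, not run.
backup_if_requested() {
    # $1 = directory to write the dump into; prints what would run
    if [ "${PGAUTO_BACKUP:-no}" = "yes" ]; then
        echo "pg_dumpall -U postgres -f $1/pre-upgrade-backup.sql"
    else
        echo "skipping backup (set PGAUTO_BACKUP=yes to enable)"
    fi
}
```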
| tough wrote:
| Handling backup definitely seems interesting. I wonder if it
| could just be something simple enough that lets you downgrade
| if the upgrade fails.
| justinclift wrote:
| Yeah. Probably going to need to add a bunch more error
| catching and handling-of-specific-situations. Edge cases
| being a thing and all. :)
| westurner wrote:
| Chapter 26. Backup and Restore:
| https://www.postgresql.org/docs/current/backup.html
|
| Chapter 26. Backup and Restore > 26.3. Continuous
| Archiving and Point-in-Time Recovery (PITR) > 26.3.4.
| Recovering Using a Continuous Archive Backup:
| https://www.postgresql.org/docs/current/continuous-
| archiving...
|
| IIRC there are fancier ways than pg_dump to do Postgres
| backups that aren't postgres native PITR?
|
| gh topic postgresql-backup:
| https://github.com/topics/postgresql-backup
|
| - pgsql-backup.sh: https://github.com/fukawi2/pgsql-
| backup/blob/develop/src/pgs...
|
| - https://github.com/SadeghHayeri/pgkit#backup https://gi
| thub.com/SadeghHayeri/pgkit/blob/main/pgkit/cli/co... :
| $ sudo pgkit pitr backup <name> <delay>
|
| > Recover: _This command is used to recover a delayed
| replica to a specified point in time between now and the
| database's delay amount. The time can be given in the
| YYYY-mm-ddTHH:MM format. The latest keyword can also be
| used to recover the database up to the latest transaction
| available._:
| $ sudo pgkit pitr recover <name> <time>
| $ sudo pgkit pitr recover <name> latest
|
| > _The database will then start replaying the WAL files.
| Its progress can be tracked through the log files at
| /var/log/postgresql/._
|
| - "PostgreSQL-Disaster-Recovery-With-Barman"
| https://github.com/softwarebrahma/PostgreSQL-Disaster-
| Recove... :
|
| > _The solution architecture chosen here is a
| 'Traditional backup with WAL streaming' architecture
| implementation (Backup via rsync/SSH + WAL streaming).
| This is chosen as it provides incremental backup/restore
| & a bunch of other features._
|
| Glossary of backup terms:
| https://en.wikipedia.org/wiki/Glossary_of_backup_terms
|
| Continuous Data Protection > Continuous vs near
| continuous: https://en.wikipedia.org/wiki/Continuous_Data
| _Protection#Con...
| justinclift wrote:
| Thanks. I hadn't come across pgkit before. :)
| ndsipa_pomu wrote:
| A default option is to warn about the possibility and then
| just leave the admins to do their own backups (which they
| should be doing anyway).
| justinclift wrote:
| What do you reckon the right place(s) to warn people would
| be?
| ndsipa_pomu wrote:
| Probably in the project readme. You could put a warning
| in the logs as well (e.g. on container startup), but that
| wouldn't be as visible.
| justinclift wrote:
| Thanks. I'll make that update in the morning too. :)
| westurner wrote:
| As an interactive prompt before mutating the data, with a
| `-y` to bypass the interactive check (and in the docs for
| `-y`)
| justinclift wrote:
| Nah. I can't see anything that needs interaction as being
| workable, as this is supposed to be an "automatic
| upgrade" thing.
| jacooper wrote:
| I think following MariaDB's lead is a good idea: they have an
| environment variable to enable automatic upgrades, and it
| creates a backup before every upgrade. Works like a charm.
| justinclift wrote:
| Any idea how well it works with multi-TB databases?
|
| Copying TBs of data around seems like it would delay the
| start of the database by a lot.
| develatio wrote:
| Related: https://github.com/tianon/docker-postgres-upgrade
| justinclift wrote:
| This one seems like a good proof of concept, but appears to
| need running as a separate task for upgrading rather than being
| an "automatic upgrade before starting the database server"
| thing?
| mkarrmann wrote:
| Neat :)
|
| I personally feel like upgrading a database should be an explicit
| admin process, and isn't something I want my db container
| entrypoint automagically handling.
| rc_mob wrote:
| yes. i do have a few developer environments this would be great
| for though.
| justinclift wrote:
| Yeah, a lot of the time I'd agree with you. :)
|
| This container came about for the Redash project
| (https://github.com/getredash/redash), which had been stuck on
| PostgreSQL 9.5 (!) for years.
|
| Moving to a newer PostgreSQL version is easy enough for new
| installations, but deploying that kind of change to an existing
| userbase isn't so pretty.
|
| For people familiar with the command line, PostgreSQL, and
| Docker, it's no big deal.
|
| But a _large_ number of Redash deployments seem to have been
| done by people not skilled in those things. "We deployed it
| from the Digital Ocean droplet / AWS image / (etc)".
|
| For those situations, something that takes care of the database
| upgrade process automatically is the better approach. :)
| [deleted]
| cosmotic wrote:
| I disagree. It is (or was, last time I tried a year or two
| ago) hard to upgrade Postgres because you need both the old
| and new binaries. The package manager tries its best to allow
| just one copy, so you end up copying from one image to
| another, then run into tiny breaking differences.
|
| I agree it should be explicitly invoked and not automated,
| but for something almost everyone needs to do, it sure is a
| hard task.
| geenat wrote:
| For the uninitiated: Fairly certain this can only handle minor
| version upgrades... Which is generally as easy as moving your
| data folder beside the new binary and starting Postgres (minor
| version upgrades are already trivial).
|
| Major versions still require a pg_dump (or a scary migration
| using postgres' anemic logical replication) unless some
| advancement has happened on the postgres side I'm unaware of.
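| For reference, the dump/restore route mentioned above looks
| roughly like this as a sketch; the ports, user, and paths are
| illustrative, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the traditional dump/restore major upgrade: dump everything
# from the old server, then replay it into the new one. Ports, user,
# and paths here are illustrative assumptions.
dump_restore_steps() {
    # $1 = old server port, $2 = new server port
    old_port=$1 new_port=$2
    echo "pg_dumpall -p $old_port -U postgres -f /backup/all.sql"
    echo "psql -p $new_port -U postgres -f /backup/all.sql"
}
dump_restore_steps 5432 5433
```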
| justinclift wrote:
| > Beware: Fairly certain this can only handle minor version
| upgrades ...
|
| Nope. It's entirely for upgrading between major PostgreSQL
| versions. :)
|
| It uses the PostgreSQL "pg_upgrade" utility to do the data
| upgrade behind the scenes:
|
| https://www.postgresql.org/docs/current/pgupgrade.html
| Alifatisk wrote:
| I have always wondered how a database and its persistence is
| handled in a containerized environment.
|
| This covers one of the issues.
| johannes1234321 wrote:
| The way you handle persistence is by using storage volumes
| mounted from the outside. Don't put the data in the container,
| only the software, which can then be replaced. Managing it is
| then similar to other environments: for updating, you don't
| replace rpm/deb packages and restart, but replace the container
| using the same volume, which should trigger the DBMS-specific
| update routine. Backups likewise work much as they do outside
| the container world (depending on the DBMS and your exact
| choice, anything from a filesystem snapshot to a logical backup
| (dump), or something more or less smart in between).
| justinclift wrote:
| With PostgreSQL containers used in docker-compose, the common
| approach is to use a bind mount so the database files are
| persisted on the host.
|
| I've not done stuff with kubernetes yet though, so I have no
| idea how it's done there.
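| The bind-mount pattern mentioned here, in docker run form; the
| host path, container name, and password are illustrative. The
| command is assembled into a variable so the sketch can be
| inspected without Docker present.

```shell
#!/bin/sh
# Bind-mount sketch: the data directory lives on the host, so the
# container can be replaced without losing data. All names here are
# illustrative assumptions, not from the project.
PG_RUN='docker run -d --name pg \
  -v /srv/pgdata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=change-me \
  postgres:15-alpine'
echo "$PG_RUN"
```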
| bheadmaster wrote:
| Both Docker and Kubernetes can use volumes to provide
| persistent storage to containers/pods without bind mounts.
| justinclift wrote:
| Cool, thanks all. :)
| johannes1234321 wrote:
| > I've not done stuff with kubernetes yet though, so I have
| no idea how it's done there.
|
| Essentially the same, except that K8s gives you a wide
| variety of storage backend integrations (Storage classes +
| storage providers) which can attach "anything" (local volumes
| on the node, NFS, NAS, Cloud Volumes, ...) depending on your
| local environment and needs.
| tempest_ wrote:
| Are people putting their databases in K8s now? I thought
| the old rule of thumb was don't do that but perhaps it has
| changed.
|
| Feels like the database would take a massive performance
| hit using network backed storage unless the software is
| aware of that fact.
| mschuster91 wrote:
| If you're running in the cloud, say on AWS EKS, it makes
| sense to use in-cluster databases for development
| environments, and only use RDS DBs for
| production/integration to save on hosting costs.
| williamdclt wrote:
| You take the risk of not catching RDS (or network) quirks
| in staging, but that's a trade-off
| johannes1234321 wrote:
| There is a huge push for doing that. Whether it is
| abstract the right thing can be questioned, but many IT
| departments decided to standardize around Kubernetes for
| all datacenter management and push that way and in some
| environments (5G networking) it's part of the specified
| stack.
| hotpotamus wrote:
| Saving it on one host's local filesystem doesn't feel
| particularly production-ready. There is a distributed store
| system for Kubernetes called "Longhorn" that I've heard good
| things about, but I haven't really looked into it much
| myself. I just run a pair of VMs with a manual
| primary/replica setup and have never needed to fail over to
| the replica yet, but I can imagine some sort of fully
| orchestrated container solution in the future.
| justinclift wrote:
| Heh Heh Heh
|
| I'm just pointing out how it's commonly done. Of course
| people add things like replication, distributed
| filesystems, (etc) to the mix to suit their needs. :)
| hotpotamus wrote:
| Yep, it seems like the most common answer is "pay
| exorbitant prices to your cloud provider for a managed
| SQL database", but we've managed to save a chunk of money
| running it on our own. I've always said that between
| three engineers(me being one of them), we can form one
| competent DBA, but our needs are also pretty modest.
| Kab1r wrote:
| I've had bad luck with longhorn, but I have heard good
| things about using Rook with Ceph for PVCs
| wokkel wrote:
| So we come full circle it seems. Apt update/upgrade already does
| this and has been doing it for a few years now. So if I drop
| Docker and just run a container with Debian, I get this with a
| few years of testing behind it.
| neilv wrote:
| > _It's whole purpose in life is to automatically detect the
| version of PostgreSQL used in the existing PostgreSQL data
| directory, and automatically upgrade it (if needed) to the latest
| version of PostgreSQL._
|
| In a small startup...
|
| * If the data is mission-critical and constantly changing,
| PostgreSQL is a rare infra thing for which I'd use a managed
| service like AWS RDS, rather than just Debian Stable on EC2 or
| my own containers. The first time I used RDS, the company couldn't
| afford to lose an hour of data (could destroy confidence in
| enterprise customer's pilot project), and without RDS, I didn't
| have time/resources to be nine-9s confident that we could do PITR
| if we ever needed to.
|
| * If the data is less-critical or permits easy sufficient
| backups, and I don't mind a 0-2 year-old stable version of PG,
| I'd probably just use whatever PG version Debian Stable has
| locked in. And just hold my breath during Debian security updates
| to that version.
|
| (Although I think I saw AWS has a cheaper entry level of pricing
| for RDS now, which I'll have to look into next time I have a
| concrete need. AWS pricing varies from no-brainers to lunacy,
| depending on specifics, and specifics can be tweaked with costs
| in mind.)
| [deleted]
| justsomehnguy wrote:
mkdir /var/lib/postgresql/data/old
mv -v /var/lib/postgresql/data/* /var/lib/postgresql/data/old/
mkdir /var/lib/postgresql/data/new
|
| You should create both dirs first, then _check that they really
| exist_ and only then move the files. Shit happens, and it's
| better to be safe than sorry.
|
| Also I would replace ../old and ../new with $OLD and $NEW to
| have a bit less clutter in the next block with the explicit
| upgrade calls, i.e.:
|     /usr/local/bin/pg_upgrade --link -d $OLD -D $NEW \
|         -b /usr/local-pg9.5/bin -B /usr/local/bin
|
| Overall it's a nice idea, it just needs some more safety
| checking before starting the process.
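| The check-before-move idea can be sketched like this; the data
| directory path is whatever the base argument points at, and the
| final pg_upgrade call is shown only as a comment.

```shell
#!/bin/sh
# Sketch of the suggested safety checks: make old/ and new/, verify
# both really exist, and only then move the cluster files into old/.
prepare_dirs() {
    # $1 = base data directory
    base=$1
    OLD="$base/old"
    NEW="$base/new"
    mkdir -p "$OLD" "$NEW" || return 1
    [ -d "$OLD" ] && [ -d "$NEW" ] || return 1   # belt and braces
    for f in "$base"/*; do
        case "$f" in
            "$OLD"|"$NEW") ;;                    # don't move the new dirs
            *) mv -v "$f" "$OLD"/ || return 1 ;;
        esac
    done
}
# afterwards: /usr/local/bin/pg_upgrade --link -d "$OLD" -D "$NEW" ...
```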
| justinclift wrote:
| Cool, good thinking, thanks heaps. I'll implement that in the
| morning. :)
| lmeyerov wrote:
| That's great.
|
| We have been maintaining a manual migration script for our docker
| users for the DB part, while the app part does migrations
| automatically already (Django), so making the db part more built-
| in makes a lot of sense.
|
| Our case doesn't really need live/fully auto, just an easy mode
| for admins during upgrade cycles, and our scripts were pretty
| generic, so a project like this makes a lot of sense. There are
| a few modes we support (local / same-server, cross-node, etc.),
| so I'm curious.
___________________________________________________________________
(page generated 2023-07-16 23:00 UTC)