[HN Gopher] Speeding up PostgreSQL dump/restore snapshots
___________________________________________________________________
Speeding up PostgreSQL dump/restore snapshots
Author : tudorg
Score : 69 points
Date : 2025-07-05 16:42 UTC (6 hours ago)
(HTM) web link (xata.io)
(TXT) w3m dump (xata.io)
| hadlock wrote:
| One thing that's sorely needed in the official documentation is a
| "best practice" for backup/restore from "cold and dark" where you
| lose your main db in a fire and are now restoring from offsite
| backups for business continuity. Particularly in the 100-2TB
| range where probably most businesses lie, and backup/restore can
| take anywhere from 6 to 72 hours, often in less than ideal
| conditions. Like many things with SQL there's many ways to do it,
| but an official roadmap for order of operations would be very
| useful for backup/restore of roles/permissions, schema etc. You
| will figure it out eventually, but in my experience the dev and
| prod db size delta is so large many things that "just work" in
| the sub-1gb scale really trip you up over 200-500gb. Finding out
| you did one step out of order (manually, or badly written script)
| halfway through the restore process can mean hours and hours of
| rework. Heaven help you if you didn't start a screen session on
| your EC2 instance when you logged in.
| forinti wrote:
| If you can have a secondary database (at another site or on the
| cloud) being updated with streaming replication, you can switch
| over very quickly and with little fuss.
| SoftTalker wrote:
| Which is what you must do if minimizing downtime is critical.
|
| And, of course, your disaster recovery plan is incomplete
| until you've tested it (at scale). You don't want to be
| looking up Postgres documentation when you need to restore
| from a cold backup, you want to be following the checklist
| you have in your recovery plan and already verified.
| nijave wrote:
| Ideally off-site replica you fail over too and don't need to
| restore.
|
| pg_restore will handle roles, indexes, etc assuming you didn't
| switch the flags around to disable them
|
| If you're on EC2, hopefully you're using disk snapshots and WAL
| archiving.
| pgwhalen wrote:
| Of course that's preferable, but OP is specifically asking
| about the cold restore case, which tends to pose different
| problems, and is just as important to maintain and test.
| Arbortheus wrote:
| Offsite replica is only applicable if the cause is a failure
| of the primary. What if I'm restoring a backup because
| someone accidentally dropped the wrong table?
| WJW wrote:
| > in the 100-2TB range where probably most businesses lie
|
| Assuming you mean that range to start at 100GB, I've worked
| with databases that size multiple times but as a freelancer
| it's definitely not been "most" businesses in that range.
| moribunda wrote:
| While these optimizations are solid improvements, I was hoping to
| see more advanced techniques beyond the standard bulk insert and
| deferred constraint patterns. These are well-established
| PostgreSQL best practices - would love to see how pgstream
| handles more complex scenarios like parallel workers with
| partition-aware loading, or custom compression strategies for
| specific data types.
| bitbasher wrote:
| pg_bulkload[1] has saved me so much time cold restoring large (1+
| TB) databases. It went from 24-72 hours to an hour or two.
|
| I also recommend pg_repack[2] to squash tables on a live system
| and reclaim disk space. It has saved me so much space.
|
| 1: https://ossc-db.github.io/pg_bulkload/pg_bulkload.html
|
| 2: https://github.com/reorg/pg_repack
| itsthecourier wrote:
| I'm just checking it now
|
| do you export the data with this and then import it in the
| other db with it?
|
| or do you work with existing postgres backups?
| jpalawaga wrote:
| Postgres backups are tricky for sure. Even if you have a DR plan
| you should assume your incremental backups are no good and you
| need to restore the whole thing from scratch. That's your real DR
| SLA.
|
| If things go truly south, just hope you have a read replica you
| can use as your new master. Most SLAs are not written with 72h+
| of downtime. Have you tried the nuclear recovery plan, from
| scratch? Does it work?
___________________________________________________________________
(page generated 2025-07-05 23:00 UTC)