[HN Gopher] Updating a 50 terabyte PostgreSQL database (2018)
___________________________________________________________________
Updating a 50 terabyte PostgreSQL database (2018)
Author : Maksadbek
Score : 99 points
Date : 2022-01-13 17:06 UTC (5 hours ago)
(HTM) web link (www.adyen.com)
(TXT) w3m dump (www.adyen.com)
| jfbaro wrote:
| Thanks for sharing this. Really interesting. But a basic
| question: why not upgrade to PG 13 instead? I am curious about
| the reasons for staying on an older version of PostgreSQL.
| jfbaro wrote:
| I see it now that it's a 2018 article. Makes sense!
| hawk_ wrote:
| Interesting. Does PG13 let them do this without downtime?
| CubsFan1060 wrote:
| Not from 9.4, but now there is a path using logical
| replication between versions:
| https://www.percona.com/blog/2019/04/04/replication-between-...
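| On PG 10+ that path boils down to native logical replication.
| A minimal sketch, assuming made-up publication/subscription
| names and connection string (not from the article):
|
|   -- On the old (publisher) cluster:
|   CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
|
|   -- On the new-version (subscriber) cluster, after copying
|   -- the schema over (e.g. with pg_dump --schema-only):
|   CREATE SUBSCRIPTION upgrade_sub
|       CONNECTION 'host=old-primary dbname=payments user=repl'
|       PUBLICATION upgrade_pub;
|
| Once the subscriber has caught up you stop writes on the old
| cluster, wait for lag to reach zero, and repoint the
| application, so the write-freeze is short compared to a full
| dump/restore.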
| simonw wrote:
| I didn't understand how they avoid downtime during the upgrade -
| they start with "Stop traffic to the database cluster" and then
| perform a bunch of steps that each take 5-10 minutes or longer
| before starting traffic again - so do they just accept the
| downtime and schedule it in advance, or did I miss a detail?
| rattray wrote:
| Yeah, they architect their application to accept DB downtime -
| but I'm sure their services are still degraded to some degree
| or another during this, and they aren't clear how much total DB
| downtime they need for this (and how that time scales across
| various axes).
|
| Overall my takeaway is basically "if you want to upgrade a
| large Postgres db, you'll need like an hour of planned downtime
| and a lot of careful work" which... doesn't make me excited to
| upgrade big postgres db's in production.
| znpy wrote:
| > Overall my takeaway is basically "if you want to upgrade a
| large Postgres db, you'll need like an hour of planned
| downtime and a lot of careful work"
|
| HA is one of those things where MySQL wins hands down, sadly.
| Sadly in the sense that PostgreSQL HA still looks like a
| couple of hacks held together with duct tape, at least when
| compared to MySQL solutions.
|
| The Percona MySQL distribution has multi-master HA (Percona
| Xtradb Cluster) which is great for HA, and vanilla MySQL 8.x
| has group replication which iirc also is multi-master.
| MariaDB instead has multi-master based on Galera Cluster
| (although I'm not very knowledgeable about MariaDB).
|
| In PostgreSQL world there are many HA solutions, none of
| which is both multi-master and open source (sadly).
| rattray wrote:
| I did a little googling, and the answers still seem to be
| "logical replication, which is slow for large DBs" or
| "pg_upgrade, which requires nontrivial downtime".
|
| Gitlab wrote a much more in-depth rundown in 2020 of a
| pg_upgrade, which in their case took 2h of downtime and 4h
| total time:
| https://about.gitlab.com/blog/2020/09/11/gitlab-pg-upgrade/
| (submitted to HN here:
| https://news.ycombinator.com/item?id=29926316). They included
| a video of the whole thing, which is pretty incredible.
| gbrown_ wrote:
| Whilst others have mentioned their application is architected
| to cope with this, I wanted to echo the sentiment. This post
| feels very much like a "draw the rest of the owl" meme.
| vonnieda wrote:
| > One other detail to note is that we built our software
| architecture in such a way that we can stop traffic to our
| PostgreSQL databases, queue the transactions, and run a
| PostgreSQL update without affecting payments acceptance.
| foobiekr wrote:
| That's downtime by any reasonable measure, depending on the
| exact specifics. Is writing to a journal while the DB is down
| "downtime"?
|
| Site-wise, no, but DB-wise, yes.
| throwaway29303 wrote:
| (2018)
| wiz21c wrote:
| was wondering why they were using old PG's :-)
| LoungeFlyZ wrote:
| Irony. I think the HN post took down the Slony website.
| mulmen wrote:
| http://howfuckedismydatabase.com/postgres/slony.php
| leifg wrote:
| One of the things I always wonder with giant relational
| databases is: how much of the "typical relational stuff" are
| they actually using?
|
| Do they have constraints on rows? Are they using views or do they
| just denormalize and duplicate? Do they use joins at all? Are
| they even doing more than 1 thing in a transaction?
| mulmen wrote:
| My only experience with databases of that size is for data
| analysis so yeah, constraints are relaxed. But even at that
| point ideas like normalization are critical to extracting
| performance out of large datasets.
|
| Normalization _is_ a performance optimization. Denormalization
| is a development shortcut. Neither is right or wrong but I
| would be surprised if a 50TB OLTP database wasn't already
| highly normalized.
|
| If it isn't then my next guess is that it could be made smaller
| or more performant if it was.
|
| We used to be proud of server uptime, back when we gave them
| names. Today if you have a server up for 900 days you're going
| to be accused of malpractice.
|
| Similar for data. We used to be proud of being able to keep
| "big" data online, but I'm no longer impressed. You're just
| spending money. Did you actually solve the business case as
| efficiently as possible given available resources? Do you
| _need_ 50TB in the first place?
| merb wrote:
| > Normalization is a performance optimization
|
| That is a stupid statement, because it's way too generic and
| it depends on the use case. Read-heavy data that needs a lot
| of joins is most often denormalized, IF updating a lot is not
| a problem. Sometimes you need to create views that pull in
| different stuff with different queries, which would make them
| not really performant, especially not on Postgres, which is
| just super slow when it comes to listing/filtering data.
| Kranar wrote:
| Normalization is pretty independent of optimization strategy
| and generally that's not presented as one of its advantages.
| Normalizing data can improve write performance but will make
| reads slower, so if you have a write intensive database then
| you will see some performance gains.
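| A toy illustration of that tradeoff (table and column names
| made up, not from the article):
|
|   -- Denormalized: customer fields repeated on every payment.
|   CREATE TABLE payments_denorm (
|       id            bigint PRIMARY KEY,
|       amount_cents  bigint NOT NULL,
|       customer_name text   NOT NULL,
|       customer_iban text   NOT NULL
|   );
|
|   -- Normalized: customer stored once, referenced by key.
|   CREATE TABLE customers (
|       id   bigint PRIMARY KEY,
|       name text NOT NULL,
|       iban text NOT NULL
|   );
|   CREATE TABLE payments (
|       id           bigint PRIMARY KEY,
|       amount_cents bigint NOT NULL,
|       customer_id  bigint NOT NULL REFERENCES customers (id)
|   );
|
| Changing a customer's IBAN is then one UPDATE instead of one
| per payment row, at the cost of a join on the read path.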
| newlisp wrote:
| Normalization improves data integrity.
| paozac wrote:
| 50TB is not so big these days. I read that in 2008 (!) Yahoo had
| a 2+ PB PG database. What is the largest you know of, 14 years
| later?
| Gigachad wrote:
| How are people dealing with databases this large? At work we
| have a MySQL db with a table that has 130M records in it, and a
| count(*) on the table takes 100 seconds. Anything but a simple
| lookup by id is almost unworkable. I assumed this was normal
| because it's too big. But am I missing something here? Are SQL
| databases capable of actually working fast at 50TB?
| Something1234 wrote:
| count(*) is always going to be slow. They don't store the
| number of live tuples, just an estimate, so it's a full table
| scan. The secret is to use indexes to get down to the small bit
| that you care about. If you're filtering on 3 columns, the
| goal is to get the index to wipe out at least half the
| results you don't care about and so on and so forth.
|
| A 130M record table with no indexes is going to be crazy
| slow. Although if all you need are primary key updates, then
| that's the way to go.
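| On the Postgres side, if "roughly how many rows?" is enough,
| the planner's estimate is O(1) instead of a full scan. A small
| sketch (table and index names are made up):
|
|   -- Cheap approximate row count, kept fresh by
|   -- ANALYZE / autovacuum:
|   SELECT reltuples::bigint AS approx_rows
|   FROM pg_class
|   WHERE relname = 'payments';
|
|   -- Composite index so a filtered query touches only the
|   -- rows you care about instead of scanning the table:
|   CREATE INDEX CONCURRENTLY payments_status_created_idx
|       ON payments (status, created_at);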
| ok_dad wrote:
| Around ~2005 I took a tour of the [a well known government
| organization] and they were bragging about several-PB-sized
| databases at the time. Interestingly, there was a TON of server
| racks there in a bomb-proof building with tons of security, and
| they were all IBM servers (a supercomputer maybe?), if I
| remember correctly. Also, there was one small server rack that
| was painted differently from the rest (it looked like something
| made in-house), and we asked what it was, and the tour guide (a
| PhD computer scientist) said that technically it doesn't exist
| and he couldn't talk about it even though it was super cool.
| Now that I know what they were doing around that time (and
| probably still today) I am kinda scared at the implications of
| that tour guide's statement and what that one tiny rack was
| for. I'm glad I never went to work in their organization, since
| that tour was meant to recruit some of us a few years down the
| road.
| ddorian43 wrote:
| Was it a single server?
| GordonS wrote:
| 50TB is big. Bigger is possible I'm sure, but I'd guess
| 99.something% of all PG databases are less than 50TB.
|
| If someone here commented they had a 2PB database, I
| _guarantee_ someone else here would be like "pfft, that's not
| big"...
| aantix wrote:
| With 50TB, and if you were doing a full text search, wouldn't
| the entirety of the index have to be held in memory?
| Groxx wrote:
| it's more than big enough to cause big problems / risk days
| of downtime to change, yea. 50GB is not big. 50TB is at least
| touching big - you can do it on one physical machine if
| needed, but it's the sort of scale that benefits from bigger-
| system architecture. 50PB would be world-class big, hitting
| exciting new problems every time they do something.
| mritun wrote:
| The OP message could have better said that 50TB databases are
| common these days, when a single metal or 24xlarge i3en or i4*
| instance on AWS can hold 60TB raw.
| lelandbatey wrote:
| This article mentions that a key piece of software they use in
| all this is Slony[0], software which allows replicating one
| primary/master to many secondaries/slaves, even when those
| replicas have different versions. That's pretty cool, but I'd
| like to instead draw your attention (dear reader) to the
| _phenomenal_ writing happening in Slony's description of itself.
| It describes, in plain but technical language, what Slony is, then
| immediately jumps into addressing "why should you use this?" and
| does so very directly. It reiterates the problem it solves,
| mentions other potential solutions _by name_ to contextualize the
| current solution space, then explains in a nice bulleted list
| exactly the situations and niches that Slony is tailored for.
|
| Oh my heavens, I wish every piece of technology described itself
| this way!
|
| [0] - Slony -- https://www.slony.info/
| xblau wrote:
| Archived link, since it appears to be down at the moment:
| https://web.archive.org/web/20211031124351/https://www.slony...
| [deleted]
___________________________________________________________________
(page generated 2022-01-13 23:00 UTC)