[HN Gopher] The part of Postgres we hate the most: Multi-version...
___________________________________________________________________
The part of Postgres we hate the most: Multi-version concurrency
control
Author : andrenotgiant
Score : 103 points
Date : 2023-04-26 17:10 UTC (5 hours ago)
(HTM) web link (ottertune.com)
(TXT) w3m dump (ottertune.com)
| audioheavy wrote:
| My main takeaway from this article: as popular as Postgres and
| MySQL are, and understanding the legacy systems built for them,
| it will always require deep expertise and "black magic" to
| achieve enough performance and scale for hyper scale use cases.
| It justifies the (current) trend to have DB's built for
| distributed tx/writes/reads that you don't have to become a
| surgeon to scale. There are other DBs and DBaaS that, although
| not OSS, have solved this problem in a more cost-efficient way
| than having a team of surgeons.
| zie wrote:
| I would argue, you handle the hyper-scale use case when you are
| actually in hyper-scale. Trying to pre-maturely optimize this
| is almost always a waste of time and chances are you will screw
| it up anyway. Almost nobody gets to that scale anyway. If you
| do get to that scale, you have the money and resources to fix
| the problem(s) at that time.
| zinclozenge wrote:
| I guess the question is, which MVCC strategy would be the "right"
| one to pick for a modern relational database? The paper linked
| focuses on main memory databases, and being main memory allows
| you to do things you can't do when disk based.
| garyclarke27 wrote:
| Clever Clickbait - Of course at the end of the article they offer
| a solution - their product (and of course it's AI enhanced) to
| the problem they have overhyped.
| Scarbutt wrote:
| and rhetoric:
|
| _So how does one work around PostgreSQL's quirks? Well, you
| can spend an enormous amount of time and effort tuning it
| yourself. Good luck with that._
| apavlo wrote:
| > Of course at the end of the article they offer a solution -
| their product (and of course it's AI enhanced)
|
| We have been working on automatic database optimization using
| AI/ML for a decade at Carnegie Mellon University [1][2]. This
| is not a gimmick. Furthermore, as you can see from the many
| comments here, the problem is not overhyped.
|
| [1] https://db.cs.cmu.edu/projects/ottertune/
|
| [2] https://db.cs.cmu.edu/projects/noisepage/
| dang wrote:
| I asked them to take that bit out at the end and it looks like
| they did.
|
| General remark for startups wanting attention on HN: it's not
| good to end an interesting article with a call-to-action that
| makes your article feel like an ad. Readers who read to the end
| experience that as a bait-and-switch and end up feeling
| betrayed.
|
| What works much better is to disclose right up front what your
| startup is and how it's related to the article content. Once
| you've gotten that out of the way, the reader can then dive
| into the (hopefully) interesting content and end the article on
| a satisfying note.
|
| Btw, I have a set of notes on how to write for HN that I'm
| working (slowly) on turning into an essay. If anyone wants a
| copy, email me at hn@ycombinator.com and I'll be happy to send
| it. It includes the above point and a bunch more.
| eduction wrote:
| I think it's disturbing you asked someone to change their
| content and even more disturbing that they complied. You are
| experienced at moderating Havker News have no business being
| a global censor for content out in the world. This sucks.
| riku_iki wrote:
| > and of course it's AI enhanced
|
| did they mention LLM/ChatGPT?..
| luhn wrote:
| In a previous version of the article they concluded by
| pitching their "AI-powered cloud database tuning" product.
| BohanOT wrote:
| I don't think so
| joevandyk wrote:
| I've personally ran into the problems mentioned in the article
| many times, unsure it's "overhyped".
| spprashant wrote:
| It's a problem, but not an AI problem. It has a clear cause
| and obvious mitigation strategies.
| kerblang wrote:
| Question: Why would I need more than one extra version of the
| same row? I would think that with transactional locking everybody
| else is waiting on the first update to commit before getting
| their own changes in, unless the db is somehow trying to lock
| columns-per-row instead of entire rows.
| dale_glass wrote:
| That's interesting, MVCC was the thing that drew me to Postgres
| to begin with!
|
| Way back I was working on an in-house inventory app written in
| Visual Basic against SQL Server 2000, I think. That one just put
| locks on tables. It had the "charming" characteristic of that if
| you weren't very, very careful with Enterprise Manager, loading a
| table in the GUI put a lock on it and just keep on holding it
| until that window was closed.
|
| Then the running app would eventually snag on that lock, maybe
| keep holding some other lock that something else would snag on,
| and 5 minutes later I'd hear one of the operators screaming
| "Nothing is working! I can't take any orders!" from the room next
| to me.
| miohtama wrote:
| MVCC and optimistic concurrency control are very pleasant to
| work with for anyone who spent a decade with manually locking
| SQL databases. It takes the human error and developer mistakes
| away from the process, or protect against it. You can still
| slow down your queries with deadlocks, but at least you cannot
| corrupt your data by accident.
|
| Though, any other optimistic concurrency control scheme can be
| better, but PostgreSQL was at the right place at the right time
| when people started to leave from MySQL.
| no_wizard wrote:
| Can the MVCC implementation be swapped via Postgres extensions?
| apavlo wrote:
| No. It would be a major surgery on the internals. See the
| article for my comment at the attempt to do this with the Zheap
| project:
|
| https://wiki.postgresql.org/wiki/Zheap
| rch wrote:
| Depending on the use case, I'd consider a foreign data wrapper.
| BohanOT wrote:
| I think there is a new project from the Postgres community.
| They try to replace the storage engine to solve the
| inefficiency caused by MVCC
|
| https://github.com/orioledb/orioledb
| mmaunder wrote:
| I must admit as a web practitioner since 1994 I have a bit of an
| issue with this:
|
| > In the 2000s, the conventional wisdom selected MySQL because
| rising tech stars like Google and Facebook were using it. Then in
| the 2010s, it was MongoDB because non-durable writes made it
| "webscale". In the last five years, PostgreSQL has become the
| Internet's darling DBMS. And for good reasons!
|
| Different DB's, different strengths and it's not a zero sum came
| as implied. MySQL was popular before Google was born - we used it
| heavily at eToys in the 90s for massive transaction volume and
| replacing it with Oracle was one of the reasons for the
| catastrophic failure of eToys circa 2001. MongoDB gained traction
| not because it's an alternative to MySQL or PostgreSQL. And
| PostgreSQL's marketshare today is on a par with Mongo and both
| are dwarfed by MySQL which IMO is the true darling of web DB's
| given it's global popularity.
| LambdaComplex wrote:
| > in the 2010s, it was MongoDB because non-durable writes made
| it "webscale"
|
| I think this is the best video on that topic:
| https://www.youtube.com/watch?v=b2F-DItXtZs
| p_l wrote:
| A non-trivial component to MySQL popularity was that easy
| installation (not necessarily administration) and comparatively
| low resource usage with good performance _at default settings_
| (even today one needs to run some basic calculations for
| postgres in production, IMO) meant that cheapest possible
| dynamic hosting using Linux, Apache, PHP3, and MySQL 3, was
| what simply was the only available option for many. This
| codified LAMP stack, people learned from tutorials
| /courses/word of mouth how to write web apps with PHP and
| MySQL, used cheap LAMP hosting, optionally installed LAMP
| servers themselves, etc.
|
| This also led to popularity of bigger reselling setups (I don't
| miss installing cpanel...) and services like Dreamhost.
|
| MySQL in this way gained a virtuous cycle completely unrelated
| to Google. Hell, most people I know, who dealt with LAMP space
| for years, never knew Google had anything to do with MySQL
| (most people that knew about it were... Lispers. Because of who
| built the first version of Google Ads)
|
| Even Mac OS X Server shipped with MySQL and PHP because of
| that, in 2001.
| fdr wrote:
| One of the weird things about Postgres MVCC is that it is
| "optimized for rollback," as one person memorably quipped to me.
| This is not to imply a design principle, it's more a description
| of how things ended up, and the general argument behind this quip
| is Postgres lacks "UNDO" segments.
|
| On the one hand, this does make the model Postgres uses admirably
| simple: the WAL is all "REDO," and the heap is all you need to
| accomplish any kind of read, but at the expense that stuff that
| normally would be copied off to a sequential UNDO log and then
| vaporized when the transaction commits and all possible readers
| have exited remains comingled with everything else in the main
| database heap, needing to be fished out again by VACUUM for
| purging and figuring out how to reclaim numerical space for more
| transactions.
|
| There may be other solutions to this, but it's one unusual
| quality Postgres has relative to other MVCC databases, many of
| which sport an UNDO log.
|
| There are downsides to UNDO, however: if a read needs an old copy
| of the tuple, it needs to fish around in UNDO, all the indices
| and synchronization need to account for this, and if there's a
| rollback or crash recovery event (i.e. mass-rollback of all
| transactions open at the time), everything has to be shuffled
| back into the main database storage. Hence the memorable initial
| comment: "Postgres is optimized for rollback."
| Ym9oYW4 wrote:
| So why Postgres chooses the worst MVCC design compared to MySQL
| and Oracle? Is this because of legacy reasons or other factors?
| apavlo wrote:
| Legacy reasons. The idea was that you wouldn't need a WAL
| because the table itself is the log. And then you could support
| time-travel queries if you never cleaned up the expired tuples.
| drkp wrote:
| _And_ it wasn't even originally designed to be used for
| concurrency control at all...
| riku_iki wrote:
| Is MVCC actually superior by some other considerations? Less
| lock contentions, transactional DML.
| masklinn wrote:
| The problem is not mvcc but postgres' implementation
| details of it.
| Max-Ganz-II wrote:
| In what way? I didn't see anything obviously improper
| when I learned how serialization isolation worked.
| spprashant wrote:
| This post has a valid point. But the last line makes it clear why
| they care so much about it.
|
| Yeah, table bloat and transaction ID wraparounds are terrible,
| but easily avoidable if you follow a few simple guidelines.
| Typically in my experience, best way to avoid these issues are to
| set sensible vacuum settings and track long running queries.
|
| I do hate the some of the defaults in the Postgres configuration
| are too conservative for most workloads.
| nextaccountic wrote:
| What last line? The literal last line is "We'll cover more
| about what we can do in our next article."
|
| Do you mean this one?
|
| > At OtterTune, we see this problem often in our customers'
| databases. One PostgreSQL RDS instance had a long-running query
| caused by stale statistics after bulk insertions. This query
| blocked the autovacuum from updating the statistics, resulting
| in more long-running queries. OtterTune's automated health
| checks identified the problem, but the administrator still had
| to kill the query manually and run ANALYZE after bulk
| insertions. The good news is that the long query's execution
| time went from 52 minutes to just 34 seconds.
| spudlyo wrote:
| > "But making sure that PostgreSQL's autovacuum is running as
| best as possible is difficult due to its complexity."
|
| The problem, as the article states it, is that a "sensible"
| vacuum setting for one table is a terrible setting for another
| depending on how large these tables are. On a 100 million tuple
| table you'd be waiting 'til there there were 20 million garbage
| tuples before taking action.
| riku_iki wrote:
| > I do hate the some of the defaults in the Postgres
| configuration are too conservative for most workloads.
|
| it is also black magic to tune them.
| LunaSea wrote:
| Paying an overhead cost of 53 bytes per row is also too
| expensive for MVCC in my opinion.
| elijaht wrote:
| > Another problem with the autovacuum in PostgreSQL is that it
| may get blocked by long-running transactions, which can result in
| the accumulation of more dead tuples and stale statistics.
| Failing to clean expired versions in a timely manner leads to
| numerous performance problems, causing more long-running
| transactions that block the autovacuum process. It becomes a
| vicious cycle, requiring humans to intervene manually by killing
| long-running transactions.
|
| Oh man, a previous company I worked at had an issue with a hot
| table (frequent reads + writes) interfering with autovacuum. Many
| fires over a six month period arose from all of that. I was
| (luckily) only on an adjacent team, so I don't know the details,
| other than vacuums taking over 24 hours! I'm sure it could have
| been prevented, but it seemed horrible to debug
| fdr wrote:
| yeah, this is called "cancellation." Autovacuum is very polite
| and tries to let go of a lock when there's a conflict. So it
| lets go, over and over, until it triggers a heuristic deciding
| "no, not succeeding in this session could be dangerous!" and
| then people begin to notice it.
|
| Last I checked (....a few years ago, so things may have
| changed,) the theory of autovacuum heuristics may not have
| changed much since the turn of the millennium, they're probably
| about due.
| vp8989 wrote:
| Am I correct in thinking that PG's MVCC implementation results in
| a worse story around offloading some mild OLAP workloads to a
| replica without affecting the primary? Anecdotally, it _seems_
| that MySQL handles this better but I don 't understand the
| internals of both enough to explain why that is.
|
| https://aws.amazon.com/blogs/database/manage-long-running-re...
| cyberax wrote:
| Yup. A lot of heavy users of Postgres eventually hit the same
| barrier. Here's another take from Uber:
| https://www.uber.com/blog/postgres-to-mysql-migration/
|
| I had a similar personal experience. In my previous job we used
| Postgres to implement a task queuing system, and it created a
| major bottleneck, resulting in tons of concurrency failures and
| bloat.
|
| And most dangerously, the system failed catastrophically under
| load. As the load increased, most transactions ended up in
| concurrent failures, so very little actual work got committed.
| This increased the amount of outstanding tasks, resulting in even
| higher rate of concurrent failures.
|
| And this can happen suddenly, one moment the system behaves well,
| with tasks being processed at a good rate, and the next moment
| the queue blows up and nothing works.
|
| I re-implemented this system using pessimistic locking, and it
| turned out to work much better. Even under very high load, the
| system could still make forward progress.
|
| The downside was having to make sure that no deadlocks can
| happen.
| Max-Ganz-II wrote:
| MVCC for Amazon Redshift;
|
| (pdf)
| https://www.redshiftresearchproject.org/white_papers/downloa...
|
| (html)
| https://www.redshiftresearchproject.org/white_papers/downloa...
|
| I've been told, very kindly, by a couple of people that it's the
| best explanation they've ever seen. I'd like to get more eyes on
| it, to pick up any mistakes, and it might be useful in and of
| itself anyway to reader, as MVCC on Redshift is I believe the
| same as MVCC was on Postgres before snapshot isolation.
| anecdotal1 wrote:
| Not bad but I like this one too
|
| http://www.interdb.jp/pg/pgsql05.html
| gregw2 wrote:
| This paper, at least by my skimming, seems to describe
| Redshift's historic SERIALIZABLE ISOLATION level, but does not
| mention Redshift's newer SNAPSHOT ISOLATION capability.
|
| https://aws.amazon.com/about-aws/whats-new/2022/05/amazon-re...
|
| For concurrency scalability, AWS now configures SNAPSHOT
| ISOLATION by default if you use Redshift Serverless but non-
| serverless still defaults to SERIALIZABLE ISOLATION.
| Max-Ganz-II wrote:
| Yes. I intended to write exactly this at the end of my post,
| but I managed to word it completely wrongly. The document
| describes MVCC as it has been in Redshift until about a year
| ago, when snapshot isolation was introduced.
| h1fra wrote:
| Once you figured out all the point in this article, it's a matter
| of fine tuning, can take some times but eventually it will works.
| The only thing I still struggle with is the Table Bloat.
|
| On managed Postgres (i.e: gcp, aws) you pay for the disk, but
| when you can't run a VACUUM FULL because it locks the table, you
| end up with a lot of allocated storage for nothing and you can't
| shrink the disk size (at least on gcp). Storage is cheap but
| still feels like a waste.
| fovc wrote:
| https://reorg.github.io/pg_repack/
|
| Dead easy to run and no long-held locks
___________________________________________________________________
(page generated 2023-04-26 23:00 UTC)