[HN Gopher] The part of Postgres we hate the most: Multi-version...
       ___________________________________________________________________
        
       The part of Postgres we hate the most: Multi-version concurrency
       control
        
       Author : andrenotgiant
       Score  : 103 points
       Date   : 2023-04-26 17:10 UTC (5 hours ago)
        
 (HTM) web link (ottertune.com)
 (TXT) w3m dump (ottertune.com)
        
       | audioheavy wrote:
       | My main takeaway from this article: as popular as Postgres and
       | MySQL are, and understanding the legacy systems built for them,
       | it will always require deep expertise and "black magic" to
       | achieve enough performance and scale for hyper scale use cases.
       | It justifies the (current) trend to have DB's built for
       | distributed tx/writes/reads that you don't have to become a
       | surgeon to scale. There are other DBs and DBaaS that, although
       | not OSS, have solved this problem in a more cost-efficient way
       | than having a team of surgeons.
        
         | zie wrote:
         | I would argue, you handle the hyper-scale use case when you are
         | actually in hyper-scale. Trying to pre-maturely optimize this
         | is almost always a waste of time and chances are you will screw
         | it up anyway. Almost nobody gets to that scale anyway. If you
         | do get to that scale, you have the money and resources to fix
         | the problem(s) at that time.
        
       | zinclozenge wrote:
       | I guess the question is, which MVCC strategy would be the "right"
       | one to pick for a modern relational database? The paper linked
       | focuses on main memory databases, and being main memory allows
       | you to do things you can't do when disk based.
        
       | garyclarke27 wrote:
       | Clever Clickbait - Of course at the end of the article they offer
       | a solution - their product (and of course it's AI enhanced) to
       | the problem they have overhyped.
        
         | Scarbutt wrote:
         | and rhetoric:
         | 
         |  _So how does one work around PostgreSQL's quirks? Well, you
         | can spend an enormous amount of time and effort tuning it
         | yourself. Good luck with that._
        
         | apavlo wrote:
         | > Of course at the end of the article they offer a solution -
         | their product (and of course it's AI enhanced)
         | 
         | We have been working on automatic database optimization using
         | AI/ML for a decade at Carnegie Mellon University [1][2]. This
         | is not a gimmick. Furthermore, as you can see from the many
         | comments here, the problem is not overhyped.
         | 
         | [1] https://db.cs.cmu.edu/projects/ottertune/
         | 
         | [2] https://db.cs.cmu.edu/projects/noisepage/
        
         | dang wrote:
         | I asked them to take that bit out at the end and it looks like
         | they did.
         | 
         | General remark for startups wanting attention on HN: it's not
         | good to end an interesting article with a call-to-action that
         | makes your article feel like an ad. Readers who read to the end
         | experience that as a bait-and-switch and end up feeling
         | betrayed.
         | 
         | What works much better is to disclose right up front what your
         | startup is and how it's related to the article content. Once
         | you've gotten that out of the way, the reader can then dive
         | into the (hopefully) interesting content and end the article on
         | a satisfying note.
         | 
         | Btw, I have a set of notes on how to write for HN that I'm
         | working (slowly) on turning into an essay. If anyone wants a
         | copy, email me at hn@ycombinator.com and I'll be happy to send
         | it. It includes the above point and a bunch more.
        
           | eduction wrote:
           | I think it's disturbing you asked someone to change their
           | content and even more disturbing that they complied. You are
           | experienced at moderating Havker News have no business being
           | a global censor for content out in the world. This sucks.
        
         | riku_iki wrote:
         | > and of course it's AI enhanced
         | 
         | did they mention LLM/ChatGPT?..
        
           | luhn wrote:
           | In a previous version of the article they concluded by
           | pitching their "AI-powered cloud database tuning" product.
        
           | BohanOT wrote:
           | I don't think so
        
         | joevandyk wrote:
         | I've personally ran into the problems mentioned in the article
         | many times, unsure it's "overhyped".
        
           | spprashant wrote:
           | It's a problem, but not an AI problem. It has a clear cause
           | and obvious mitigation strategies.
        
       | kerblang wrote:
       | Question: Why would I need more than one extra version of the
       | same row? I would think that with transactional locking everybody
       | else is waiting on the first update to commit before getting
       | their own changes in, unless the db is somehow trying to lock
       | columns-per-row instead of entire rows.
        
       | dale_glass wrote:
       | That's interesting, MVCC was the thing that drew me to Postgres
       | to begin with!
       | 
       | Way back I was working on an in-house inventory app written in
       | Visual Basic against SQL Server 2000, I think. That one just put
       | locks on tables. It had the "charming" characteristic of that if
       | you weren't very, very careful with Enterprise Manager, loading a
       | table in the GUI put a lock on it and just keep on holding it
       | until that window was closed.
       | 
       | Then the running app would eventually snag on that lock, maybe
       | keep holding some other lock that something else would snag on,
       | and 5 minutes later I'd hear one of the operators screaming
       | "Nothing is working! I can't take any orders!" from the room next
       | to me.
        
         | miohtama wrote:
         | MVCC and optimistic concurrency control are very pleasant to
         | work with for anyone who spent a decade with manually locking
         | SQL databases. It takes the human error and developer mistakes
         | away from the process, or protect against it. You can still
         | slow down your queries with deadlocks, but at least you cannot
         | corrupt your data by accident.
         | 
         | Though, any other optimistic concurrency control scheme can be
         | better, but PostgreSQL was at the right place at the right time
         | when people started to leave from MySQL.
        
       | no_wizard wrote:
       | Can the MVCC implementation be swapped via Postgres extensions?
        
         | apavlo wrote:
         | No. It would be a major surgery on the internals. See the
         | article for my comment at the attempt to do this with the Zheap
         | project:
         | 
         | https://wiki.postgresql.org/wiki/Zheap
        
         | rch wrote:
         | Depending on the use case, I'd consider a foreign data wrapper.
        
         | BohanOT wrote:
         | I think there is a new project from the Postgres community.
         | They try to replace the storage engine to solve the
         | inefficiency caused by MVCC
         | 
         | https://github.com/orioledb/orioledb
        
       | mmaunder wrote:
       | I must admit as a web practitioner since 1994 I have a bit of an
       | issue with this:
       | 
       | > In the 2000s, the conventional wisdom selected MySQL because
       | rising tech stars like Google and Facebook were using it. Then in
       | the 2010s, it was MongoDB because non-durable writes made it
       | "webscale". In the last five years, PostgreSQL has become the
       | Internet's darling DBMS. And for good reasons!
       | 
       | Different DB's, different strengths and it's not a zero sum came
       | as implied. MySQL was popular before Google was born - we used it
       | heavily at eToys in the 90s for massive transaction volume and
       | replacing it with Oracle was one of the reasons for the
       | catastrophic failure of eToys circa 2001. MongoDB gained traction
       | not because it's an alternative to MySQL or PostgreSQL. And
       | PostgreSQL's marketshare today is on a par with Mongo and both
       | are dwarfed by MySQL which IMO is the true darling of web DB's
       | given it's global popularity.
        
         | LambdaComplex wrote:
         | > in the 2010s, it was MongoDB because non-durable writes made
         | it "webscale"
         | 
         | I think this is the best video on that topic:
         | https://www.youtube.com/watch?v=b2F-DItXtZs
        
         | p_l wrote:
         | A non-trivial component to MySQL popularity was that easy
         | installation (not necessarily administration) and comparatively
         | low resource usage with good performance _at default settings_
         | (even today one needs to run some basic calculations for
         | postgres in production, IMO) meant that cheapest possible
         | dynamic hosting using Linux, Apache, PHP3, and MySQL 3, was
         | what simply was the only available option for many. This
         | codified LAMP stack, people learned from tutorials
         | /courses/word of mouth how to write web apps with PHP and
         | MySQL, used cheap LAMP hosting, optionally installed LAMP
         | servers themselves, etc.
         | 
         | This also led to popularity of bigger reselling setups (I don't
         | miss installing cpanel...) and services like Dreamhost.
         | 
         | MySQL in this way gained a virtuous cycle completely unrelated
         | to Google. Hell, most people I know, who dealt with LAMP space
         | for years, never knew Google had anything to do with MySQL
         | (most people that knew about it were... Lispers. Because of who
         | built the first version of Google Ads)
         | 
         | Even Mac OS X Server shipped with MySQL and PHP because of
         | that, in 2001.
        
       | fdr wrote:
       | One of the weird things about Postgres MVCC is that it is
       | "optimized for rollback," as one person memorably quipped to me.
       | This is not to imply a design principle, it's more a description
       | of how things ended up, and the general argument behind this quip
       | is Postgres lacks "UNDO" segments.
       | 
       | On the one hand, this does make the model Postgres uses admirably
       | simple: the WAL is all "REDO," and the heap is all you need to
       | accomplish any kind of read, but at the expense that stuff that
       | normally would be copied off to a sequential UNDO log and then
       | vaporized when the transaction commits and all possible readers
       | have exited remains comingled with everything else in the main
       | database heap, needing to be fished out again by VACUUM for
       | purging and figuring out how to reclaim numerical space for more
       | transactions.
       | 
       | There may be other solutions to this, but it's one unusual
       | quality Postgres has relative to other MVCC databases, many of
       | which sport an UNDO log.
       | 
       | There are downsides to UNDO, however: if a read needs an old copy
       | of the tuple, it needs to fish around in UNDO, all the indices
       | and synchronization need to account for this, and if there's a
       | rollback or crash recovery event (i.e. mass-rollback of all
       | transactions open at the time), everything has to be shuffled
       | back into the main database storage. Hence the memorable initial
       | comment: "Postgres is optimized for rollback."
        
       | Ym9oYW4 wrote:
       | So why Postgres chooses the worst MVCC design compared to MySQL
       | and Oracle? Is this because of legacy reasons or other factors?
        
         | apavlo wrote:
         | Legacy reasons. The idea was that you wouldn't need a WAL
         | because the table itself is the log. And then you could support
         | time-travel queries if you never cleaned up the expired tuples.
        
           | drkp wrote:
           | _And_ it wasn't even originally designed to be used for
           | concurrency control at all...
        
           | riku_iki wrote:
           | Is MVCC actually superior by some other considerations? Less
           | lock contentions, transactional DML.
        
             | masklinn wrote:
             | The problem is not mvcc but postgres' implementation
             | details of it.
        
               | Max-Ganz-II wrote:
               | In what way? I didn't see anything obviously improper
               | when I learned how serialization isolation worked.
        
       | spprashant wrote:
       | This post has a valid point. But the last line makes it clear why
       | they care so much about it.
       | 
       | Yeah, table bloat and transaction ID wraparounds are terrible,
       | but easily avoidable if you follow a few simple guidelines.
       | Typically in my experience, best way to avoid these issues are to
       | set sensible vacuum settings and track long running queries.
       | 
       | I do hate the some of the defaults in the Postgres configuration
       | are too conservative for most workloads.
        
         | nextaccountic wrote:
         | What last line? The literal last line is "We'll cover more
         | about what we can do in our next article."
         | 
         | Do you mean this one?
         | 
         | > At OtterTune, we see this problem often in our customers'
         | databases. One PostgreSQL RDS instance had a long-running query
         | caused by stale statistics after bulk insertions. This query
         | blocked the autovacuum from updating the statistics, resulting
         | in more long-running queries. OtterTune's automated health
         | checks identified the problem, but the administrator still had
         | to kill the query manually and run ANALYZE after bulk
         | insertions. The good news is that the long query's execution
         | time went from 52 minutes to just 34 seconds.
        
         | spudlyo wrote:
         | > "But making sure that PostgreSQL's autovacuum is running as
         | best as possible is difficult due to its complexity."
         | 
         | The problem, as the article states it, is that a "sensible"
         | vacuum setting for one table is a terrible setting for another
         | depending on how large these tables are. On a 100 million tuple
         | table you'd be waiting 'til there there were 20 million garbage
         | tuples before taking action.
        
         | riku_iki wrote:
         | > I do hate the some of the defaults in the Postgres
         | configuration are too conservative for most workloads.
         | 
         | it is also black magic to tune them.
        
         | LunaSea wrote:
         | Paying an overhead cost of 53 bytes per row is also too
         | expensive for MVCC in my opinion.
        
       | elijaht wrote:
       | > Another problem with the autovacuum in PostgreSQL is that it
       | may get blocked by long-running transactions, which can result in
       | the accumulation of more dead tuples and stale statistics.
       | Failing to clean expired versions in a timely manner leads to
       | numerous performance problems, causing more long-running
       | transactions that block the autovacuum process. It becomes a
       | vicious cycle, requiring humans to intervene manually by killing
       | long-running transactions.
       | 
       | Oh man, a previous company I worked at had an issue with a hot
       | table (frequent reads + writes) interfering with autovacuum. Many
       | fires over a six month period arose from all of that. I was
       | (luckily) only on an adjacent team, so I don't know the details,
       | other than vacuums taking over 24 hours! I'm sure it could have
       | been prevented, but it seemed horrible to debug
        
         | fdr wrote:
         | yeah, this is called "cancellation." Autovacuum is very polite
         | and tries to let go of a lock when there's a conflict. So it
         | lets go, over and over, until it triggers a heuristic deciding
         | "no, not succeeding in this session could be dangerous!" and
         | then people begin to notice it.
         | 
         | Last I checked (....a few years ago, so things may have
         | changed,) the theory of autovacuum heuristics may not have
         | changed much since the turn of the millennium, they're probably
         | about due.
        
       | vp8989 wrote:
       | Am I correct in thinking that PG's MVCC implementation results in
       | a worse story around offloading some mild OLAP workloads to a
       | replica without affecting the primary? Anecdotally, it _seems_
       | that MySQL handles this better but I don 't understand the
       | internals of both enough to explain why that is.
       | 
       | https://aws.amazon.com/blogs/database/manage-long-running-re...
        
       | cyberax wrote:
       | Yup. A lot of heavy users of Postgres eventually hit the same
       | barrier. Here's another take from Uber:
       | https://www.uber.com/blog/postgres-to-mysql-migration/
       | 
       | I had a similar personal experience. In my previous job we used
       | Postgres to implement a task queuing system, and it created a
       | major bottleneck, resulting in tons of concurrency failures and
       | bloat.
       | 
       | And most dangerously, the system failed catastrophically under
       | load. As the load increased, most transactions ended up in
       | concurrent failures, so very little actual work got committed.
       | This increased the amount of outstanding tasks, resulting in even
       | higher rate of concurrent failures.
       | 
       | And this can happen suddenly, one moment the system behaves well,
       | with tasks being processed at a good rate, and the next moment
       | the queue blows up and nothing works.
       | 
       | I re-implemented this system using pessimistic locking, and it
       | turned out to work much better. Even under very high load, the
       | system could still make forward progress.
       | 
       | The downside was having to make sure that no deadlocks can
       | happen.
        
       | Max-Ganz-II wrote:
       | MVCC for Amazon Redshift;
       | 
       | (pdf)
       | https://www.redshiftresearchproject.org/white_papers/downloa...
       | 
       | (html)
       | https://www.redshiftresearchproject.org/white_papers/downloa...
       | 
       | I've been told, very kindly, by a couple of people that it's the
       | best explanation they've ever seen. I'd like to get more eyes on
       | it, to pick up any mistakes, and it might be useful in and of
       | itself anyway to reader, as MVCC on Redshift is I believe the
       | same as MVCC was on Postgres before snapshot isolation.
        
         | anecdotal1 wrote:
         | Not bad but I like this one too
         | 
         | http://www.interdb.jp/pg/pgsql05.html
        
         | gregw2 wrote:
         | This paper, at least by my skimming, seems to describe
         | Redshift's historic SERIALIZABLE ISOLATION level, but does not
         | mention Redshift's newer SNAPSHOT ISOLATION capability.
         | 
         | https://aws.amazon.com/about-aws/whats-new/2022/05/amazon-re...
         | 
         | For concurrency scalability, AWS now configures SNAPSHOT
         | ISOLATION by default if you use Redshift Serverless but non-
         | serverless still defaults to SERIALIZABLE ISOLATION.
        
           | Max-Ganz-II wrote:
           | Yes. I intended to write exactly this at the end of my post,
           | but I managed to word it completely wrongly. The document
           | describes MVCC as it has been in Redshift until about a year
           | ago, when snapshot isolation was introduced.
        
       | h1fra wrote:
       | Once you figured out all the point in this article, it's a matter
       | of fine tuning, can take some times but eventually it will works.
       | The only thing I still struggle with is the Table Bloat.
       | 
       | On managed Postgres (i.e: gcp, aws) you pay for the disk, but
       | when you can't run a VACUUM FULL because it locks the table, you
       | end up with a lot of allocated storage for nothing and you can't
       | shrink the disk size (at least on gcp). Storage is cheap but
       | still feels like a waste.
        
         | fovc wrote:
         | https://reorg.github.io/pg_repack/
         | 
         | Dead easy to run and no long-held locks
        
       ___________________________________________________________________
       (page generated 2023-04-26 23:00 UTC)