[HN Gopher] PostgreSQL: No More Vacuum, No More Bloat
___________________________________________________________________
PostgreSQL: No More Vacuum, No More Bloat
Author : pella
Score : 78 points
Date : 2023-07-15 21:03 UTC (1 hour ago)
(HTM) web link (www.orioledata.com)
(TXT) w3m dump (www.orioledata.com)
| ccleve wrote:
| Is there any documentation on the "extensibility patches"? What
| did you have to do to core Postgres to get this new approach to
| work?
| oaiey wrote:
| I read object relational? Can someone enlighten me? Entity
| relational is fine, but what makes it object relational? Has
| someone flipped on the buzzword in the years I didn't pay
| attention?
| zetalyrae wrote:
| I looked this up the other day because I was similarly
| surprised, I think it refers to Postgres' ability to do table
| inheritance: https://www.postgresql.org/docs/current/tutorial-
| inheritance...
| feike wrote:
| PostgreSQL has used this term for decades!
|
| The oldest I can find is from 1998 (PostgreSQL 6.3), but it was
| probably in use even before.
|
| > Postgres offers substantial additional power by incorporating
| the following four additional basic concepts in such a way that
| users can easily extend the system:
|
| classes inheritance types functions
|
| Other features provide additional power and flexibility:
|
| constraints triggers rules transaction integrity
|
| These features put Postgres into the category of databases
| referred to as object-relational
|
| https://www.postgresql.org/docs/6.3/c0101.htm
| brazzy wrote:
| https://stackoverflow.com/questions/45865961/what-does-postg...
| grzm wrote:
| It's been object relational (and described as such) going _way_
| back. I think the most visible (if infrequently used) object-
| oriented feature that it has is inheritance:
| https://www.postgresql.org/docs/current/tutorial-inheritance...
| nieve wrote:
| It's because PostgreSQL has inheritance and has almost
| certainly used the term object relational since before you
| heard of it.
| glogla wrote:
| I wish people would stop with the "Uber migrated from Postgres to
| MySQL" thing. Uber migrated from Postgres used as a relational
| database to something that is basically their own non-relational
| database using MySQL as a distributed key-value store. It is not
| really a situation applicable to most users of Postgres.
|
| Anyway, this design of MVCC, which moves older data into undo
| logs / segments, is used by Oracle DB, so it definitely works.
| The common challenge with it is that reading older versions of
| data is slower, because you have to look them up in a log, and
| sometimes the data is removed from the log before your
| transaction finishes, getting you the dreaded "Snapshot Too Old"
| error.
|
| E: I don't see in the article when rows get evicted from the undo
| logs. If it's when they are no longer needed, I'm not sure where
| the improvement comes from, because it should be a similar amount
| of bookkeeping. If it's a circular buffer that can run out of
| space, like Oracle's, that would mean that under high write load
| long-running transactions start to fail, which is pretty
| unpleasant.
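The circular-buffer failure mode described in the comment above can be sketched as a toy model (purely illustrative: the class, its eviction policy, and the error text are invented here, not OrioleDB's or Oracle's actual mechanism):

```python
from collections import deque

class UndoLog:
    """Toy fixed-capacity undo log: the oldest entries are evicted
    when the buffer is full, like a wrapping undo segment."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # (version_id, value), oldest first

    def record(self, version_id, value):
        if len(self.entries) == self.capacity:
            self.entries.popleft()  # overwrite the oldest undo entry
        self.entries.append((version_id, value))

    def read_as_of(self, snapshot_version):
        # A reader needs the row as it was at its snapshot version.
        for version_id, value in self.entries:
            if version_id == snapshot_version:
                return value
        raise RuntimeError("snapshot too old")

log = UndoLog(capacity=3)
log.record(1, "v1")           # long-running reader snapshots here
for v in range(2, 6):         # heavy write load keeps appending undo
    log.record(v, f"v{v}")

try:
    log.read_as_of(1)         # version 1 was already evicted
except RuntimeError as e:
    print(e)                  # -> snapshot too old
```

The trade-off the comment points at: a bounded buffer caps bloat and bookkeeping, but under sustained writes it silently drops exactly the versions a long-running transaction may still need.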
| paulddraper wrote:
| SQL Server avoids vacuum as well; it might work this way, I can't
| recall.
|
| And of course MySQL avoids vacuum by giving a giant middle finger
| to concurrency considerations.
| glogla wrote:
| I'm pretty sure SQL Server and MySQL use locking instead of
| MultiVersion Concurrency Control so they don't keep more
| copies of data around. No vacuum needed but there's a
| possibility of things blocking.
|
| But I might be out of date.
| paulddraper wrote:
| Yes, MySQL has read locks.
|
| But I don't believe SQL Server does.
| evanelias wrote:
| InnoDB (MySQL's default storage engine) implements MVCC
| using undo logging and background purge threads. It scales
| to highly concurrent OLTP workloads quite well. It doesn't
| work well with OLAP workloads / long-running transactions
| though. The oldest active transaction will block purging of
| anything newer than that transaction's snapshot.
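The purge-blocking behaviour described above can be sketched in a few lines (a hypothetical model of the rule, not InnoDB's real data structures): a dead row version may be purged only once no active snapshot could still need it, so a single old snapshot pins every newer version in place.

```python
def purgeable(versions, active_snapshot_ids):
    """Return ids of dead row versions safe to purge: those
    superseded at or before the oldest active snapshot began."""
    oldest = min(active_snapshot_ids)
    return [v["id"] for v in versions
            if v["superseded_at"] is not None
            and v["superseded_at"] <= oldest]

versions = [
    {"id": 1, "superseded_at": 5},     # replaced at txn 5
    {"id": 2, "superseded_at": 12},    # replaced at txn 12
    {"id": 3, "superseded_at": None},  # current version, never purged
]

print(purgeable(versions, [10, 20]))  # [1]: the txn-10 snapshot pins id 2
print(purgeable(versions, [20]))      # [1, 2]: old snapshot gone, purge proceeds
```

This is why a single long-running OLAP query against an OLTP workload makes the history pile up: purge progress is gated on the oldest snapshot, not on how stale each individual version is.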
| ruuda wrote:
| Vacuum does more than removing dead tuples, though; there is
| still a need to update statistics and summarize BRIN indexes.
| javajosh wrote:
| Yes. The (PostgreSQL 15) docs are well written:
| https://www.postgresql.org/docs/15/routine-vacuuming.html
| wild_egg wrote:
| You're thinking of ANALYZE, which is a separate operation that's
| commonly run during vacuuming but can be invoked independently.
| rickette wrote:
| The article contains a link with the rather curious title "10
| things that Richard Branson hates about PostgreSQL".... Turns out
| the guy who wrote that blog is called Rick Branson, not Richard.
| rcme wrote:
| Rick is a nickname for Richard.
| daenney wrote:
| I love the whole "2.3x less CPU overhead per transaction" where
| Postgres scales from 5% to 65% CPU usage and Oriole sits
| constantly at 90%. That doesn't seem like a huge success to me?
| The predictability sure is nice, but moving the lower end up by
| 85 percentage points is something I'd be rather worried about.
| waterproof wrote:
| Eyeballing the tps graph, OrioleDB is doing 5x tps while using
| 2x the CPU. So about 2.5x less CPU per transaction.
|
| Checks out.
| gary_0 wrote:
| 5x tps with 2x CPU is 2/5 = 0.4x the CPU (ie. it's more
| efficient per transaction).
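Spelled out, using the ratios eyeballed in the thread (the 5x and 2x figures are the commenters' estimates, not measured values):

```python
tps_ratio = 5.0   # OrioleDB throughput relative to Postgres (eyeballed)
cpu_ratio = 2.0   # OrioleDB CPU usage relative to Postgres (eyeballed)

# CPU cost per transaction = total CPU used / transactions completed
cpu_per_txn = cpu_ratio / tps_ratio
print(cpu_per_txn)      # 0.4x the CPU per transaction
print(1 / cpu_per_txn)  # i.e. 2.5x less, close to the article's 2.3x claim
```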
| acjohnson55 wrote:
| You generally want to keep your CPU fully utilized. It looks
| like Oriole is doing significantly more transactions and is
| CPU-bound, due to much lower IO requirements. The good news is
| that it implies you could get even more performance out of
| Oriole by vertically scaling to a more powerful CPU, whereas
| Postgres would not continue to increase in performance this
| way.
|
| Those idle times on the Postgres server _could_ be used for
| something else, if you 're thinking in a desktop OS mindset.
| But for servers, you tend to want machines that are doing one
| thing and are optimized for that thing.
| raggi wrote:
| It's hard to generalize on these points. In a situation where
| the throughput was inverted but the proportional system usage
| was the same, you would instead say "you can still vertically
| scale by adding more disks", rather than saying adding bigger
| cpu. It's not meaningful in isolation.
|
| It may be reasonable to suggest that for a new code base that is
| CPU-bound there's a good chance of low-hanging fruit for CPU
| optimizations that may further widen the throughput gap. On the
| other hand, the prior engine's tuning started life on much older
| computer architectures, with drastically different proportional
| syscall costs and so on, which very often means there's
| low-hanging fruit in configuration to improve baseline benchmarks
| such as these. High IO time suggests poor caching, which in many
| scenarios you'd consider a suboptimal deployed configuration.
|
| It's not just the devil that's in the details, it's
| everything.
| acjohnson55 wrote:
| That makes sense. I'm mostly just trying to explain the
| counterintuitive reason that the high CPU usage shouldn't
| be interpreted as a flaw.
| raggi wrote:
| To be a little more clear about what this benchmark actually is:
| it explicitly exercises a pathological use case for PostgreSQL's
| current design, one that nonetheless functions, and it
| demonstrates that the advertised engine does not have that
| pathology. A key takeaway, if you're a Postgres user, should
| probably be: if your workload looks exactly like this (sparse
| upserts into a large data set at a high rate), then you might
| want to evaluate the runway of your architecture before the
| geometric costs or latency stalls become relevant, just as for a
| cost analysis of any other system. What is somewhat interesting
| in this article, and not super clearly presented, is that this
| workload is actually fairly pathological for most existing
| engines offering this set of structural and query facilities,
| and that's interesting if this is the niche you need. Most
| people do some amount of this, but not always at a super high
| rate, and there are ways to get the same effective
| writable/readable data using a different schema while avoiding
| it. The nice thing here is you can do the one-liner version.
| gary_0 wrote:
| > you can still vertically scale by adding more disks
|
| Parallelizing IO is a lot different from scaling up CPU power,
| though. I'd imagine DB server IO performance has a lot less
| low-hanging fruit than CPU/software performance.
| tanelpoder wrote:
| > You generally want to keep your CPU fully utilized.
|
| Not in real life concurrent systems where latency matters. In
| addition to the queuing/random request arrival rate reasons,
| all kinds of funky latency hiccups start happening both at
| the DB and OS level when you run your CPU _average_
| utilization near 100%. Spinlocks, priority inversion, etc.
| Some bugs show up that don't manifest when running with lower
| CPU utilization etc.
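The queuing point above is easy to illustrate with the textbook M/M/1 result, where mean response time is 1/(service rate − arrival rate) and blows up as utilization approaches 1. This is a simplified model under an assumed service rate; real DB/OS behaviour is messier (spinlocks, priority inversion), as the comment notes:

```python
service_rate = 100.0  # requests/sec one CPU can serve (assumed)

for utilization in (0.5, 0.8, 0.9, 0.99):
    arrival_rate = utilization * service_rate
    # M/M/1 mean response time: 1 / (service_rate - arrival_rate)
    response_ms = 1000.0 / (service_rate - arrival_rate)
    print(f"utilization {utilization:.0%}: mean response {response_ms:.0f} ms")
```

Even in this idealized model, going from 90% to 99% average utilization multiplies mean latency by 10, which is why sustained near-100% CPU is avoided on latency-sensitive systems.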
| adsharma wrote:
| It's not clear if the CPU cost per tx is any worse. Was
| OrioleDB doing 5x the transactions at this CPU usage?
| [deleted]
| jklehm wrote:
| My read is that it's at 90% because they are saturating the CPU
| to that point with the TPS threshold they use for comparison;
| the TPS of Oriole is constant and way higher than pg in these
| charts, at least.
|
| I'd think the CPU will drop proportionally with the TPS; they
| just want to show how high it can go here.
| pella wrote:
| With the same equipment, your performance is now five times
| better (5X higher TPS). We need to test again with more
| hardware, but if it can maintain 3 times the performance at the
| lower end, it could be a good alternative for some users.
|
| _" As the cumulative result of the improvements discussed
| above, OrioleDB provides:
|
| - 5X higher TPS,
|
| - 2.3X less CPU load per transaction,
|
| - 22X less IOPS per transaction,
|
| - No table and index bloat."_
| pella wrote:
| simple OrioleDB docker build tutorial :
|
| https://github.com/orioledb/orioledb/blob/main/doc/docker_us...
| mynonameaccount wrote:
| [dead]
___________________________________________________________________
(page generated 2023-07-15 23:00 UTC)