[HN Gopher] An early look at Postgres 14: Performance and monito...
       ___________________________________________________________________
        
       An early look at Postgres 14: Performance and monitoring
       Improvements
        
       Author : bananaoomarang
       Score  : 274 points
       Date   : 2021-05-22 15:47 UTC (7 hours ago)
        
 (HTM) web link (pganalyze.com)
 (TXT) w3m dump (pganalyze.com)
        
       | lmarcos wrote:
       | All I want is to be able to use Postgres in production without
       | needing pgbouncer.
        
         | matsemann wrote:
         | Never had a use for it, or even heard of it; I guess it depends
         | on usage patterns? I've mostly worked with long-lived Java
         | servers, and there an internal DB pool has been standard since
         | forever, so there's no need for another layer.
        
         | mixmastamyk wrote:
         | Care to elaborate? Having each tool handle its job sounds like
         | a good strategy.
        
       | hmmokidk wrote:
       | Are we still going to need PgBouncer when there are a large
       | number of connections?
        
         | jpgvm wrote:
         | For now, yes. The idle connection changes help, but it's still
         | inefficient. I would like to see connection pooling
         | functionality merged into core PG at some point: eliminate the
         | need for a network hop/IPC and enable better back-pressure, etc.
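         | 
         | For anyone curious how much of their connection count is just
         | sitting idle, a rough sketch against the standard
         | pg_stat_activity view (the filter here is only an example):
         | 
         |     SELECT state, count(*)
         |     FROM pg_stat_activity
         |     WHERE backend_type = 'client backend'
         |     GROUP BY state;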
        
       | edoceo wrote:
       | Wow! Memory stats! Repeat query stats! The perfect database gets
       | more perfecter! I'm looking forward to using PG for another 20
       | years.
        
       | bredren wrote:
       | If you're interested in recent enthusiastic (nearly effusive)
       | discussion of Postgres and more specifically its potential as a
       | basis for a data warehouse, you might enjoy this episode of Data
       | Engineering Podcast with Thomas Richter and Joshua Drake:
       | 
       | Episode website:
       | https://www.dataengineeringpodcast.com/postgresql-data-wareh...
       | 
       | Direct: (apple) https://podcasts.apple.com/us/podcast/data-
       | engineering-podca...
        
       | dragonwriter wrote:
       | Lots of good ops-y stuff, and, with my dev hat on, multirange
       | types are just a whole layer of awesome on top of the awesome
       | that range types already were.
        
       | rargulati wrote:
       | What's going to be the Vitess of Postgres? It seems to be the
       | "last" missing piece? Or is that not a focus or fit for PG?
        
         | ksec wrote:
         | I think Vitess has a long-term goal to also support Postgres.
        
         | jpgvm wrote:
         | Vitess for PostgreSQL will probably just be... Vitess.
         | 
         | The concepts behind Vitess are sufficiently general to simply
         | apply them to PostgreSQL now that PostgreSQL has logical
         | replication. In some ways it can be even better due to things
         | like replication slots being a good fit for these sorts of
         | architectures.
         | 
         | The work to port Vitess to PostgreSQL is quite substantial
         | however. Here is a ticket tracking the required tasks at a high
         | level: https://github.com/vitessio/vitess/issues/7084
        
         | qaq wrote:
         | I think something like YugabyteDB
        
         | yed wrote:
         | That would be Citus: https://www.citusdata.com/
        
           | threeseed wrote:
           | Which is now owned by Microsoft, so expect to see enterprise
           | support disappear.
           | 
           | Instead you are likely to be forced to use a cloud hosted
           | PostgreSQL instance in order to get HA/clustering.
        
       | eikenberry wrote:
       | Any progress on high availability deployments yet? Or does it
       | still rely on problematic third-party tools?
       | 
       | Last time I was responsible for setting up an HA Postgres cluster
       | it was a garbage fire, but that was nearly 10 years ago now. I
       | ask every so often to see if it has improved, and each time, so
       | far, the answer has been no.
        
         | rusbus wrote:
         | I found running a 6-node Patroni cluster on Kubernetes to be a
         | surprisingly pain-free experience a couple of years ago.
        
         | threeseed wrote:
         | If you want HA, use AWS RDS, Azure Citus, or GCP Cloud SQL.
         | 
         | Otherwise use MySQL, Oracle, MongoDB, Cassandra, etc. if you
         | want to run it on your own.
         | 
         | Or any other database that has invested in a native and
         | supported HA/clustering implementation.
        
           | tluyben2 wrote:
           | CockroachDB or Yugabyte work well for some cases you might
           | use Postgres for.
        
       | jabl wrote:
       | Seems zheap didn't make it this time either?
        
         | pella wrote:
         | ZHEAP Status: https://cybertec-postgresql.github.io/zheap/
         | 
         | - 12-10-2020: "Most regression tests are passing, but write-
         | speeds are still low."
         | 
         | - wiki: https://wiki.postgresql.org/wiki/Zheap
        
         | cett wrote:
         | I would love to see it delivered
        
       | deedubaya wrote:
       | It would be nice to not need pgbouncer
        
         | I_am_tiberius wrote:
         | Indeed! Postgres 14 improves the scalability of concurrent
         | connections, but I doubt cloud DB providers will adjust their
         | max connection limits.
        
       | Waterluvian wrote:
       | Tangential to this topic:
       | 
       | If I have a Django + PG query that takes 1 second and I want to
       | deeply inspect the breakdown of that entire second, where might I
       | begin reading to learn what tools to use and how?
        
         | purerandomness wrote:
         | I recommend the book "SQL Performance Explained" by Markus
         | Winand: https://sql-performance-explained.com/
         | 
         | It covers all major databases and is a good start for diving
         | into database internals and how to interpret output from query
         | analyzers.
         | 
         | Other than that, I highly recommend joining the mailing list
         | and IRC (#postgresql on libera.chat).
         | 
         | Lots of valuable tricks being shared there by people with
         | decades of experience.
        
           | isatty wrote:
           | Did freenode get renamed?
        
             | jlokier wrote:
             | Not renamed, but a number of Freenode channels and admins
             | moved to libera.chat recently due to non-technical IRC
             | drama / politics over "ownership".
        
         | bredren wrote:
         | Just in case someone's reading this and isn't also aware:
         | Django Debug Toolbar offers somewhat interactive exploration of
         | queries.
         | 
         | It can also be used with Django Rest Framework via the
         | browsable api.
         | 
         | Maybe the parent is looking for deeper insight than this, but
         | it is useful for quick visual query inspection.
        
         | emilsedgh wrote:
         | EXPLAIN (ANALYZE, BUFFERS) <YOUR QUERY>
         | 
         | Take the result of this and paste it into
         | https://explain.depesz.com/
         | 
         | which will make it human readable.
         | 
         | Understanding this is sometimes very easy, but if you want to
         | understand what the plans _really_ mean, you can read
         | depesz.com.
        
           | diminish wrote:
           | I use it frequently, but I wish there were a tool which went
           | into the semantics somewhat.
        
           | jakebasile wrote:
           | Wow, how have I never heard of this tool?! Thanks a lot for
           | the link!
        
         | epberry wrote:
         | This will get you started, but it is by no means a full guide
         | to query optimization:
         | https://arctype.com/blog/postgresql-query-plan-anatomy/.
         | There are also a fair number of Django posts on this blog.
        
         | nerdbaggy wrote:
         | Django has built-in explain support, which can put you on the
         | right track:
         | https://docs.djangoproject.com/en/3.2/ref/models/querysets/#...
        
         | etxm wrote:
         | I'd start with 'EXPLAIN <query>'. If you aren't familiar with
         | the output, you can put it into PEV and get a visualization.
         | 
         | https://tatiyants.com/pev/#/plans
        
           | snissn wrote:
           | agree!! this page is so helpful
        
         | fabian2k wrote:
         | EXPLAIN ANALYZE in Postgres will give you the query plan;
         | learning to understand that output is very useful for figuring
         | out why a query is slow. If the query isn't slow, you can look
         | into Django, but the DB is often a good first guess in these
         | cases.
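         | 
         | If you want the plan for everything a request runs without
         | copy-pasting queries, the auto_explain contrib module can log
         | plans automatically. A rough sketch for a single session
         | (needs superuser, or put it in session_preload_libraries):
         | 
         |     LOAD 'auto_explain';
         |     SET auto_explain.log_min_duration = 0;  -- log every plan
         |     SET auto_explain.log_analyze = true;    -- actual run times
         |     SET auto_explain.log_buffers = true;    -- buffer usage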
        
       | wiradikusuma wrote:
       | I'm thinking of using Postgres for a project, but a DBA friend
       | told me it's operationally more challenging than MySQL.
       | Unfortunately, he can't elaborate. Does anyone have real-world
       | experience? Or is it based on the outdated "PG must be manually
       | vacuumed frequently"?
        
         | sigzero wrote:
         | MySQL is for people who aren't really DBAs and don't want to be
         | (this doesn't mean DBAs don't use it). It's a lot easier to
         | manage.
        
         | CapriciousCptl wrote:
         | You can fiddle with the autovacuum daemon [1][2], but we've
         | never really had to. These days we just run AWS RDS when it
         | counts or a dedicated VPS when it doesn't, and things go fine.
         | 
         | [1] https://www.postgresql.org/docs/13/routine-vacuuming.html
         | [2] https://www.postgresql.org/docs/current/planner-stats.html
         | 
         | The main issue we hit is the 1 connection = 1 process model,
         | although there are ways to mitigate that (namely pgbouncer).
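         | 
         | If you ever do have to tune it for one hot table, the knobs
         | are per-table storage parameters; a rough sketch (the table
         | name and values are made up):
         | 
         |     ALTER TABLE orders SET (
         |         autovacuum_vacuum_scale_factor = 0.02,
         |         autovacuum_analyze_scale_factor = 0.01
         |     );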
        
         | ComputerGuru wrote:
         | One thing is that Postgres doesn't let you just upgrade to a
         | new major version, as it doesn't update the format of the on-
         | disk binary database files; you must replicate from an existing
         | node or dump/restore. MySQL upgrades the previous version when
         | a new version is installed (which can cause problems, but is
         | certainly "easier").
        
           | andruby wrote:
           | pg_upgrade [0] is an official part of Postgres and does the
           | binary in-place upgrade for you. You should obviously test
           | before running in production, but it has worked perfectly for
           | us when upgrading a 10+ TB cluster from pg11 to pg13.
           | 
           | [0] https://www.postgresql.org/docs/current/pgupgrade.html
        
             | jeffbee wrote:
             | 10TB is basically zero. At that scale you can choose
             | anything and get away with it.
        
               | tinus_hn wrote:
               | 10 TB is definitely not zero if your project is of the
               | 'should I choose MySQL or Postgres' variety.
        
               | tehbeard wrote:
               | Honestly this just seems like you're waving your e-peen
               | around.
               | 
               | Care to give some more details on why, and on what size
               | of disk usage causes issues for Postgres / other DB
               | upgrades?
        
               | lilSebastian wrote:
               | A wonderfully constructive comment that keeps people
               | using this site... or not.
        
               | darkwater wrote:
               | How can someone with 10k karma write such a statement?
               | Were you trolling? Are you serious? 10TB of relational
               | database can definitely be troublesome to manage and pose
               | challenges to many organizations.
        
             | conradfr wrote:
             | Not totally ideal if you're using containers, as it
             | requires the binaries of both the old and new versions,
             | unless I'm missing something.
        
               | [deleted]
        
               | megous wrote:
               | All distros I've ever used (Debian, Arch) provide the
               | previous and current versions of PostgreSQL for the
               | purpose of easy pg_upgrade.
               | 
               | Debian even has special custom helper scripts for doing
               | it. So just build your container with both Postgres
               | versions?
        
           | ASalazarMX wrote:
           | I think this is very convenient; you don't want to
           | automatically upgrade a big database, because you probably
           | want to choose the downtime window. This is obviously by
           | design, but I'd also like to be able to automate the
           | pg_upgradecluster / pg_dropcluster process, especially for
           | install-and-forget little databases.
           | 
           | IMO, the biggest shock of the MSSQL/MySQL to PostgreSQL
           | migration was not having 1 or 2 specific files per database,
           | especially if you used to back up the files instead of doing
           | a formal database backup.
        
           | [deleted]
        
         | offtop5 wrote:
         | I wouldn't imagine Postgres is really much harder than MySQL.
         | 
         | Both are a degree more difficult than NoSQL. The main issue is
         | maintaining schemas.
        
         | agustif wrote:
         | Are you going to operate it yourself or just rent some cloud
         | service?
         | 
         | Postgres by itself doesn't have a great horizontal scaling
         | strategy as of now, I think. You need Citus or something like
         | that on top; maybe your friend was referring to that?
        
         | yannoninator wrote:
         | Perhaps your DBA friend was operating PG themselves?
         | 
         | Nowadays Postgres in the cloud does all of this for you.
        
         | masklinn wrote:
         | I don't know if there are distros which tune it, but the
         | default configuration is usually... conservative.
        
         | unnouinceput wrote:
         | Your DBA friend is stuck in the 2000s. Let dinosaurs die and go
         | with PGSQL, because it is superior to MySQL in everything.
         | 
         | And don't take my word for it, see for yourself here:
         | 
         | https://en.wikipedia.org/wiki/Comparison_of_relational_datab...
         | 
         | And MySQL is an Oracle product these days; go with MariaDB
         | instead, as it is a MySQL fork made by the original papa of
         | MySQL.
        
           | tfigment wrote:
           | It lacks first-class temporal tables. Maybe that's not
           | important to you, and it's not on that list, so do we just
           | dismiss it?
        
       | gigatexal wrote:
       | From the article:
       | 
       | And 200+ other improvements in the Postgres 14 release!
       | 
       | These are just some of the many improvements in the new Postgres
       | release. You can find more on what's new in the release notes,
       | such as:
       | 
       | - The new predefined roles pg_read_all_data/pg_write_all_data
       |   give global read or write access
       | - Automatic cancellation of long-running queries if the client
       |   disconnects
       | - Vacuum now skips index vacuuming when the number of removable
       |   index entries is insignificant
       | - Per-index information is now included in autovacuum logging
       |   output
       | - Partitions can now be detached in a non-blocking manner with
       |   ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
       | 
       | the killing of queries when the client disconnects is really nice
       | imo -- the others are great too
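       | 
       | For reference, rough sketches of two of those (the role and
       | table names here are made up):
       | 
       |     -- give a reporting role read access to everything
       |     GRANT pg_read_all_data TO reporting_user;
       | 
       |     -- detach a partition without blocking concurrent queries
       |     ALTER TABLE measurements
       |         DETACH PARTITION measurements_2020 CONCURRENTLY;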
        
       | andrewstuart wrote:
       | It would be nice to hear how much of a problem XID wraparound is
       | in Postgres 14 -- do the fixes below address it entirely or just
       | make it less of a problem?
       | 
       | I see no mention of addressing transaction id wraparound, but
       | these are in the release notes:
       | 
       | Cause vacuum operations to be aggressive if the table is near xid
       | or multixact wraparound (Masahiko Sawada, Peter Geoghegan)
       | 
       | This is controlled by vacuum_failsafe_age and
       | vacuum_multixact_failsafe_age.
       | 
       | Increase warning time and hard limit before transaction id and
       | multi-transaction wraparound (Noah Misch)
       | 
       | This should reduce the possibility of failures that occur without
       | having issued warnings about wraparound.
       | 
       | https://www.postgresql.org/docs/14/release-14.html
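       | 
       | For reference, both failsafe settings are plain GUCs; a rough
       | sketch of checking and lowering one (the value below is just an
       | example, not a recommendation):
       | 
       |     SHOW vacuum_failsafe_age;
       | 
       |     ALTER SYSTEM SET vacuum_failsafe_age = 1200000000;
       |     SELECT pg_reload_conf();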
        
         | petergeoghegan wrote:
         | Co-author of that feature here.
         | 
         | Clearly it doesn't eliminate the possibility of wraparound
         | failure entirely. Say for example you had a leaked replication
         | slot that blocks cleanup by VACUUM for days or months. It'll
         | also block freezing completely, and so a wraparound failure
         | (where the system won't accept writes) becomes almost
         | inevitable. This is a scenario where the failsafe mechanism
         | won't make any difference at all, since it's just as inevitable
         | (in the absence of DBA intervention).
         | 
         | A more interesting question is how much of a reduction in risk
         | there is if you make certain modest assumptions about the
         | running system, such as assuming that VACUUM can freeze the
         | tuples that need to be frozen to avert wraparound. Then it
         | becomes a question of VACUUM keeping up with the ongoing
         | consumption of XIDs by the system -- the ability of VACUUM to
         | freeze tuples and advance the relfrozenxid for the "oldest"
         | table before XID consumption makes the relfrozenxid dangerously
         | far in the past. It's very hard to model that and make any
         | generalizations, but I believe in practice that the failsafe
         | makes a huge difference, because it stops VACUUM from
         | performing further index vacuuming.
         | 
         | In cases at real risk of wraparound failure, the risk tends to
         | come from the variability in how long index vacuuming takes --
         | index vacuuming has a pretty non-linear cost, whereas all the
         | other overheads are much more linear and therefore much more
         | predictable. Having the ability to just drop those steps if and
         | only if the situation visibly starts to get out of hand is
         | therefore something I expect to be very useful in practice.
         | Though it's hard to prove it.
         | 
         | Long term, the way to fix this is to come up with a design that
         | doesn't need to freeze at all. But that's much harder.
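         | 
         | For anyone who wants to watch for the leaked-slot scenario and
         | general wraparound headroom, a rough sketch using the standard
         | catalog views (thresholds and alerting are up to you):
         | 
         |     -- inactive slots can block cleanup and freezing
         |     SELECT slot_name, active, restart_lsn
         |     FROM pg_replication_slots;
         | 
         |     -- distance of each database from xid wraparound trouble
         |     SELECT datname, age(datfrozenxid) AS xid_age
         |     FROM pg_database
         |     ORDER BY xid_age DESC;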
        
       | matthewbauer wrote:
       | Postgres is one of those pieces of software that's so much better
       | than anything else, it's really incredible. I wonder if it's even
       | possible for competitors to catch up at this point - there's not
       | a lot of room for improvement in the architecture of relational
       | databases anymore. I'm starting to think that Postgres is going
       | to be with us for decades, maybe even centuries.
       | 
       | Do any other entrenched software projects come to mind? The only
       | thing comparable I can think of are Git and Linux.
        
         | stickfigure wrote:
         | I'm an enormous fan of Postgres, it's my default go-to RDBMS.
         | But the memory expense of connections is a huge issue and this
         | article doesn't convince me that it's solved.
         | 
         | The machine being used for this benchmark has 96 vCPUs, 192G of
         | RAM, and costs $3k/mo.
         | 
         | My business runs just fine on a 3.75G, 1 vCPU instance. But
         | idle connections eat up a huge amount of RAM and I sometimes
         | find myself hitting the limits when a load spike spins up extra
         | frontend instances.
         | 
         | Sure, I could probably set up pgbouncer and some other tools,
         | but that's a lot of headache. I'm acutely aware that MySQL
         | (which I dislike because of no transactional DDL) does not
         | suffer from this issue. I also don't see this being solved
         | without a major rewrite, which seems unlikely.
         | 
         | So Postgres has at least one very serious fault that makes room
         | in the marketplace. The poor replication story is another.
        
           | pgaddict wrote:
           | It isn't solved, and no one claimed it to be solved. The
           | scalability improvement is related to how we build MVCC
           | snapshots (i.e. information which transactions are visible to
           | a session). That may reduce the memory usage a bit, but it's
           | more about CPU I think.
           | 
           | As for the per-connection memory usage, the big question is
           | whether there really is a problem (and perhaps if there's a
           | reasonable workaround). It's not quite clear to me why you
           | think the issues in your case are due to idle connections,
           | but OK.
           | 
           | There are two things to consider:
           | 
           | 1) The fixed per-connection memory (tracking state, locks,
           | ..., a couple kBs or so). You'll pay this even for unused
           | connections.
           | 
           | 2) Per-process memory (each connection is handled by a
           | separate process).
           | 
           | It's difficult to significantly reduce (1) because that state
           | would be needed no matter what the architecture is, mostly.
           | Dealing with (2) would probably require abandoning the
           | current architecture (process per connection) and switching
           | to threads. IMO that's unlikely to happen, because:
           | 
           | (a) the process isolation is actually a nice thing from the
           | developer perspective (less locking, fewer data races, ...)
           | 
           | (b) processes work quite fine for a reasonable number of
           | long-lived connections, and connection pools address a lot
           | of the other cases
           | 
           | (c) PostgreSQL supports a lot of platforms, some of which may
           | not have very good multi-threading support (and supporting
           | both architectures would be quite a burden)
           | 
           | But that's just my assessment, of course.
        
           | megous wrote:
           | Setting up pgbouncer is not much headache, and for OLTP
           | workloads it works great. You can even see it in the graph
           | that the best performance is when the number of CPU cores =
           | number of connections. And so will be memory use. :)
        
           | foota wrote:
           | Out of curiosity, do you know what causes this?
        
             | agacera wrote:
             | They mention this in the article. But to sum up, each
             | connection in PG is handled by its own OS process. Postgres
             | behind the scenes is composed of multiple single-threaded
             | processes.
             | 
             | This comes with the advantage for PG developers (and us!)
             | that they don't need to deal with tons of data race issues,
             | but the trade-off is that, memory-wise, a process takes way
             | more memory than a thread.
        
           | btbuilder wrote:
           | I agree - the disparity between the cost of idle connections
           | in Postgres vs MSSQL is hampering our ability to migrate.
        
             | pgaddict wrote:
             | Can you elaborate / quantify the memory requirements a bit?
             | I don't have much experience with MSSQL in this respect, so
             | I'm curious how big the difference is.
        
         | yakubin wrote:
         | Fortran for linear algebra software.
         | 
         | Excel for business spreadsheets.
         | 
         | Java for enterprise server software.
        
           | Supermancho wrote:
           | > Java for enterprise server software.
           | 
           | Big corporations are horribly inefficient, and enterprise
           | software is necessarily so as a result... if you're saying
           | Java is terrible by nature of being the go-to for enterprise,
           | then that makes sense. It took 20 years for it to swap places
           | with COBOL, and I expect it will be something else in 20
           | more.
        
             | yakubin wrote:
             | I don't work with Java, but I can think of a few advantages
             | off the top of my head:
             | 
             | - appreciation of backwards compatibility (here it wins
             | over Python);
             | 
             | - great debuggers and performance tools (e.g. Java Flight
             | Recorder or Eclipse Memory Analyzer);
             | 
             | - easy deployment - you can just give someone a fat JAR
             | (here it wins over all scripting languages, so Python,
             | Ruby, PHP, or any other flavour of the month);
             | 
             | - industry-grade garbage collectors;
             | 
             | - a publicly available standard spec (here it wins over all
             | the defined-by-implementation languages such as Python,
             | PHP, Rust, basically most languages, and over languages
             | which are standardized but whose specs aren't public: C,
             | C++, Ruby);
             | 
             | - kind of like the previous point, but anyway: multiple
             | implementations to choose from;
             | 
             | - I've been told it has good performance. I've never seen a
             | real-world Java application that felt fast, but I've heard
             | people put it on a pedestal, and the Debian programming
             | languages benchmarks game seems to corroborate that story;
             | 
             | Besides, the question wasn't about which technologies we
             | like, but which we believe are entrenched so much, they
             | aren't going to go away for a very long time. I don't see
             | Java going away for another 100 years, no matter how much I
             | would or wouldn't like to work with it.
        
               | b9a2cab5 wrote:
               | IMO the Java stdlib also strikes just the right balance
               | between control and abstraction. You can write thread-
               | safe, performant code that makes reasonable tradeoffs
               | between data structures without worrying too much about
               | the details of memory layout and allocation. Said code is
               | also easy to debug even without a debugger, because
               | there's almost never undefined behavior caused by use-
               | after-free type bugs, and error messages are clear. And
               | the tooling - just IDEs alone, never mind debuggers - is
               | mature and effective.
               | 
               | After using Python, Go, PHP, and C++ it's easy to see why
               | Java is the go-to language for server development.
        
               | radicalbyte wrote:
               | - Fantastic battle-tested ecosystem of libraries.
               | - Stable cross-platform (kills Python, Node here).
               | - Lingua franca.
               | 
               | Now I personally don't like Java - it feels crusty vs C#
               | - but the libraries are amazing.
               | 
               | You can also use something nice like Kotlin and you have
               | all of the platform benefits with none of the crusty
               | language issues.
        
         | ranit wrote:
         | > Do any other entrenched software projects come to mind?
         | 
         | SQLite.
        
           | IshKebab wrote:
           | I'm pretty hopeful that DuckDB will replace some of the use
           | of SQLite. SQLite is great but it sucks that it's entirely
           | dynamically typed (the types specified for columns are
           | completely ignored).
        
         | dilyevsky wrote:
         | Kubernetes when it comes to clustering.
        
         | eterm wrote:
         | I think anyone who has worked a lot with MSSQL would disagree
         | with Postgres being "so much better". It's only really in the
         | last few years that Postgres has pulled ahead; MSSQL was a lot
         | more feature-rich and performant for a decade.
        
           | harikb wrote:
           | MSSQL ? As in Microsoft SQL Server? I have heard this
           | argument a lot and all the comparisons I have seen are
           | specific benchmarks on specialized hardware. My own personal
           | experience wasn't anything like the benchmarks
        
             | HideousKojima wrote:
             | MSSQL still has a few features that set it apart from
             | Postgres. Off the top of my head are Filestream (basically
             | storing files in the database while still having them
             | accessible as files on the filesystem) and temporal tables
             | without the need for extensions.
             | 
             | Personally if I were choosing the tech stack for my company
             | I'd still go for Postgres though
        
             | sigzero wrote:
             | By "few years" he has to mean 10 to 15 years. ;)
        
         | jeff-davis wrote:
         | I like to say that "Postgres is a great default". It's
         | generally very good, and also very adaptable to special
         | purposes, so it covers a wide range of use cases.
         | 
         | But saying "so much better" is too strong.
        
           | threeseed wrote:
           | It's the inevitable circlejerk we get with every PostgreSQL
           | post on HN.
           | 
           | Which is a shame because it means the legitimate and serious
           | faults (i.e. lack of native HA/clustering) just get waved
           | away.
        
         | sweeneyrod wrote:
         | I think many Mercurial users would disagree with you about Git.
        
           | jayd16 wrote:
           | Are we talking about market dominance, mind share or the idea
           | that there's no real competition?
           | 
           | MySQL and Oracle exist. Mercurial and Perforce exist. I'm not
           | sure it's a terrible stretch to compare Git and Postgres.
        
             | aidenn0 wrote:
             | I think the point is that git _isn't_ "so much better"
             | than mercurial, while pgsql has had a lead on mysql for
             | quite some time on a lot of technical measurements.
        
               | polskibus wrote:
               | PostgreSQL does not have a real clustered index that is
               | maintained with each change. That itself makes it worse
               | than MySQL for many workloads.
        
               | petergeoghegan wrote:
               | I would say that that's a pretty dubious claim with
               | modern versions of Postgres and MySQL/InnoDB, running on
               | modern
               | hardware. See for example this recent comparative
               | Benchmark from Mark Callaghan, a well known member of the
               | MySQL community:
               | 
               | https://smalldatum.blogspot.com/2021/01/sysbench-
               | postgres-vs...
               | 
               | I'm not claiming that this benchmark justifies the claim
               | that Postgres broadly performs better than MySQL/InnoDB
               | these days -- that would be highly simplistic. Just as it
               | would be simplistic to claim that MySQL is clearly well
               | ahead with OLTP stuff in some kind of broad and
               | entrenched way. It's highly dependent on workload.
               | 
               | Note that Postgres really comes out ahead on a test
               | called "update-index", which involves updates that modify
               | indexed columns -- the write amplification is much worse
               | on MySQL there. This is precisely the opposite of what
               | most commentators would have predicted. Including (and
               | perhaps even especially) Postgres community people.
        
               | glogla wrote:
               | "Is table a heap with indexes on the side or is table a
               | tree with other indexes on the side (i.e. 'clustered
               | index')" is a more complicated discussion.
               | 
               | The former makes it possible to have MVCC (and thus gives
               | you snapshot isolation and serializability) and makes
               | secondary indexes perform faster, at the cost of vacuum
               | or Oracle-style redo/undo/rollback segments with
               | associated "Snapshot too old" issues.
               | 
               | The latter pretty much forces the use of locking even for
               | reads, so queries block each other (but doesn't require
               | vacuum or anything), makes clustering-key-selective
               | queries perform faster than secondary-index ones, and
               | makes you think really hard about the clustering key.
               | 
               | It's not really a feature you would have, but a
               | complicated design tradeoff.
        
         | autodeadmehaha wrote:
         | Anyone who's ever had to upgrade Postgres knows it can't fail
         | fast enough. They must fix their upgrade paths and its endless
         | means to completely fuck you if they want to be taken
         | seriously.
        
           | yjftsjthsd-h wrote:
           | ? What's wrong with pg_upgrade?
        
         | fibers wrote:
         | I had to roll back to 9.6 on Windows because \COPY is
         | fundamentally broken for large CSVs.
        
           | CapriciousCptl wrote:
           | What's the issue? Just on Windows? Mac OS X with 13.2 has no
           | issue for me with the 1.1 gigabyte, 20-million-record CSV I
           | imported just last week, or some bigger ones I did a few
           | months back.
        
         | dimgl wrote:
         | Postgres is good, even great, but this is hyperbole. Postgres
         | has its downsides, autovacuum being one of them.
        
           | petergeoghegan wrote:
           | Although the article doesn't mention it, index bloat will be
           | far better controlled in Postgres 14:
           | 
           | https://www.postgresql.org/docs/devel/btree-
           | implementation.h...
           | 
           | One benchmark involving a mix of queue-like inserts, updates,
           | and deletes showed that it was practically 100% effective at
           | controlling index bloat:
           | 
           | https://www.postgresql.org/message-
           | id/CAGnEbogATZS1mWMVX8FzZ...
           | 
           | The Postgres 13 baseline for the benchmark/test case
           | (actually HEAD before the patch was committed, but close
           | enough to 13) showed that certain indexes grew by 20% - 60%
           | over several hours. That went down to 0.5% growth over the
           | same period. The index growth is also much more predictable,
           | in that it matches what you'd expect for this workload if you
           | thought about it from first principles. In other words,
           | you'd expect
           | about the same low amount of index growth if you were using a
           | traditional two-phase locking database that doesn't use MVCC
           | at all.
           | 
           | Full disclosure: I am the author of this feature.
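           | 
           | (If you want to measure index bloat on your own workload, the
           | pgstattuple extension's pgstatindex() function reports leaf
           | density and fragmentation; a rough sketch, with a made-up
           | index name:)
           | 
           |     CREATE EXTENSION IF NOT EXISTS pgstattuple;
           | 
           |     SELECT avg_leaf_density, leaf_fragmentation
           |     FROM pgstatindex('orders_customer_id_idx');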
        
             | edoceo wrote:
             | Thank you!!
        
         | vosper wrote:
         | > Do any other entrenched software projects come to mind?
         | 
         | Elasticsearch is underrated here, IMO. Yes, there are
         | alternatives for simple fulltext search. But there's a _lot_
         | more it can do (adhoc aggregations incorporating complex
         | fulltext searches, with custom scripted components; geospatial;
         | index lifecycle management) and if you're using those features,
         | there's nothing else comparable.
         | 
         | It's pretty stable, too, once you've got the cluster
         | configured. We don't have outages due to problems with
         | Elasticsearch.
        
           | jeff-davis wrote:
           | I don't know about elasticsearch specifically, but I'm
           | skeptical of special-purpose systems for databases.
           | 
           | They are great in some cases and terrible in others, and over
           | time, use cases push database systems into their worst cases.
           | Use cases rarely stay in the sweet spot of a special-purpose
           | system.
           | 
           | That being said, if the integration is great, and/or the
           | special system is a secondary one (fed from a general-purpose
           | system), then it's often fine.
        
             | vosper wrote:
             | I'm not sure I fully understand your comment (databases
             | that are special-purpose and evolve out of a sweet spot, or
             | special-purpose systems using databases in worst-case
             | ways?).
             | 
             | I certainly wouldn't say ES is the former. We use it for
             | some complex things that (AFAIK) no other (publicly
             | available; I don't know what e.g. Twitter or Google has
             | going on) system could provide at the scale we need.
             | Everything we're doing is well within the realm of what ES
             | is built for, and it's the only system built for it. It's
             | not perfect, but most of our performance issues could be
             | solved by scaling out, where query or index optimization
             | isn't tractable.
        
               | jeff-davis wrote:
               | I interpreted (misinterpreted?) your comment to be
               | suggesting ES for wider use cases.
        
           | bradleyjg wrote:
           | It's frustrating to need a run-time team for a piece of
           | infrastructure, especially one sold as IaaS.
           | 
           | It's totally understandable that you'd need developers to
           | have expertise in patterns and anti-patterns, as well as
           | needing an expert to set things up in the first place, but
           | you shouldn't have to have a dedicated ES monitoring / tuning
           | / babysitting team like Oracle DBAs of yore. That you do,
           | means it isn't there yet as a product.
        
         | jjeaff wrote:
         | MySQL 8 is not that far behind in feature parity. And it is
         | ahead when it comes to scalability. So I don't see Postgres as
         | necessarily standing alone.
        
           | ksec wrote:
           | Is there any roadmap for MySQL 9?
        
           | paozac wrote:
           | MySQL's lack of DDL transactions is a serious shortcoming.
        
           | ezekiel68 wrote:
           | You claim that MySQL 8 is ahead when it comes to scalability.
           | What are the bases of this claim? When I see comparisons or
           | entire systems that rely on a database (that is, not micro-
           | benchmarks) such as the TechEmpower web framework benchmarks
           | [0] , I notice that the 'Pg' results cluster near the top,
           | with the "My" results showing up further down the rankings. I
           | understand this isn't version 14 of the former versus version
           | 8 of the latter. But it makes me wonder what the basis of
           | your claims is.
           | 
           | [0] https://www.techempower.com/benchmarks/
        
             | the_duke wrote:
             | Techempower is not a database benchmark. The tests that
             | involve a DB exist to include a DB client in the request
             | flow, not to put any serious load on the database.
        
             | isbvhodnvemrwvn wrote:
             | Aren't those run on a single node DB server? And the
             | queries don't really seem realistic at all, e.g. single
             | query test fetches 1 out of 10 000 rows, with no joins at
             | all. Fortunes fetches 1 out of 10 rows. This seems
             | extremely trivial.
        
             | merb wrote:
             | Well, if you need more than one server, MySQL has Vitess,
             | which is huge. Postgres has Citus, but that is way more
             | complex to set up than Vitess.
             | 
             | I still would never use MySQL, just because of Vitess.
        
               | [deleted]
        
           | purerandomness wrote:
           | No DDL transactions, no materialized views, the list is
           | endless.
           | 
           | There's almost no reason to pick MySQL for a new project.
        
             | tfigment wrote:
             | MySQL and MariaDB have first-class temporal tables. Pg has
             | a compile requirement, so you cannot use it in AWS RDS.
        
               | lowercased wrote:
               | I was aware Maria had temporal tables, but not MySQL
               | proper. Any links you can point me to? Every search is
               | coming up with 'temporary' table info, not temporal.
        
               | lowercased wrote:
               | MySQL 8 has GIS/spatial stuff built in now. It may not
               | quite be on par with PostGIS, but... I also don't have to
               | futz with "doesn't come baked in". I dealt with someone
               | who wrote a whole bunch of lat/lon/spatial stuff in
               | client code because we're on Postgres but... he couldn't
               | get PostGIS installed (and even if he could, figuring out
               | how to convince the ops people to add a new 'thing' in
               | production would have been a delay).
               | 
               | Having stuff baked in is often a win.
        
               | dragonwriter wrote:
               | > MySQL and mariadb have first class temporal tables. Pg
               | has compile requirement and so cannot use in AWS RDS.
               | 
               | There's a pl/pgsql reimplementation of temporal tables
               | specifically for that use case.
        
               | phonon wrote:
               | https://news.ycombinator.com/item?id=26768220
        
         | jasonwatkinspdx wrote:
         | There's a ton of room for improvement in the architecture of
         | relational databases. This isn't a dig against Postgres, or
         | ignoring how difficult it will be to get a new system to the
         | same level of maturity. But databases designed natively for
         | cloud/clustering, SSDs, (pmem soon perhaps), etc are quite a
         | bit different. There's enormous simplifications and performance
         | gains possible.
         | 
         | There's been a lot of exciting work in this area over the last
         | decade or so. Andy Pavlo's classes are great surveys of the
         | latest work: https://15721.courses.cs.cmu.edu/spring2020/
         | 
         | CosmosDB is an example of a relational (properly speaking,
         | multi-paradigm) database with a quite different architecture
         | vs. the classic design that has moved into production status
         | quite rapidly.
         | 
         | FaunaDB and CockroachDB are moving with solid momentum too.
        
           | oblio wrote:
           | Yeah, to list a bit:
           | 
           | - scaling is non-trivial (you can't just add a node and have
           | PostgreSQL automagically Do The Right Thing(tm))
           | 
           | - you can only have so many connections open to the database,
           | causing issues with things such as AWS Lambda
           | 
           | - I don't remember if this has changed, but I got the
           | impression a while ago that having dynamic DB users was a bit
           | cumbersome to set up (plugging PostgreSQL into AD/LDAP)
        
           | threeseed wrote:
           | There are also technologies like NVMe over Fabric/RDMA, eBPF,
           | XDP, io_uring etc which are just starting to get traction and
           | are game changers for performance. None of which are being
           | used.
           | 
           | All of these require a different architecture so expect to
           | see newer databases push things even further.
        
           | exceptione wrote:
           | You must be kidding me with the CosmosDB mention. It doesn't
           | even have foreign key constraints. I have to work with it and
           | I have never seen such a feature-poor dbms before.
        
           | gogopuppygogo wrote:
           | Cockroach is the worst brand for a database ever.
           | 
           | Even Croach would be a massive branding improvement.
           | 
           | This is similar to how gimp is a terrible brand.
        
             | meesterdude wrote:
             | I mean... the WORST? For me Mongo takes the cake, but
             | oracle is up there too.
        
               | zdragnar wrote:
               | Really? Oracle actually makes a lot of sense to me for a
               | database name (in the 'source of truth' sense, not in the
               | prophet sense).
               | 
               | Mongo, on the other hand, has definitely always had the
               | racist/ableist slur as the first connotation for me.
        
               | whatshisface wrote:
               | I've learned almost all the slurs I know from comments or
               | media sources complaining about them. It's the only place
               | they're used in polite society.
        
       | efxhoy wrote:
       | > Automatic cancellation of long-running queries if the client
       | disconnects
       | 
       | Sweet! I often screw up a query and need to cancel it with
       | pg_cancel_backend(pid)
       | 
       | because Ctrl-C rarely works. With this I can just ragequit and
       | reconnect. Sweet!
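       | 
       | For reference, a rough sketch of hunting down and cancelling a
       | stuck query by hand (the pid is whatever the first query
       | returns):
       | 
       |     SELECT pid, now() - query_start AS runtime, query
       |     FROM pg_stat_activity
       |     WHERE state = 'active' AND pid <> pg_backend_pid();
       | 
       |     SELECT pg_cancel_backend(12345);  -- pid from query above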
        
         | znep wrote:
         | I agree this is a great addition, but FWIW it isn't normal for
         | ^C to not work in psql. Perhaps you are using some other client
         | that doesn't support aborting queries properly, or have
         | something on the network between you and the server behaving
         | poorly and dropping connections?
        
       | e1g wrote:
       | Another exciting feature in PG14 is the new JSONB syntax[0],
       | which makes it easy to update deep JSON values -
       | UPDATE table SET some_jsonb_column['person']['bio']['age'] =
       | '99';
       | 
       | [0] https://erthalion.info/2021/03/03/subscripting/
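       | 
       | For reading the same path, a rough side-by-side sketch (with a
       | hypothetical jsonb column named doc; note the subscript form
       | returns jsonb rather than text, unlike ->> / #>>):
       | 
       |     -- existing operator syntax, returns text
       |     SELECT doc #>> '{person,bio,age}' FROM t;
       | 
       |     -- new PG14 subscript syntax, returns jsonb
       |     SELECT doc['person']['bio']['age'] FROM t;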
        
         | xfalcox wrote:
         | Wow is this for real? That is such a big quality of life
         | change! Happy to see it!
        
           | megous wrote:
           | Not much different from some_jsonb#>>'{some,path}' and once
           | you add the need to convert out of jsonb to text, you'll not
           | be saving any characters either. At least for queries.
           | 
           | For updates, it looks nice I guess.
        
             | megous wrote:
             | Downvoters could instead provide a way to get a decoded
             | text of a property with this new syntax, like it's possible
             | with #>>. That would be more useful.
        
             | zdragnar wrote:
             | I think the difference is _familiarity_.
             | 
             | It shouldn't matter so much, but when you don't use one
             | language as much as you do other languages, it becomes that
             | much harder to remember unfamiliar syntaxes and grammars,
             | and easier to confuse similar looking operations with each
             | other.
        
               | megous wrote:
               | In that case this does not help. SELECT json['a']; will
               | not return the value of the string in {"a":"ble"} (like
               | it does in JavaScript), but a JSON encoding of that
               | string, so '"ble"'. You still won't be able to do simple
               | comparisons like `SELECT json_col['a'] = some_text_col;`.
               | It's superficial familiarity, but it still behaves
               | differently than you expect.
               | 
               | Is there even a function that would convert a
               | JSON-encoded "string" to the text it represents in
               | PostgreSQL? I didn't find one.
               | 
               | So all you can do is `SELECT json_col['a'] =
               | some_text_col::jsonb;` and hope for the best (that string
               | encodings will match) or use the old syntax with ->> or
               | #>>.
        
         | GordonS wrote:
         | Gods, but this is fabulous!
         | 
         | JSONB capabilities in Postgres are amazing, but the syntax is
         | really annoying - for example, I'm forever mixing up `->` and
         | `->>`. This new syntax feels _far_ more intuitive.
        
           | topicseed wrote:
           | I constantly have to google the JSONB query syntax; it's just
           | too confusing to me, although it is indeed powerful.
           | 
           | Glad to see this super intuitive and familiar syntax added.
           | It will make writing these updates a lot, lot easier. Not
           | even close!
        
       ___________________________________________________________________
       (page generated 2021-05-22 23:00 UTC)