[HN Gopher] An early look at Postgres 14: Performance and monito...
___________________________________________________________________
An early look at Postgres 14: Performance and monitoring
Improvements
Author : bananaoomarang
Score : 274 points
Date : 2021-05-22 15:47 UTC (7 hours ago)
(HTM) web link (pganalyze.com)
(TXT) w3m dump (pganalyze.com)
| lmarcos wrote:
| All I want is to be able to use Postgres in production without
| the need of pgbouncer.
| matsemann wrote:
| Never had the use for it or even heard of it, guess it depends
| on usage patterns? I've mostly worked with longlived java
| servers, and there having an internal db pool has been standard
| since forever, so no need for another layer.
| mixmastamyk wrote:
| Care to elaborate? Having each tool handle its job sounds like
| a good strategy.
| hmmokidk wrote:
| Are we still going to need PgBouncer when there are a large
| number of connections?
| jpgvm wrote:
| For now yes. The idle connection changes help but it's still
| inefficient. I would like to see connection pooling
| functionality merged into core PG at some point. Eliminate the
| need for network hop/IPC and enable better back-pressure etc.
| edoceo wrote:
| Wow! Memory stats! Repeat query stats! The perfect database gets
| more perfecter! I'm looking forward to using PG for another 20
| years.
| bredren wrote:
| If you're interested in recent enthusiastic (nearly effusive)
| discussion of Postgres and more specifically it's potential as a
| basis for a data warehouse, you might enjoy this episode of Data
| Engineering Podcast with Thomas Richter and Joshua Drake:
|
| Episode website:
| https://www.dataengineeringpodcast.com/postgresql-data-wareh...
|
| Direct: (apple) https://podcasts.apple.com/us/podcast/data-
| engineering-podca...
| dragonwriter wrote:
| Lots of good ops-y stuff, and, with my dev hat on, multirange
| types are just a whole layer of awesome on top of the awesome
| that range types already were.
| rargulati wrote:
| What's going to be the vitess of Postgres? Seems to be the "last"
| missing piece? Or is that not a focus and fit for PG?
| ksec wrote:
| I think vitess has some long term goal to also support Postgre.
| jpgvm wrote:
| Vitess for PostgreSQL will probably just be... Vitess.
|
| The concepts behind Vitess are sufficiently general to simply
| apply them to PostgreSQL now that PostgreSQL has logical
| replication. In some ways it can be even better due to things
| like replication slots being a good fit for these sorts of
| architectures.
|
| The work to port Vitess to PostgreSQL is quite substantial
| however. Here is a ticket tracking the required tasks at a high
| level: https://github.com/vitessio/vitess/issues/7084
| qaq wrote:
| I think something like YugabyteDB
| yed wrote:
| That would be Citus: https://www.citusdata.com/
| threeseed wrote:
| Which is now owned by Microsoft so except to see enterprise
| support disappear.
|
| Instead you are likely to be forced to use a cloud hosted
| PostgreSQL instance in order to get HA/clustering.
| eikenberry wrote:
| Any progress on high availability deployments yet? Or does it
| still rely on problematic, 3rd party tools?
|
| Last time I was responsible for setting up a HA Postgres cluster
| it was a garbage fire, but that was nearly 10 years ago now. I
| ask every so often to see if it has improved and each time, so
| far, the answer has been no.
| rusbus wrote:
| I found running a 6-node Patroni cluster on Kubernetes to be a
| surprisingly pain-free experience a couple of years ago
| threeseed wrote:
| If you want HA use AWS RDS, Azure Citus, GCP Cloud SQL.
|
| Otherwise use MySQL, Oracle, MongoDB, Cassandra etc if you want
| to run it on your own.
|
| Any other database that invested in a native and supported
| HA/clustering implementation.
| tluyben2 wrote:
| Cockroachdb or Yugabyte work well for some cases you might
| use postgres for.
| jabl wrote:
| Seems zheap didn't make it this time either?
| pella wrote:
| ZHEAP Status: https://cybertec-postgresql.github.io/zheap/
|
| - 12-10-2020: "Most regression tests are passing, but write-
| speeds are still low."
|
| - wiki: https://wiki.postgresql.org/wiki/Zheap
| cett wrote:
| I would love to see it delivered
| deedubaya wrote:
| It would be nice to not need pgbouncer
| I_am_tiberius wrote:
| Indeed! Postgres 14 improves scalability of concurrent
| connections but I doubt cloud db providers will adjust their
| max. connections limit.
| Waterluvian wrote:
| Tangential to this topic:
|
| If I have a Django + PG query that takes 1 second and I want to
| deeply inspect the breakdown of that entire second, where might I
| begin reading to learn what tools to use and how?
| purerandomness wrote:
| I recommend the book "SQL Performance Explained" by Markus
| Winand: https://sql-performance-explained.com/
|
| It covers all major databases and is a good start to dive into
| database interna and how to interpret output from query
| analyzers.
|
| Other than that, I highly recommend joining the mailing list
| and IRC (#postgresql on libera.chat).
|
| Lots of valuable tricks being shared there by people with
| decades of experience.
| isatty wrote:
| Did freenode get renamed?
| jlokier wrote:
| Not renamed, but a number of Freenode channels and admins
| moved to libera.chat recently due to non-technical IRC
| drama / politics over "ownership".
| bredren wrote:
| Just in case someone's reading this and isn't also aware:
| Django Debug Toolbar offers somewhat interactive exploration of
| queries.
|
| It can also be used with Django Rest Framework via the
| browsable api.
|
| May be parent is looking for deeper insight than this but it is
| useful to do quick visual query inspection.
| emilsedgh wrote:
| EXPLAIN (ANALYZE, BUFFERS) <YOUR QUERY>
|
| Take the result of this and paste it into
| https://explain.depesz.com/
|
| which will make it human readable.
|
| Understanding this is sometimes very easy, but if you want to
| understand what they _really_ mean, you can read depesz.com
| diminish wrote:
| I use it frequently - but I wish there was a tool which went
| into the semantics somewhat.
| jakebasile wrote:
| Wow, how have I never heard of this tool?! Thanks a lot for
| the link!
| epberry wrote:
| This will get you started but is by no means a full guide on
| query optimization, https://arctype.com/blog/postgresql-query-
| plan-anatomy/. There's also a fair number of django posts on
| this blog.
| nerdbaggy wrote:
| Django has built in explain support which can guide you on the
| right track
| https://docs.djangoproject.com/en/3.2/ref/models/querysets/#...
| etxm wrote:
| I'd start w 'EXPLAIN query', if you arent familiar with the
| output there, you can put it on PEV and get a visualization.
|
| https://tatiyants.com/pev/#/plans
| snissn wrote:
| agree!! this page is so helpful
| fabian2k wrote:
| EXPLAIN ANALYZE in Postgres will give you the query plan,
| learning to understand that output is very useful to figure out
| why a query is slow. If the query isn't slow, you can look into
| Django, but the DB is often a good first guess in these cases.
| wiradikusuma wrote:
| I'm thinking of using Postgres for a project, but a DBA friend
| told me operationally it's more challenging than MySQL.
| Unfortunately, he can't elaborate. Does anyone have real work
| experience? Or is it based on outdated "PG must manually vacuum
| frequently"?
| sigzero wrote:
| MySQL is for that aren't really DBAs and don't want to be (this
| doesn't mean DBAs don't use it). It's a lot easier to manage.
| CapriciousCptl wrote:
| You can fiddle with the autovacuum daemon[1,2] but we've never
| really had to. These days we just run AWS RDS when it counts or
| a dedicated VPS when it doesn't and things go fine--
|
| [1,2] https://www.postgresql.org/docs/13/routine-vacuuming.html
| https://www.postgresql.org/docs/current/planner-stats.html
|
| The main issue we get is the 1 connection = 1 process issue
| although there are ways to mitigate that (namely pgbouncer).
| ComputerGuru wrote:
| One thing is that Postgres doesn't let you just upgrade to a
| new major version, as it doesn't update the format of the on-
| disk binary database files; you must replicate from an existing
| node or dump/restore. MySQL upgrades the previous version when
| a new version is installed (which can cause problems, but is
| certainly "easier").
| andruby wrote:
| Pg_upgrade [0] is an official part of postgres and does the
| binary inplace upgrade for you. You should obviously test
| before running in production, but it has worked perfectly for
| us when upgrading a 10+TB cluster from pg11 to pg13
|
| [0] https://www.postgresql.org/docs/current/pgupgrade.html
| jeffbee wrote:
| 10TB is basically zero. At that scale you can choose
| anything and get away with it.
| tinus_hn wrote:
| 10 tb is definitely not zero if your project is of the
| 'should I choose MySQL or Postgres' variety.
| tehbeard wrote:
| Honestly this just seems like you're waving your e-peen.
|
| Care to give some more details around why and what size
| of disk usage cause issues for postgres / other db
| upgrades?
| lilSebastian wrote:
| A wonderfully constructive comment that keep people using
| this site, or not.
| darkwater wrote:
| How can someone with 10k karma write such a statement?
| Were you trolling? Are you serious? 10TB of relational
| database can be definetly troublesome to manage and pose
| challenges to many organizations.
| conradfr wrote:
| Not totally ideal if you're using containers as it requires
| the binaries of the old and new version, unless I'm missing
| something.
| [deleted]
| megous wrote:
| All distros I've ever used (debian, arch) provide
| previous and current version of postgresql for the
| purpose of easy pg_upgrade.
|
| Debian even has special custom helper scripts for doing
| it. So just build your container with both postgres
| versions?
| ASalazarMX wrote:
| I think this is very convenient; you don't want to
| automatically upgrade a big database because you probably
| want to choose the downtime window. This is obviously by
| design, but I'd also like being able to automate the
| pg_upgradecluster pg_dropcluster process, specially for
| install-and-forget little databases.
|
| IMO, the biggest shock from the MSSQL/MYSQL to PostgreSQL
| migration was not having 1 or 2 specific files per database,
| specially if you used to backup the files instead of doing a
| formal database backup.
| [deleted]
| offtop5 wrote:
| I wouldn't imagine postgres is really much harder than MySQL.
|
| Both are a degree more difficult than NoSQL. The main issue is
| maintaining schema's
| agustif wrote:
| Are you going to operate it our just rent out some cloud
| service?
|
| Postgres by itself doesn't have a great horitzontal scaling
| strategy as of now I think. You need Citus or somt like that on
| top, maybe your friend was referencing that?
| yannoninator wrote:
| perhaps your DBA friend was operating PG themselves?
|
| nowadays postgres in the cloud does all of this for you.
| masklinn wrote:
| I don't know if there are distros which tune it, but the
| default configuration is usually... conservative.
| unnouinceput wrote:
| Your DBA friend is stuck in 2000's. Let dinosaurs die and you
| go with PGSQL because is superior to MySQL on everything.
|
| And don't take my word for it, see for yourself here:
|
| https://en.wikipedia.org/wiki/Comparison_of_relational_datab...
|
| And MySQL is an Oracle product these days, go with MariaDB
| instead as this one is a MySQL fork made by the original papa
| of MySQL.
| tfigment wrote:
| Lacks first class temporal tables. Maybe not important to you
| and not on that list so do we dismiss that.
| gigatexal wrote:
| From the article:
|
| And 200+ other improvements in the Postgres 14 release!
|
| These are just some of the many improvements in the new Postgres
| release. You can find more on what's new in the release notes,
| such as: The new predefined roles
| pg_read_all_data/pg_write_all_data give global read or write
| access Automatic cancellation of long-running queries if
| the client disconnects Vacuum now skips index vacuuming
| when the number of removable index entries is insignificant
| Per-index information is now included in autovacuum logging
| output Partitions can now be detached in a non-blocking
| manner with ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
|
| the killing of queries when the client disconnects is really nice
| imo -- the others are great too
| andrewstuart wrote:
| It would be nice to hear how much of problem XID wraparound is in
| Postgres 14 - do the fixes below address it entirely or just make
| it less of a problem?
|
| I see no mention of addressing transaction id wraparound, but
| these are in the release notes:
|
| Cause vacuum operations to be aggressive if the table is near xid
| or multixact wraparound (Masahiko Sawada, Peter Geoghegan)
|
| This is controlled by vacuum_failsafe_age and
| vacuum_multixact_failsafe_age.
|
| Increase warning time and hard limit before transaction id and
| multi-transaction wraparound (Noah Misch)
|
| This should reduce the possibility of failures that occur without
| having issued warnings about wraparound.
|
| https://www.postgresql.org/docs/14/release-14.html
| petergeoghegan wrote:
| Co-author of that feature here.
|
| Clearly it doesn't eliminate the possibility of wraparound
| failure entirely. Say for example you had a leaked replication
| slot that blocks cleanup by VACUUM for days or months. It'll
| also block freezing completely, and so a wraparound failure
| (where the system won't accept writes) becomes almost
| inevitable. This is a scenario where the failsafe mechanism
| won't make any difference at all, since it's just as inevitable
| (in the absence of DBA intervention).
|
| A more interesting question is how much of a reduction in risk
| there is if you make certain modest assumptions about the
| running system, such as assuming that VACUUM can freeze the
| tuples that need to be frozen to avert wraparound. Then it
| becomes a question of VACUUM keeping up with the ongoing
| consumption of XIDs by the system -- the ability of VACUUM to
| freeze tuples and advance the relfrozenxid for the "oldest"
| table before XID consumption makes the relfrozenxid dangerously
| far in the past. It's very hard to model that and make any
| generalizations, but I believe in practice that the failsafe
| makes a huge difference, because it stops VACUUM from
| performing further index vacuuming.
|
| In cases at real risk of wraparound failure, the risk tends to
| come from the variability in how long index vacuuming takes --
| index vacuuming has a pretty non-linear cost, whereas all the
| other overheads are much more linear and therefore much more
| predictable. Having the ability to just drop those steps if and
| only if the situation visibly starts to get out of hand is
| therefore something I expect to be very useful in practice.
| Though it's hard to prove it.
|
| Long term, the way to fix this is to come up with a design that
| doesn't need to freeze at all. But that's much harder.
| matthewbauer wrote:
| Postgres is one of those pieces of software that's so much better
| than anything else, it's really incredible. I wonder if it's even
| possible for competitors to catch up at this point - there's not
| a lot of room for improvement in architecture of relational
| databases any more. I'm starting to think that Postgres is going
| to be with us for decades maybe even centuries.
|
| Do any other entrenched software projects come to mind? The only
| thing comparable I can think of are Git and Linux.
| stickfigure wrote:
| I'm an enormous fan of Postgres, it's my default go-to RDBMS.
| But the memory expense of connections is a huge issue and this
| article doesn't convince me that it's solved.
|
| The machine being used for this benchmark has 96 vCPUs, 192G of
| RAM, and costs $3k/mo.
|
| My business runs just fine on a 3.75G, 1 vCPU instance. But
| idle connections eat up a huge amount of RAM and I sometimes
| find myself hitting the limits when a load spike spins up extra
| frontend instances.
|
| Sure I could probably setup pgbouncer and some other tools but
| that's a lot of headache. I'm acutely aware that MySQL (which I
| dislike because no transactional DDL) does not suffer from this
| issue. I also don't see this being solved without a major
| rewrite, which seems unlikely.
|
| So Postgres has at least one very serious fault that makes room
| in the marketplace. The poor replication story is another.
| pgaddict wrote:
| It isn't solved, and no one claimed it to be solved. The
| scalability improvement is related to how we build MVCC
| snapshots (i.e. information which transactions are visible to
| a session). That may reduce the memory usage a bit, but it's
| more about CPU I think.
|
| As for the per-connection memory usage, the big question is
| whether there really is a problem (and perhaps if there's a
| reasonable workaround). It's not quite clear to me why you
| think the issues in your case are are due to idle
| connections, but OK.
|
| There are two things to consider:
|
| 1) The fixed per-connection memory (tracking state, locks,
| ..., a couple kBs or so). You'll pay this even for unused
| connections.
|
| 2) Per-process memory (each connection is handled by a
| separate thread).
|
| It's difficult to significantly reduce (1) because that state
| would no matter what the architecture is, mostly. Dealing
| with (2) would probably require abandoning the current
| architecture (process per connection) and switching to
| threads. IMO that's unlikely to happen, because:
|
| (a) the process isolation actually a nice thing from the
| developer perspective (less locking, fewer data races, ...)
|
| (b) processes work quite fine for reasonable number of long-
| lived connections, and for connection pools address a lot of
| the other cases
|
| (c) PostgreSQL supports a lot of platforms, some of which may
| not may not have very good multi-threading support (and
| supporting both architectures would be quite a burden)
|
| But that's just my assessment, of course.
| megous wrote:
| Setting up pgbouncer is not much headache and for for OLTP
| workloads, it works great. You can even see it in the graph,
| that best performance is when number of CPU cores = number of
| connections. And so will be memory use. :)
| foota wrote:
| Out of curiosity, do you know what causes this?
| agacera wrote:
| They mention this in the article. But to sum up, each
| connection in PG is handled by its own OS process. Postgres
| behind the scenes is composed by multiple single-threaded
| applications.
|
| This comes with the advantes for Pg developers (and us!)
| that they don't need to deal with tons of data races
| issues, but the trade off is that memory wise, a process
| takes way more memory than a thread.
| btbuilder wrote:
| I agree - the disparity between the cost of idle connections
| in Postgres vs MSSQL is hampering our ability to migrate.
| pgaddict wrote:
| Can you elaborate / quantify the memory requirements a bit?
| I don't have much experience with MSQQL in this respect, so
| I'm curious how big the difference is.
| yakubin wrote:
| Fortran for linear algebra software.
|
| Excel for business spreadsheets.
|
| Java for enterprise server software.
| Supermancho wrote:
| > Java for enterprise server software.
|
| Big corporations are horribly inefficient and Enterprise
| Software necessarily so from that...if you're saying Java is
| terrible by nature of it being the goto for enterprise, then
| that makes sense. It took 20 years for it to swap places with
| COBOL and I expect it will be something else in 20 more.
| yakubin wrote:
| I don't work with Java, but I can think of a few advantages
| off the top of my head:
|
| - appreciation of backwards-compatibility (here it wins
| with Python);
|
| - great debuggers and performance tools (e.g. Java Flight
| Recorder or Eclipse Memory Analyzer);
|
| - easy deployment - you can just give someone a fat JAR
| (here it wins with all scripting languages, so Python,
| Ruby, PHP, or any other flavour of the month);
|
| - industry-grade garbage collectors;
|
| - publicly-available standard spec (here it wins with all
| the defined-by-implementation languages such as Python,
| PHP, Rust, basically most languages, and with languages
| which are standardized, but their specs aren't public: C,
| C++, Ruby);
|
| - kind of like the previous point, but anyway: multiple
| implementations to choose from;
|
| - I've been told it has good performance. I've never seen a
| real-world Java application which felt fast, but I've heard
| people put it at the pedestal and the Debian programming
| languages benchmarks game seems to corroborate that story;
|
| Besides, the question wasn't about which technologies we
| like, but which we believe are entrenched so much, they
| aren't going to go away for a very long time. I don't see
| Java going away for another 100 years, no matter how much I
| would or wouldn't like to work with it.
| b9a2cab5 wrote:
| IMO the Java stdlib also strikes just the right balance
| between control and abstraction. You can write thread-
| safe, performant code that makes reasonable tradeoffs
| between data structures without worrying too much about
| the details about memory layout and allocation. Said code
| also is easy to debug even without a debugger because
| there's almost never undefined behavior caused by use-
| after-free type bugs and error messages are clear. And
| the tooling - just IDEs alone, never mind debuggers - is
| mature and effective.
|
| After using Python, Go, PHP, and C++ it's easy to see why
| Java is the go-to language for server development.
| radicalbyte wrote:
| - Fantastic battle tested ecosystem of libraries. -
| Stable cross platform (kills Python, Node here). - Lingua
| franca.
|
| Now I personally don't like Java - it feels crusty vs C#
| - but the libraries are amazing.
|
| You can also use something nice like Kotlin and you have
| all of the platform benefits with non of the crusty
| language issues.
| ranit wrote:
| > Do any other entrenched software projects come to mind?
|
| SQLite.
| IshKebab wrote:
| I'm pretty hopeful that DuckDB will replace some of the use
| of SQLite. SQLite is great but it sucks that it's entirely
| dynamically typed (the types specified for columns are
| completely ignored).
| dilyevsky wrote:
| Kubernetes when it comes to clustering.
| eterm wrote:
| I think anyone who has worked a lot with MSSQL would disagree
| with Postgres being "so much better". It's only really in the
| last few years that postgres has pulled ahead, MSSQL was lot
| more feature rich and performant for a decade.
| harikb wrote:
| MSSQL ? As in Microsoft SQL Server? I have heard this
| argument a lot and all the comparisons I have seen are
| specific benchmarks on specialized hardware. My own personal
| experience wasn't anything like the benchmarks
| HideousKojima wrote:
| MSSQL still has a few features that set it apart from
| Postgres. Off the top of my head are Filestream (basically
| storing files in the database while still having them
| accessible as files on the filesystem) and temporal tables
| without the need for extensions.
|
| Personally if I were choosing the tech stack for my company
| I'd still go for Postgres though
| sigzero wrote:
| By "few years" he has to mean 10 to 15 years. ;)
| jeff-davis wrote:
| I like to say that "Postgres is a great default". It's
| generally very good, and also very adaptable to special
| purposes, so it covers a wide range of use cases.
|
| But saying "so much better" is too strong.
| threeseed wrote:
| It's the inevitable circlejerk we get with every PostgreSQL
| post on HN.
|
| Which is a shame because it means the legitimate and serious
| faults (i.e. lack of native HA/clustering) just get waved
| away.
| sweeneyrod wrote:
| I think many mercurial users would disagree with you about git.
| jayd16 wrote:
| Are we talking about market dominance, mind share or the idea
| that there's no real competition?
|
| MySQL and Oracle exist. Mercurial and perforce exist. I'm not
| sure it's a terrible stretch to compare git and postures.
| aidenn0 wrote:
| I think the point is that git _isn 't_ "so much better"
| than mercurial, while pgsql has had a lead on mysql for
| quite some time on a lot of technical measurements.
| polskibus wrote:
| Postgresql does not have real, maintained with each
| change, clustered index. That itself makes it worse for
| many workloads than MySQL
| petergeoghegan wrote:
| I would say that that's pretty dubious claim with modern
| versions of Postgres and MySQL/InnoDB, running on modern
| hardware. See for example this recent comparative
| Benchmark from Mark Callaghan, a well known member of the
| MySQL community:
|
| https://smalldatum.blogspot.com/2021/01/sysbench-
| postgres-vs...
|
| I'm not claiming that this benchmark justifies the claim
| that Postgres broadly performs better than MySQL/InnoDB
| these days -- that would be highly simplistic. Just as it
| would be simplistic to claim that MySQL is clearly well
| ahead with OLTP stuff in some kind of broad and
| entrenched way. It's highly dependent on workload.
|
| Note that Postgres really comes out ahead on a test
| called "update-index", which involves updates that modify
| indexed columns -- the write amplification is much worse
| on MySQL there. This is precisely the opposite of what
| most commentators would have predicted. Including (and
| perhaps even especially) Postgres community people.
| glogla wrote:
| "Is table a heap with indexes on the side or is table a
| tree with other indexes on the side (i.e. 'clustered
| index')" is a more complicated discussion.
|
| The former makes it possible to have MVCC (and thus gives
| you snapshot isolation and serializability) and makes
| secondary indexes perform faster, at the cost of vacuum
| or Oracle-style redo/undo/rollback segments with
| associated "Snapshot too old" issues.
|
| The latter pretty much forces use of locking even for
| read so queries block each other (but don't require
| vacuum or something), makes clustering key selective
| queries perform faster than secondary index ones and
| makes you think really hard about the clustering key.
|
| It's not really a feature you would have, but a
| complicated design tradeoff.
| autodeadmehaha wrote:
| Anyone whose ever had to upgrade postgres ever knows postgres
| can't fail fast enough. They must fix their upgrade paths and
| it's endless means to completely fuck you if they want to be
| taken seriously.
| yjftsjthsd-h wrote:
| ? What's wrong with pg_upgrade?
| fibers wrote:
| i had to roll back to 9.6 on windows because \COPY is
| fundamentally broken for large cvs
| CapriciousCptl wrote:
| What's the issue? Just on Windows? Mac OS X with 13.2 has no
| issue for me with the 1.1gigabyte 20million record csv just
| imported last week, or some bigger ones I did a few months
| back.
| dimgl wrote:
| Postgres is good, even great, but this is hyperbole. Postgres
| has its downsides, autovacuum being one of them.
| petergeoghegan wrote:
| Although the article doesn't mention it, index bloat will be
| far better controlled in Postgres 14:
|
| https://www.postgresql.org/docs/devel/btree-
| implementation.h...
|
| One benchmark involving a mix of queue-like inserts, updates,
| and deletes showed that it was practically 100% effective at
| controlling index bloat:
|
| https://www.postgresql.org/message-
| id/CAGnEbogATZS1mWMVX8FzZ...
|
| The Postgres 13 baseline for the benchmark/test case
| (actually HEAD before the patch was committed, but close
| enough to 13) showed that certain indexes grew by 20% - 60%
| over several hours. That went down to 0.5% growth over the
| same period. The index growth much more predictable in that
| it matches what you'd expect for this workload if you thought
| about it from first principles. In other words, you'd expect
| about the same low amount of index growth if you were using a
| traditional two-phase locking database that doesn't use MVCC
| at all.
|
| Full disclosure: I am the author of this feature.
| edoceo wrote:
| Thank you!!
| vosper wrote:
| > Do any other entrenched software projects come to mind?
|
| Elasticsearch is underrated here, IMO. Yes, there are
| alternatives for simple fulltext search. But there's a _lot_
| more it can do (adhoc aggregations incorporating complex
| fulltext searches, with custom scripted components; geospatial;
| index lifecycle management) and if you're using those features,
| there's nothing else comparable.
|
| It's pretty stable, too, once you've got the cluster
| configured. We don't have outages due to problems with
| Elasticsearch.
| jeff-davis wrote:
| I don't know about elasticsearch specifically, but I'm
| skeptical of special-purpose systems for databases.
|
| They are great in some cases and terrible in others, and over
| time, use cases push database systems into their worst cases.
| Use cases rarely stay in the sweet spot of a special-purpose
| system.
|
| That being said, if the integration is great, and/or the
| special system is a secondary one (fed from a general-purpose
| system), then it's often fine.
| vosper wrote:
| I'm not sure I fully understand your comment (databases
| that are special-purpose and evolve out of a sweet spot, or
| special-purpose systems using databases in worst-case
| ways?).
|
| I certainly wouldn't say ES is the former. We use it for
| some conplex things that (AFAIK) no other (publicly
| available; I don't what eg Twitter or Google has going on)
| system could provide at the scale we need. Everything we're
| doing is well within the realm of what ES is built for, and
| it's the only system built for it. It's not perfect, but
| most of our performance issues could be solved by scaling
| out, where query or index optimization isn't tractable.
| jeff-davis wrote:
| I interpreted (misinterpreted?) your comment to be
| suggesting ES for wider use cases.
| bradleyjg wrote:
| It's frustrating to need a run-time team for a piece of
| infrastructure, especially one sold as IaaS.
|
| It's totally understandable that you'd need developers to
| have expertise in patterns and anti-patterns, as well as
| needing an expert to set things up in the first place, but
| you shouldn't have to have a dedicated ES monitoring / tuning
| / babysitting team like Oracle DBAs of yore. That you do,
| means it isn't there yet as a product.
| jjeaff wrote:
| MySQL 8 is not that far behind in feature parity. And is ahead
| when it comes to scalability. So I don't see postgres as
| necessarily standing alone.
| ksec wrote:
| Are there any Roadmap for MySQL 9 ?
| paozac wrote:
| MySQL's lack of DDL transactions is a serious shortcoming.
| ezekiel68 wrote:
| You claim that MySQL 8 is ahead when it comes to scalability.
| What are the bases of this claim? When I see comparisons or
| entire systems that rely on a database (that is, not micro-
| benchmarks) such as the TechEmpower web framework benchmarks
| [0] , I notice that the 'Pg' results cluster near the top,
| with the "My" results showing up further down the rankings. I
| understand this isn't version 14 of the former versus version
| 8 of the latter. But it makes me wonder what the basis of
| your claims is.
|
| [0] https://www.techempower.com/benchmarks/
| the_duke wrote:
| Techempower is not a database benchmark. The tests that
| involve a DB exist to include a DB client in the request
| flow, not to put any serious load on the database.
| isbvhodnvemrwvn wrote:
| Aren't those run on a single node DB server? And the
| queries don't really seem realistic at all, e.g. single
| query test fetches 1 out of 10 000 rows, with no joins at
| all. Fortunes fetches 1 out of 10 rows. This seems
| extremely trivial.
| merb wrote:
| well if you need more than one server, mysql has vitess,
| which is huge. postgres has citus, but that is way more
| complex to setup than vitess.
|
| I still would never use mysql, just because of vitess.
| [deleted]
| purerandomness wrote:
| No DDL transactions, no materialized views, the list is
| endless.
|
| There's almost no reason to pick MySQL for a new project.
| tfigment wrote:
| MySQL and mariadb have first class temporal tables. Pg has
| compile requirement and so cannot use in AWS RDS.
| lowercased wrote:
| I was aware maria had temporary tables, but not mysql
| proper. Any links you can point me to? Every search is
| coming up with 'temporary' table info, not temporal.
| lowercased wrote:
| mysql8 has gis/spatial stuff built in now. may not quite
| be on par with postgis, but... i also don't have to futz
| with "doesn't come baked in". Dealt with someone who
| wrote a whole bunch of lat/lon/spatial stuff in client
| code because we're on postgres but ... he couldn't get
| postgis installed (then even if he could, figuring out
| how to convince the ops people to add a new 'thing' in
| production would have been a delay).
|
| having stuff baked in is often a win.
| dragonwriter wrote:
| > MySQL and mariadb have first class temporal tables. Pg
| has compile requirement and so cannot use in AWS RDS.
|
| There's a pl/pgsql reimplementation of temporal tables
| specifically for that use case.
| phonon wrote:
| https://news.ycombinator.com/item?id=26768220
| jasonwatkinspdx wrote:
| There's a ton of room for improvement in the architecture of
| relational databases. This isn't a dig against Postgres, or
| ignoring how difficult it will be to get a new system to the
| same level of maturity. But databases designed natively for
| cloud/clustering, SSDs, (pmem soon perhaps), etc are quite a
| bit different. There's enormous simplifications and performance
| gains possible.
|
| There's been a lot of exciting work in this area over the last
| decade or so. Andy Pavlo's classes are great surveys of the
| latest work: https://15721.courses.cs.cmu.edu/spring2020/
|
| CosmosDB is an example of a relational (multi paradigm
| properly) database with a quite different architecture vs the
| classic design, that's moved into production status quite
| rapidly.
|
| FaunaDB and CockroachDB are moving with solid momentum too.
| oblio wrote:
| Yeah, to list a bit:
|
| - scaling is non-trivial (you can't just add a node and have
| PostgreSQL automagically Do The Right Thing(tm))
|
| - you can only have so many connections open to the database,
| causing issues with things such as AWS Lambda
|
| - I don't remember if this was changed, but I got the
| impression a while ago that having dynamic DB users was a bit
| cumbersome to set up (plug PostgreSQL to AD/LDAP)
| threeseed wrote:
| There are also technologies like NVMe over Fabric/RDMA, eBPF,
| XDP, io_uring etc which are just starting to get traction and
| are game changers for performance. None of which are being
| used.
|
| All of these require a different architecture so expect to
| see newer databases push things even further.
| exceptione wrote:
| You must be kidding me with the CosmosDB mention. It doesn't
| even have foreign key constraints. I have to work with it and
| I have never seen such a feature-poor dbms before.
| gogopuppygogo wrote:
| Cockroach is the worst brand for a database ever.
|
| Even Croach would be a massive branding improvement.
|
| This is similar to how gimp is a terrible brand.
| meesterdude wrote:
| I mean... the WORST? For me Mongo takes the cake, but
| oracle is up there too.
| zdragnar wrote:
| Really? Oracle actually makes a lot of sense to me for a
| database name (in the 'source of truth' sense, not in the
| prophet sense).
|
| Mongo, on the other hand, has definitely always had the
| racist/ablist slur as the first connotation for me.
| whatshisface wrote:
| I've learned almost all the slurs I know from comments or
| media sources complaining about them. It's the only place
| they're used in polite society.
| efxhoy wrote:
| > Automatic cancellation of long-running queries if the client
| disconnects
|
| Sweet! I often screw up a query and need to cancel it with
| pg_cancel_backend(pid)
|
| because Ctrl-C rarely works. With this I can just ragequit and
| reconnect. Sweet!
| znep wrote:
| I agree this is a great addition, but FWIW it isn't normal for
| ^C to not work in psql. Perhaps you are using some other client
| that doesn't support aborting queries properly, or have
| something on the network between you and the server behaving
| poorly and dropping connections?
| e1g wrote:
| Another exciting feature in PG14 is the new JSONB syntax[0],
| which makes it easy to update deep JSON values -
| UPDATE table SET some_jsonb_column['person']['bio']['age'] =
| '99';
|
| [0] https://erthalion.info/2021/03/03/subscripting/
| xfalcox wrote:
| Wow is this for real? That is such a big quality of life
| change! Happy to see it!
| megous wrote:
| Not much different from some_jsonb#>>'{some,path}' and once
| you add the need to convert out of jsonb to text, you'll not
| be saving any characters either. At least for queries.
|
| For updates, it looks nice I guess.
| megous wrote:
| Downvoters could instead provide a way to get a decoded
| text of a property with this new syntax, like it's possible
| with #>>. That would be more useful.
| zdragnar wrote:
| I think the difference is _familiarity_.
|
| It shouldn't matter so much, but when you don't use one
| language as much as you do other languages, it becomes that
| much harder to remember unfamiliar syntaxes and grammars,
| and easier to confuse similar looking operations with each
| other.
| megous wrote:
| In that case this does not help. SELECT json['a']; will
| not return the value of the string in {"a":"ble"} (like
| it does in Javascript), but a JSON encoding of that
| string, so '"ble"'. You'll still not be able to do simple
| comparisons like `SELECT json_col['a'] = some_text_col;`
| Superficial familiarity, but it still behaves differently
| than you expect.
|
| Is there even a function that would convert JSON encoded
| "string" to text it represents in postgresql? I didn't
| find it.
|
| So all you can do is `SELECT json_col['a'] =
| some_text_col::jsonb;` and hope for the best (that string
| encodings will match) or use the old syntax with ->> or
| #>>.
| GordonS wrote:
| Gods, but this is fabulous!
|
| JSONB capabilities in Postgres are amazing, but the syntax is
| really annoying - for example, I'm forever mixing up `->` and
| `->>`. This new syntax feels _far_ more intuitive.
| topicseed wrote:
| Constantly have to google up the JSONB wuery syntax, it's
| just too confusing to me, although it is indeed powerful.
|
| Glad to this this super intuitive and familiar syntax added.
| Will make writing these updates a lot lot lot easier. Not
| even close!
___________________________________________________________________
(page generated 2021-05-22 23:00 UTC)