[HN Gopher] Speeding up sort performance in Postgres 15
___________________________________________________________________
Speeding up sort performance in Postgres 15
Author : todsacerdoti
Score : 292 points
Date : 2022-05-20 07:58 UTC (15 hours ago)
(HTM) web link (www.citusdata.com)
(TXT) w3m dump (www.citusdata.com)
| MR4D wrote:
| It is this type of analytics and discipline that I love about the
| PG team.
|
| It takes a lot of dedication and patience to do, but I wish more
| software teams did this. Perhaps schools should put more emphasis
| on this type of analysis.
|
| Maybe then Call of Duty would load in a reasonable time. (sorry,
| pet peeve of mine)
| maccard wrote:
| I work in video games and have done my fair share of perf work.
| Pretty much every game does this kind of work, and all the live
| service games have had a full team dedicated to ongoing
| performance work.
|
| I could easily sit and throw shade that Twitter takes longer
| to show my timeline than most PS5 games take to boot up and
| load into a game, but that doesn't help anyone.
| MR4D wrote:
| You make a good point, and I am aware of that. Frankly, the
| engineering that goes into any FPS (even the bad ones) is
| beyond amazing, especially when you add networking latencies
| to the complexity.
|
| I should have chosen websites for my rant, but CoD has been
| on my mind this week due to the weird way you have to keep
| getting into a game every time and go through what seems like
| multiple home screens in a way that is either a bug, or just
| poorly thought out. Either way, it is a weird waste of time,
| and annoying when you're trying to play two back-to-back
| games.
| hinkley wrote:
| It's typical in non-gaming teams to have any instinct to
| optimize beaten out of you instead of guided.
|
| Knuth's comments about premature optimization have been so
| misquoted at this point that we've gone way beyond people
| using it as permission not to think; they also use it as a
| mandate to stop other people from thinking as well.
|
| Academics are particularly bad about falling back on
| aphorisms, so changing the curriculum to fix this would also
| mean changing the teachers, which is an automatic
| conversation killer.
|
| In my case I was keeping my own counsel before anyone tried
| to beat it out of me, and I learned to just keep things to
| myself. On the plus side I know dozens of ways to refactor
| for legibility and 'accidentally' improve performance at the
| same time.
| HideousKojima wrote:
| >On the plus side I know dozens of ways to refactor for
| legibility and 'accidentally' improve performance at the
| same time.
|
| My favorite is "That nested loop is kind of hard to read,
| how about making a dictionary/hashmap instead?"
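|
| (An illustrative sketch of that refactor, with hypothetical
| names -- the nested scan is O(n*m), the map version O(n+m):)
|
|     #include <unordered_map>
|     #include <vector>
|
|     struct Customer { int id; };
|     struct Order { int customer_id; };
|     void process(const Order&, const Customer&) {}
|
|     void join_orders(const std::vector<Order>& orders,
|                      const std::vector<Customer>& customers)
|     {
|         // Before: a nested loop, scanning every customer
|         // for every order.
|         // for (const Order& o : orders)
|         //     for (const Customer& c : customers)
|         //         if (c.id == o.customer_id) process(o, c);
|
|         // After: build the lookup once, probe per order.
|         std::unordered_map<int, const Customer*> by_id;
|         for (const Customer& c : customers)
|             by_id.emplace(c.id, &c);
|         for (const Order& o : orders) {
|             auto it = by_id.find(o.customer_id);
|             if (it != by_id.end())
|                 process(o, *it->second);
|         }
|     }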
| zerocrates wrote:
| There's also the factor of what kinds of performance you care
| about and are thinking about, and what you're not.
|
| There was the memorable story shared around here a while ago
| of a player patching GTA V's live-service online mode to
| eliminate multiple-minute load times [1]. I'm sure Rockstar
| had/has
| many people working on performance (the core game is a well-
| known good performer and now has been on three different
| console generations since launch) but just, not this.
|
| (Though to totally miss such truly extensive load times on
| your main revenue-generating component, that one's a
| headscratcher. Feels like something you'd end up addressing
| just for your own internal sanity. Maybe everyone just had
| monster workstations and didn't notice.)
|
| [1] https://news.ycombinator.com/item?id=26296339
| darksaints wrote:
| One of the things I have been consistently excited about is
| the new pluggable storage API, and I have followed a handful
| of projects that were working on plugins, like zedstore,
| zheap, and cstore. But now it seems like all of those efforts
| have stagnated. Any idea what happened?
| ccleve wrote:
| The difficulty is that the new table access method is very
| complex and hard to get right. I'm building something using a
| regular index access method now and it's hard enough. Once I
| get all that infrastructure working properly I may take another
| crack at a table access method.
|
| For some purposes you can get away with a foreign data wrapper.
| FDWs are a lot simpler.
| jfbaro wrote:
| orioleDB is another one
| kabes wrote:
| A couple of months back there was a post from someone who
| made a presentation about a lot of things that are 'bad' in
| postgres (mostly performance-wise) and a github repo that
| tried to fix them (initially with patches, with the idea of
| later having them either upstreamed or made into an addon).
|
| I can't find anything of this effort anymore. Does someone know
| what I'm talking about and what the current status of that is?
|
| EDIT: found it: https://github.com/orioledb/orioledb
|
| I'd love to hear what PG devs here think of his criticism and
| proposed solutions.
| fuy wrote:
| Agree with you - would love to have some of the core PG devs
| pay attention and discuss this work.
| ConnorLeet wrote:
| I think that this repo was created by a core PG dev.
|
| https://github.com/orioledb/orioledb/commits?author=akorotko.
| ..
|
| https://github.com/akorotkov
| canadiantim wrote:
| gampleman wrote:
| Could the specialisation of sort functions be further improved by
| using e.g. something like Radix sort for integers?
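|
| (For reference, a minimal LSD radix sort sketch for unsigned
| 32-bit keys -- illustrative only, not Postgres code. No
| comparisons at all, just four counting passes, one per byte:)
|
|     #include <array>
|     #include <cstddef>
|     #include <cstdint>
|     #include <vector>
|
|     void radix_sort(std::vector<std::uint32_t>& v)
|     {
|         std::vector<std::uint32_t> tmp(v.size());
|         for (int shift = 0; shift < 32; shift += 8) {
|             // Histogram of the current byte.
|             std::array<std::size_t, 257> count{};
|             for (std::uint32_t x : v)
|                 count[((x >> shift) & 0xFF) + 1]++;
|             // Prefix sums: each bucket's start offset.
|             for (int i = 0; i < 256; i++)
|                 count[i + 1] += count[i];
|             // Stable scatter into the buckets.
|             for (std::uint32_t x : v)
|                 tmp[count[(x >> shift) & 0xFF]++] = x;
|             v.swap(tmp);
|         }
|     }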
| JacKTrocinskI wrote:
| They added the MERGE statement!
|
| ...and better defaults for the public schema.
|
| Really looking forward to Postgres 15.
| avar wrote:
| A related discussion about inlining these sort functions,
| templating etc.: https://www.postgresql.org/message-
| id/flat/CA%2BhUKGJ2-eaDqA...
| hinkley wrote:
| I'm surprised they use quicksort. I would think in a database
| you'd value predictability over best case behavior.
|
| Also is it a stable quicksort?
| Jamie9912 wrote:
| Great stuff. What's the best way to update a major version of
| Postgres when it comes out?
| mike-cardwell wrote:
| https://www.postgresql.org/docs/current/upgrading.html
| dewey wrote:
| This is just a data point but in my experience over the past
| few years Postgres updates are usually nothing to worry about.
| There are other systems where updates come with a lot more
| baggage attached (like Elasticsearch), where you have to
| refactor schemas and stay on top of the documentation far
| more than with PG.
| jon_richards wrote:
| Homebrew on mac has a long-standing upgrade bug with Postgres
| extensions like postgis. It uninstalls the old postgis before
| copying the data over and then gets stuck in a hard to
| recover state. I always assume I'm going to lose all my data
| whenever I update major versions.
| speedgoose wrote:
| It's difficult to say without knowing your situation.
| Personally I would wait a few months at least to be sure that
| the biggest bugs are killed. Then if it's a new project or
| something not critical, I may upgrade without any precaution
| and hope for the best. For something more critical I would
| rely on the integration tests and manual testing in a
| pre-production environment for a few weeks. Before the final
| upgrade, I would do a few practice upgrade runs and be ready
| to roll back if needed.
| altendo wrote:
| > Personally I would wait a few months at least to be sure
| that the biggest bugs are killed.
|
| This is my default behavior with any database product. Solid
| advice.
| zeckalpha wrote:
| One graph has sorting a 10GB table and refers to it as a "Large
| sort"... is 10GB large any more?
| dagw wrote:
| "Large" in this context is stated to mean significantly larger
| than work_mem, which defaults to 4MB out of the box.
| Interestingly performance (relative to PG14) seems to get worse
| as work_mem increases and the ratio of table size to work_mem
| decreases. If this holds it should mean that for a fixed
| work_mem PG15 performance (again relative to PG14) should
| improve as table size increases.
| pigcat wrote:
| > performance (relative to PG14) seems to get worse as
| work_mem increases and the ratio of table size to work_mem
| decreases
|
| Do you have any source for this? This is the first time I've
| ever heard this and it seems contrary to everything I've
| read.
| d_watt wrote:
| Not for OLAP, but this seems to be about OLTP. If you're
| trying to sort 10GB of data for a traditional OLTP query, I
| would consider that "too large" and in need of some data
| structure optimization.
|
| Also it's specifically about queries where the sorted data is
| too big for memory:
|
| > The final change is aimed at larger scale sorts that exceed
| the size of work_mem significantly. The algorithm which merges
| individual tapes was changed to use a k-way merge. When the
| number of tapes is large, less I/O is required than the
| original polyphase merge algorithm.
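|
| (The k-way merge itself is a standard trick: keep the head of
| each sorted run in a min-heap and pop/refill. A toy in-memory
| sketch -- the real code merges on-disk tapes:)
|
|     #include <cstddef>
|     #include <functional>
|     #include <queue>
|     #include <utility>
|     #include <vector>
|
|     std::vector<int>
|     kway_merge(const std::vector<std::vector<int>>& runs)
|     {
|         // (value, index of the run it came from)
|         using Head = std::pair<int, std::size_t>;
|         std::priority_queue<Head, std::vector<Head>,
|                             std::greater<>> heap;
|         std::vector<std::size_t> pos(runs.size(), 0);
|         std::vector<int> out;
|
|         for (std::size_t r = 0; r < runs.size(); r++)
|             if (!runs[r].empty())
|                 heap.push({runs[r][0], r});
|
|         while (!heap.empty()) {
|             auto [val, r] = heap.top();
|             heap.pop();
|             out.push_back(val);
|             // Refill from the run we just consumed.
|             if (++pos[r] < runs[r].size())
|                 heap.push({runs[r][pos[r]], r});
|         }
|         return out;  // one pass, however many runs there are
|     }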
| hinkley wrote:
| It is a common complaint in discussions about sort algorithms
| that sorting plain numbers is a degenerate case: 'real' cases
| involve at least pointer chasing, which can make the cost of
| comparisons stop being constant, or give them a constant cost
| greater than log(n) (or worse, n).
|
| Sometimes locality of reference may increase the total number
| of comparisons but decrease the average cost considerably.
| For practical applications like databases, it's the runtime
| that matters, not the theory.
| nhoughto wrote:
| Interesting to see there is still more to squeeze out of the
| old girl yet. Most of these seem like things that could have
| been attempted before now, but I guess they weren't the top
| pain points to fix. I wonder how many more 4-6% savings there
| are in there; good to see people going deep and keeping the
| improvements coming.
| mattashii wrote:
| I'm currently working (on and off) on improving the performance
| of specific "kinds"/"shapes" of btree indexes by several %s
| (depending on shape it can be tens of percents). I hope and
| expect that it'll be ready for Pg16, but that of course assumes
| that the concept behind the implementation is acceptable.
|
| See https://www.postgresql.org/message-
| id/flat/CAEze2Wg52tsSWA9F... for specifics
| mchusma wrote:
| Thanks so much for your contributions! The community for PG
| is so great.
| jeffbee wrote:
| Anyone who is using a distro package of pg has the largest
| opportunity because they are all built wrong. A peak-optimized
| binary will get you much more than 4% speed up over generic
| packages.
| brightball wrote:
| Using PG for years and this is the first time somebody made
| me aware of this, so thank you.
| Bootvis wrote:
| Can you please elaborate, what's wrong and what do I need to
| do when building from source?
| adgjlsfhk1 wrote:
| Compiling with -march=native can often be significant.
| mayama wrote:
| Will the -march=native option work in the cloud, where
| you're not in control of the CPU?
| jeffbee wrote:
| I haven't used a cloud where you don't dictate the exact
| CPU you want. EC2, GCE, and Azure all provide specific
| guarantees. Even if you wanted a generic any-cloud binary
| you would probably compile with -march=skylake or, if
| truly paranoid, sandybridge.
|
| Many distro packages are, by contrast, compiled for
| k8-generic. Freaking Opteron!
|
| BTW "native" is not really recommendable. One really
| ought to target a particular platform.
| brianwawok wrote:
| Do Google and AWS hosted Postgres do this? I assume so, but
| I am not sure how to check.
| jeffbee wrote:
| I don't know but it would be out-of-character for Google
| to have a C program that wasn't optimized to the hilt.
| jeffbee wrote:
| Compiling for your target is good, not using the PLT for
| mem* functions helps a ton, feedback-directed
| optimization and LTO and/or post-link optimization help
| the most. For CPU-bound workloads of course.
| hinkley wrote:
| In the long tail it's more a matter of choosing a module or
| concern and squeezing a percent here and a percent there
| until it adds up to 5%, rather than looking for tall tent
| poles. Also counting calls for a fixed workload, and checking
| whether this call stack really should call functionB nine
| times for three results, or whether that should be six or
| three. The function is still exactly the same cost per call
| but is now 1% of the overall cost instead of 3%.
| mkurz wrote:
| Great work!
| nitinreddy88 wrote:
| "The first test I ran while working on reducing the memory
| consumption of sort improved performance by 371%"
|
| This is impressive in terms of performance for SQL Server
| operations. Imagine the end performance benefits for
| analytical/warehouse model of queries.
| mattashii wrote:
| > SQL Server operations
|
| I might be misunderstanding you, but this is a post about
| PostgreSQL, not Microsoft's SQL Server.
|
| > performance benefits for analytical/warehouse model
|
| That performance result was for a fully in-memory sort
| operation, which you are less likely to see in data
| warehouses.
| dspillett wrote:
| _> but this is a post about PostgreSQL, not Microsoft 's SQL
| Server._
|
| I assume that what was meant is what I'd say as "SQL engine"
| or "SQL query runner". It is unfortunate that MS has a habit
| of using common words in its product names. PG _is_ a SQL
| server (no capital) in the sense that it is a server that
| processes SQL statements for retrieving and manipulating
| data.
| dewey wrote:
| There's a lot of interesting improvements in PG15, some of my
| favorites from
| https://www.postgresql.org/about/news/postgresql-15-beta-1-r...
| are:
|
| >PostgreSQL 15 introduces the jsonlog format for logging. This
| allows PostgreSQL logs to be consumed by many programs that
| perform structured log aggregation and analysis.
|
| >pg_basebackup, a utility used to take full backups of a
| PostgreSQL cluster, now supports server-side compression using
| Gzip, LZ4, or Zstandard compression.
|
| >This includes the introduction of parallelization for SELECT
| DISTINCT statements and improvements in performance to window
| functions that use row_number(), rank(), and count().
|
| >PostgreSQL 15 builds on its existing support for the SQL/JSON
| path language by including more standard SQL/JSON functions.
| These include SQL/JSON constructors, query / introspection
| functions, and the ability to convert JSON data into a table.
| baggiponte wrote:
| That's fascinating! Totally orthogonal: how does PG store data?
| Do they use columnar specifications such as Arrow & Parquet?
| chrsig wrote:
| The main table is stored in a 'heap', which is more akin to
| the dynamic allocation region of a program than to the heap
| data structure.
|
| If it needs to write to the table, it has a free space map
| listing all of the available fixed size pages. Each page can
| hold some number of fixed size tuples.
|
| I'm surprised they've gotten so far with this. It seems ripe
| for fragmentation issues. I'm surprised they don't at least
| have a b+tree implementation to provide clustering based on
| the primary key. They have a b+tree implementation for
| indices, but not for tuple storage. IIRC, there's a feature
| to allow fixed size data rows to be stored directly in the
| index, to help alleviate the need to look in the heap table.
| Perhaps that's what's kept the heap as viable?
|
| Variable length strings are kept in a separate file (TOAST
| tables), referenced by the tuple. This keeps records fixed
| size, and allows for circumventing disk reads for string data
| for queries that don't require them. I'm not quite sure how
| the allocation scheme or data structure works there. My (very
| speculative) guess is that they're stored as a linked list,
| and on write, the allocator tries to give contiguous pages.
|
| fwiw, I don't use postgresql on a regular basis, I've just
| spent some good amount of time squinting at the file format
| and related code, solely due to my own curiosity. It's been a
| while, so it's very possible that I'm misrepresenting or
| misremembering something. Someone has already posted their
| documentation on the file format -- it's a good read if
| you're interested.
| tomnipotent wrote:
| > but not for tuple storage
|
| Both tables and indexes use the same b+-tree implementation
| based on Lehman and Yao (prev/next links on internal
| nodes).
|
| > Each page can hold some number of fixed size tuples
|
| Variable size.
|
| > Variable length strings are kept in a separate file
|
| Only if the row exceeds the page size, otherwise everything
| is stuffed into the data leaf node.
|
| > I'm not quite sure how the allocation scheme or data
| structure works there
|
| Values are broken up into DataSize/PageSize "virtual rows"
| and stitched back together, but everything is still in the
| same slotted page structure.
| SigmundA wrote:
| >Both tables and indexes use the same b+-tree
| implementation based on Lehman and Yao (prev/next links
| on internal nodes).
|
| Tables in Postgres are not btrees; they are unordered
| heaps. The row id in the b-tree index points to a page
| number + item index in the heap.
|
| A true clustered index would simply expose the index
| itself as a table without secondary heap storage,
| reducing I/O and storage space. Row visibility seems to
| complicate this in PG as index only scans may require
| checking the heap for visibility.
| uhoh-itsmaciek wrote:
| >Variable length strings are kept in a separate file (TOAST
| tables), referenced by the tuple. This keeps records fixed
| size...
|
| That's not entirely true: variable-width row values under a
| size threshold are stored inline (possibly compressed).
|
| The beauty part about TOAST tables is that they're mostly just
| regular tables (any table that needs its data TOASTed gets
| its own TOAST table). You can look them up in the system
| catalogs and query them directly.
|
| The schema for these is (chunk_id, chunk_seq, chunk_data).
| Chunk "pieces" are limited in size so they can be stored
| inline (obviously there's not much benefit to a TOAST table
| itself having a TOAST table), so any row values that need
| to be TOASTed may also be broken up into multiple pieces.
| Row values stored inline in the "owning" table reference a
| chunk_id where the TOASTed data is stored, and queries that
| need to return that data look up all the chunks for a chunk
| id, ordered by chunk_seq.
|
| You're right that this scheme avoids reading TOASTed data
| when you're not actually using it (a good reason to avoid
| "SELECT *" if you've got big TOASTed data).
| uhoh-itsmaciek wrote:
| There's a pretty good high-level overview of the
| implementation details in the official docs:
| https://www.postgresql.org/docs/current/storage.html
| paulryanrogers wrote:
| It's row based with versions kept alongside current data.
| Later to be vacuumed away. Unlike a redo log based approach.
| thejosh wrote:
| I'm also super excited for the points you raised, looks like
| some great features being released, I really like that JSON is
| becoming a first class citizen.
|
| I'm also looking forward to the \copy improvements, especially
| around loading data into PG.
| mchusma wrote:
| I use both SELECT DISTINCT and row_number a lot, so excited for
| that as well.
| 1500100900 wrote:
| SELECT DISTINCT is rarely the best idea. In most cases it
| means someone is using joins to filter out rows that should
| have been removed with EXISTS (a semi-join).
| sa46 wrote:
| Interesting, I hadn't thought of it that way. Looking
| through our codebase:
|
| - We get a lot of mileage out of DISTINCT ON to get the
| most recent version of a row for some subset of columns. I
| think this is a different use case than what you refer to.
|
| - We typically use DISTINCT to get PKs for a table when
| starting from another table. Like, find all user IDs that
| commented on a given story by looking at the join table.
| SELECT DISTINCT user_id FROM story_comment WHERE story_id =
| 4
| BitPirate wrote:
| I think they missed one of the biggest changes: ICU collations
| can finally be set as default for clusters and databases.
| aftbit wrote:
| Ah yes, good old:
|
|     # set client_min_messages TO 'debug1';
| maccard wrote:
| > The change made here adds a set of new quicksort functions
| which suit some common datatypes. These quicksort functions have
| the comparison function compiled inline to remove the function
| call overhead.
|
| Stuff like this frustrates me. This is what generics are for in
| most languages. Looking at the commit[0], this is a _perfect_
| example of where a sprinkling of C++ features (a very basic
| template) would outperform the original C version. It's also
| super disappointing that in 2022 we're still manually marking
| 10-line functions used in one place and called from one site
| as inline.
|
| [0]
| https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...
| [deleted]
| bfgoodrich wrote:
| > This is what generics are for in most languages.
|
| That goes exactly _opposite_ of this optimization, and is just
| entirely orthogonal.
|
| > It's also super disappointing that in 2022 we're still
| manually marking 10- line functions used in one place and
| called from one site as inline.
|
| You misunderstand the change.
|
| In a nutshell the prior version was a polymorphic function
| pointer that varied by the sort type:
|
|     (*sort_ptr)(data);
|
| No compiler can inline this, no matter how aggressively you
| turn up the optimizations.
|
| Their change adds specialized cases:
|
|     if (dataType == PG_INT32) {
|         sort_int32(data);
|     } else if (dataType == PG_INT64) {
|         sort_int64(data);
|     } ... {
|     } else {
|         (*sort_ptr)(data);
|     }
|
| This enabled the inlining, and while it's no harm to add the
| inline specifier if you want to be very overt in your
| intentions, most compilers would inline it regardless in most
| scenarios.
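|
| (A compilable sketch of the difference, with hypothetical
| names; it is C-flavoured to match the discussion but builds
| as C or C++:)
|
|     #include <stdlib.h>
|
|     /* Opaque comparator: qsort() must call through the
|      * pointer for every comparison, so it can't be inlined. */
|     static int cmp_int32(const void *a, const void *b)
|     {
|         int x = *(const int *)a, y = *(const int *)b;
|         return (x > y) - (x < y);
|     }
|
|     void sort_generic(int *data, size_t n)
|     {
|         qsort(data, n, sizeof(int), cmp_int32);
|     }
|
|     /* Specialized path: the comparison is visible at the
|      * call site, so the compiler can inline it into the loop
|      * (insertion sort only to keep the sketch short). */
|     void sort_int32(int *data, size_t n)
|     {
|         for (size_t i = 1; i < n; i++) {
|             int key = data[i];
|             size_t j = i;
|             for (; j > 0 && data[j - 1] > key; j--)
|                 data[j] = data[j - 1];
|             data[j] = key;
|         }
|     }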
| samhw wrote:
| >> This is what generics are for in most languages.
|
| > That goes exactly opposite of this optimization, and is
| just entirely orthogonal.
|
| What does that mean? Also, how is it "exactly opposite" and
| "orthogonal"? Maybe let's get rid of the mixed metaphors and
| speak in concrete terms.
|
| In your comment below ("why would you want the compiler to do
| that? you can write out all the monomorphizations yourself!")
| I'm not totally certain that you understand the point of
| generics. _You can always write out each case yourself._
| Incidentally that also applies to your precious inlining. But
| it's a tool to make programmers more productive.
| [deleted]
| vlovich123 wrote:
| In C++, the sort algorithm is polymorphic and inlined because
| the compiler automatically generates those sort variants for
| you.
|
| OP is not trying to say "the code as is" but "written in C++
| style".
|
| I don't think they misunderstood anything (and their beef
| with the seemingly unprofiled use of the always_inline
| attribute is probably correct too)
| bfgoodrich wrote:
| > I don't think they misunderstood anything
|
| The code could have implemented the type-specific sorts as
| macros. They could have literally written the code blocks in
| the actual sort function.
|
| Instead they decided to make it separate functions and mark
| it inline because _that is their overt intention_ and they
| thought it was cleaner.
|
| What is the problem? Honestly, aside from noisy, useless
| griping, what is possibly the problem with that?
|
| Their implication seems to be that inline shouldn't be
| necessary. And the truth is that it almost certainly _isn't_
| necessary, and any optimizing compiler would inline
| regardless. But clarifying their intentions is precisely
| what developers should do: "We're moving it here for
| organizational/cleanliness purpose, but it is intentionally
| written this way to be inlined because that is the entire
| point of this optimization".
|
| As to the "unprofiled" use, their specific optimization was
| because they knew time was being wasted in function calls,
| and the optimization was to avoid that.
| macdice wrote:
| (Author of the code here) Yeah :-)
| JacobiX wrote:
| Just to clarify, this is what C++ templates do: they can
| create specializations for you, because templates are
| processed at compile-time. But it's not how generics work in
| most languages (runtime, type erasure, etc).
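|
| (E.g., a minimal illustration: instantiating std::sort with a
| comparator type stamps out a dedicated copy of the sort in
| which the comparison is inlined -- the compile-time analogue
| of the hand-written specializations in the patch:)
|
|     #include <algorithm>
|     #include <vector>
|
|     void sort_desc(std::vector<int>& v)
|     {
|         // The lambda's type is part of the instantiation, so
|         // the comparison can be inlined into the sort loop.
|         std::sort(v.begin(), v.end(),
|                   [](int a, int b) { return a > b; });
|     }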
| macdice wrote:
| Speaking as the author of that change, and also as a long time
| user of C++ (in other projects), I hear ya! But I don't have
| the power to make PostgreSQL switch to C++. Years ago,
| std::sort() vs qsort() was almost a standard example of how C++
| can be faster than C.
|
| src/include/lib/sort_template.h is an example of a pseudo
| template function, but we also have at least one pseudo
| template data structure in src/include/lib/simplehash.h.
|
| And if you want to see some "policy based design" (C++
| community term) in PostgreSQL C code, check out
| src/backend/executor/nodeHashjoin.c. It uses forced inlining to
| "instantiate" two different versions of the hash join executor
| code, one with parallelism and one without, so that we don't
| have all the "am I in parallel mode?" branching/code size in
| the generated code.
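|
| (In C++ the analogous trick is usually spelled with a bool
| template parameter; a sketch of the idea, not the actual
| executor code:)
|
|     #include <cstddef>
|
|     // Two copies of the hot loop are stamped out at compile
|     // time; the "am I in parallel mode?" test is resolved per
|     // instantiation rather than on every iteration.
|     template <bool Parallel>
|     static void exec_join_impl(std::size_t ntuples)
|     {
|         for (std::size_t i = 0; i < ntuples; i++) {
|             if constexpr (Parallel) {
|                 // ... coordinate with parallel workers ...
|             } else {
|                 // ... serial path ...
|             }
|         }
|     }
|
|     void exec_join(std::size_t ntuples, bool parallel)
|     {
|         // One runtime branch picks the pre-built variant.
|         if (parallel)
|             exec_join_impl<true>(ntuples);
|         else
|             exec_join_impl<false>(ntuples);
|     }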
| mort96 wrote:
| > It's also super disappointing that in 2022 we're still
| manually marking 10- line functions used in one place and
| called from one site as inline.
|
| The `inline` keyword usually isn't necessary anymore to get the
| compiler to inline stuff, the compiler has some pretty good
| heuristics for determining whether to inline a function call or
| not. The `inline` keyword is an extra hint to the compiler, and
| affects linkage (i.e. multiple object files defining the same
| `inline` function do not result in a linker error).
|
| I don't understand what you're disappointed at, this all seems
| very reasonable.
| maccard wrote:
| I never said they used the inline keyword, you inferred that.
| If you look at the diff, they mark the functions as
| `pg_attribute_always_inline` which expands to __forceinline
| or __attribute__((always_inline)) inline depending on your
| compiler. These specifiers override any of the compiler's
| heuristics [0].
|
| > I don't understand what you're disappointed at, this all
| seems very reasonable.
|
| I'm disappointed by the fact that we're still manually
| overriding compiler heuristics and manually writing faux
| copy-pasted generics for performance, when this is a solved
| problem in so many other languages.
|
| EDIT: I would _love_ to see some benchmarks without the
| pg_attribute_always_inline to see whether forcibly inlining
| the code is even necessary. In my (extended) experience with
| optimising C++ code, leaning on __forceinline is unnecessary
| in the vast vast majority of cases and should be used very
| sparingly.
|
| [0] https://docs.microsoft.com/en-us/cpp/cpp/inline-
| functions-cp...
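|
| For reference, such a portability macro is typically defined
| along these lines (a sketch, not the exact definition from
| the Postgres tree):
|
|     /* Force inlining where supported; otherwise fall back
|      * to a plain hint. */
|     #if defined(__GNUC__)
|     #define pg_attribute_always_inline \
|         __attribute__((always_inline)) inline
|     #elif defined(_MSC_VER)
|     #define pg_attribute_always_inline __forceinline
|     #else
|     #define pg_attribute_always_inline inline
|     #endif
|
|     /* Hypothetical example use: */
|     static pg_attribute_always_inline int
|     cmp_int32_inline(int a, int b)
|     {
|         return (a > b) - (a < b);
|     }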
| gpderetta wrote:
| If the profiler tells you to use force_inline, you use
| force_inline.
| maccard wrote:
| Given this is a blog post explaining their optimisations,
| you'd expect them to mention profiling with/without this
| attribute if they did it. Looking at the email thread,
| they share profiling results of before/after but there's
| nothing justifying the use of __forceinline in the thread
| or the blog post as far as I can see.
| macdice wrote:
| FWIW I just removed pg_attribute_always_inline from those
| three comparator functions in tuplesort.c and it made no
| difference to the generated code under GCC 10 at -O2. If
| we can show that's true for Clang and MSVC too, maybe we
| should just take them out. Edit: But I don't really mind
| them myself, they do document the intention of the
| programmer in instantiating all these specialisations...
| tialaramex wrote:
| At least they _measured_, so that's immediately at least
| a B+. It is of course often hard to know whether anybody
| actually measured before doing their work or only
| afterwards to justify it but we should give benefit of
| the doubt.
|
| Not everything you tried is going to be worth explaining,
| and I get that their whole plan here is "forcibly inline
| this" so in C, where the compiler won't necessarily get
| your memo, it seems reasonable to explicitly say so.
| Sesse__ wrote:
| > The `inline` keyword usually isn't necessary anymore to get
| the compiler to inline stuff, the compiler has some pretty
| good heuristics for determining whether to inline a function
| call or not.
|
| Not really. The heuristics are pretty much based on a rough
| idea of function size (e.g. if it believes the call overhead
| is larger than the cost of just inlining the function, it
| will generally be inlined if optimization is on), and/or
| functions it can prove are used only once (e.g. because they
| are static). That's it. There's no good idea of e.g. "this
| looks like a tight loop, and if I inline, I can maybe
| special-case away this cruft here". The programmer's hints
| are very much relevant even in 2022.
| jeffbee wrote:
| They're not. You are either using PGO and getting effective
| inline or you are just guessing.
| ollysb wrote:
| It seems a little surprising that GIN indexes have never been
| improved to allow for efficient sorting. Is this just not in
| enough demand to receive any attention or is there a technical
| aspect that makes it particularly difficult?
| mattashii wrote:
| I'm not familiar with the full details, but GIN indeed has
| some issues that make presorting expensive:
|
| It has a guarantee of only one item in the tree for each key.
| All table rows that share that key are stored under that one
| item, as one large value. This value can thus become really
| large (and is stored as a subtree in such cases).
|
| To sort these tuples, you can either implement that as 'merge
| key/value pairs when keys are equal during the sort phase' or
| 'merge key/value pairs when keys are equal during the index
| load phase'.
|
| The first requires handling of arbitrarily large KV pairs
| (potentially larger than the normal page size, which indexes
| generally don't handle well) and losing sort elements during
| the sort (due to merges of values; I'm not sure sort algorithms
| can work with that).
|
| The second requires a significantly larger temporary space to
| store the temporary data (potentially several times the size of
| the original data).
|
| Additionally, someone must first find this enough of an issue
| to try to fix, and apparently nobody has yet thought it worth
| taking a stab at.
___________________________________________________________________
(page generated 2022-05-20 23:01 UTC)