[HN Gopher] Playing with BOLT and Postgres
___________________________________________________________________
Playing with BOLT and Postgres
Author : aquastorm
Score : 56 points
Date : 2024-10-04 17:17 UTC (1 day ago)
(HTM) web link (vondra.me)
(TXT) w3m dump (vondra.me)
| miohtama wrote:
| 10% - 20% performance improvement for PostgreSQL "for free" is
| amazing. It almost sounds too good to be true.
| albntomat0 wrote:
| There's a section of the article at the end about how Postgres
| doesn't have LTO enabled by default. I'm assuming they're not
| doing PGO/FDO either?
|
| From the Bolt paper: "For the GCC and Clang compilers, our
| evaluation shows that BOLT speeds up their binaries by up to
| 20.4% on top of FDO and LTO, and up to 52.1% if the binaries
| are built without FDO and LTO."
| touisteur wrote:
| I've always wondered how people actually get the profiles for
| Profile-Guided Optimization. Unit tests probably won't
| exercise the high-performance paths. You'd need a set of
| performance stress tests. Is there a write-up on how everyone
| does it in the wild?
| mhh__ wrote:
| You might be surprised how much speedup you can get from
| (say) just running a test suite as PGO samples. If I had to
| guess this is probably because compilers spend a lot of
| time optimising cold paths which they otherwise would have
| no information about.
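To illustrate why even a rough profile can be enough to separate hot paths from cold ones, here is a minimal Python sketch. It is a toy model, not how BOLT or any compiler actually classifies code: the function names, the sample counts, and the 90% coverage cut-off are all made up for illustration.

```python
def split_hot_cold(samples, coverage=0.9):
    """Split functions into (hot, cold) lists.

    `samples` maps a function name to the number of profile samples that
    landed in it. The hot set is the smallest prefix of functions, ordered
    by sample count, that reaches `coverage` of all samples. Everything
    else is treated as cold.
    """
    total = sum(samples.values())
    hot, cold, seen = [], [], 0
    # Highest sample count first; ties broken by name for determinism.
    for name, count in sorted(samples.items(), key=lambda kv: (-kv[1], kv[0])):
        if count > 0 and seen < coverage * total:
            hot.append(name)
            seen += count
        else:
            cold.append(name)
    return hot, cold


# Hypothetical sample counts, e.g. as gathered from a test-suite run.
profile = {
    "exec_query": 700,
    "scan_rows": 250,
    "report_error": 40,
    "rare_ddl": 10,
}
hot, cold = split_hot_cold(profile)
print("hot: ", hot)
print("cold:", cold)
```

With these made-up numbers, two functions account for 95% of the samples, so the split would stay the same even if the individual counts were noticeably off. That is one plausible reading of why an imprecise profile still helps.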
| pgaddict wrote:
| Yeah, getting the profile is obviously a very important
| step. Because if it wasn't, why collect the profile at
| all? We could just do "regular" LTO.
|
| I'm not sure there's one correct way to collect the
| profile, though. ISTM we could either (a) collect one
| very "general" profile, to optimize for arbitrary
| workload, or (b) profile a single isolated workload, and
| optimize for it. In the blog I tried to do (b) first, and
| then merged the various profiles to do (a). But it's far
| from perfect, I think.
|
| But even with the very "rough" profile from "make
| installcheck" (which is the basic set of regression
| tests), it still helps a lot. Which is nice. I agree it's
| probably because even that basic profile is sufficient
| for identifying the hot/cold paths.
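The "profile each workload separately, then merge" approach described above can be sketched as a small Python helper that only prints the commands involved. It is a sketch under stated assumptions, not the exact procedure from the blog post. The tool names and flags follow the examples in the BOLT README (`perf record`, `perf2bolt`, `merge-fdata`, `llvm-bolt`). The workload names and the `./run_*.sh` scripts are hypothetical, and the sketch assumes each workload can be launched as a single command under `perf record`. For a server like Postgres you would more likely need to profile the running backend processes, which this does not cover.

```python
import shlex


def bolt_commands(binary, workloads):
    """Return the shell commands for a per-workload BOLT pipeline.

    `workloads` maps a workload name to the argv that runs it (hypothetical
    scripts here). Nothing is executed; the commands are only built.
    """
    cmds, fdata_files = [], []
    for name, argv in workloads.items():
        perf_data = f"{name}.perf.data"
        fdata = f"{name}.fdata"
        # Sample user-space cycles with branch records (needs LBR support).
        cmds.append(shlex.join(
            ["perf", "record", "-e", "cycles:u", "-j", "any,u",
             "-o", perf_data, "--", *argv]))
        # Convert the perf samples into BOLT's profile format.
        cmds.append(shlex.join(["perf2bolt", "-p", perf_data, "-o", fdata, binary]))
        fdata_files.append(fdata)
    # merge-fdata writes the combined profile to stdout.
    cmds.append(shlex.join(["merge-fdata", *fdata_files]) + " > combined.fdata")
    # Rewrite the binary using the merged ("general") profile.
    cmds.append(shlex.join(
        ["llvm-bolt", binary, "-o", binary + ".bolt", "-data=combined.fdata",
         "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
         "-split-functions", "-split-all-cold", "-split-eh", "-dyno-stats"]))
    return cmds


# Hypothetical workloads: one OLTP-style run and one analytical run.
for cmd in bolt_commands("postgres", {"oltp": ["./run_oltp.sh"],
                                       "tpch": ["./run_tpch.sh"]}):
    print(cmd)
```

Dropping the `merge-fdata` step and passing a single `.fdata` file to `llvm-bolt` corresponds to option (b), optimizing for one isolated workload. Per the BOLT README, the input binary should also be linked with relocations (`--emit-relocs`) to get the full benefit.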
| pgaddict wrote:
| With the LTO, I think it's more complicated - it depends on
| the packagers / distributions, and e.g. on Ubuntu we have
| apparently been getting -flto for years.
| fabian2k wrote:
| My first instinct is that the effect is too large to be real. But
| that should be something other people could reproduce and verify.
| The second thought is that it might overfit the benchmark code
| here, but they address it in the post. But any kind of double-
| digit improvement to Postgres performance would be very
| interesting.
| pgaddict wrote:
| (author here)
|
| I agree the +40% effect feels a bit too good, but it only
| applies to the simple OLTP queries on in-memory data, so the
| inefficiencies may have an unexpectedly large impact. I agree
| 30-40% would be a massive speedup, and I expected it to
| disappear with a more diverse profile, but it did not ...
|
| The TPC-H speedups (~5-10%) seem much more plausible,
| considering the binary layout effects we sometimes observe
| during benchmarking.
|
| Anyway, I'd welcome other people trying to reproduce these
| tests.
| fabian2k wrote:
| I looked and there is no mention of BOLT yet on the
| pgsql-hackers mailing list. That might be the more appropriate
| place to get more attention on this. Though there are
| certainly a few PostgreSQL developers reading here as well.
| pgaddict wrote:
| True. At the moment I don't have anything very "actionable"
| beyond "it's magically faster", so I wanted to investigate
| this a bit more before posting to -hackers. For example,
| after reading the paper I realized BOLT has a
| "-report-bad-layout" option to report cases of bad layout,
| so I wonder if we could identify places where the code
| should be reorganized.
|
| OTOH my blog is syndicated to
| https://planet.postgresql.org, so it's not particularly
| hidden from the other devs.
| Avamander wrote:
| How easy would it be to have an entire distro (re)built with
| BOLT? Say for example Gentoo?
| fishgoesblub wrote:
| It would be difficult, as every package/program would need a
| step to generate the profile data by running the program the
| way a user would.
| metadat wrote:
| Is it theoretically possible to perform the profile
| generation+apply steps dynamically at runtime?
| albntomat0 wrote:
| I posted this in a comment already, but the results here line up
| with the original BOLT paper.
|
| "For the GCC and Clang compilers, our evaluation shows that BOLT
| speeds up their binaries by up to 20.4% on top of FDO and LTO,
| and up to 52.1% if the binaries are built without FDO and LTO."
|
| "Up to" though is always hard to evaluate.
| krick wrote:
| Does it work with rustc binaries?
___________________________________________________________________
(page generated 2024-10-05 23:00 UTC)