[HN Gopher] I wrote a new JIT compiler for PostgreSQL
___________________________________________________________________
I wrote a new JIT compiler for PostgreSQL
Author : mattashii
Score : 200 points
Date : 2024-03-18 11:57 UTC (11 hours ago)
(HTM) web link (www.pinaraf.info)
(TXT) w3m dump (www.pinaraf.info)
| pinaraf wrote:
  | Author here. Thanks for submitting my article on Hacker News.
  | I'll do my best to answer any questions.
| frizlab wrote:
| Not a question, but I love this. I'm eager to see its
| evolution.
| pgaddict wrote:
| Would be a great topic for pgconf.eu in June (pgcon moved to
| Vancouver). Too bad the CfP is over, but there's the
| "unconference" part (but the topics are decided at the event,
| no guarantees).
| mattashii wrote:
| Did you mean pgconf.dev in May (which has the unconference),
| or pgconf.eu in October (which doesn't have an unconference,
| but the CfP will open sometime in the - hopefully near -
| future)?
| pgaddict wrote:
| Yeah, I meant May. Sorry :-( Too many conferences around
| that time, I got confused.
|
| That being said, submitting this into the pgconf.eu CfP is
| a good idea too. It's just that it seems like a nice
| development topic, and the pgcon unconference was always a
| great place to discuss this sort of stuff. There are a
| couple more topics in the JIT area, so having a session or
| two to talk about those and how to move that forward would
| be beneficial.
| can3p wrote:
  | Nice post, thanks! Do I read it right that using JIT results in
  | the worst max times? What could be the reason, in your opinion?
| pinaraf wrote:
    | Two parts. First, I ran the benchmark on a laptop and didn't
    | spend enough time pinning its runtime power management to a
    | fixed state; I'll run a real pgbench on my desktop once I
    | implement all the opcodes it requires. Second, JIT compilation
    | has a minimum fixed cost (about 300us in my tests), so on such
    | short runtimes that cost can quickly outweigh the benefits.
| winternewt wrote:
| Is there a fundamental difference between copy and patch with C
| and what compilers do when they target intermediate
| representations? It seems to me that traditional compilation
| methods are also "copy and patch" but with another intermediate
| language than C.
| tetha wrote:
| I think conceptually, there is no real difference. In the
| end, a compiler outputting machine code uses very small
| stencils, like "mov _ _", which are rather simple to patch.
|
| Practically though, it's an enormous difference, as the copy
| and patch approach re-uses the years of work going into clang
| / gcc supporting platforms, optimizations for different
| platforms and so on. The approach enables a much larger pool
| of people ("People capable of writing C" vs "People capable
| of writing assembly / machine code") to implement very decent
| JIT compilers.
| pinaraf wrote:
      | The real difference is in the possible optimizations. If you
      | consider the full scope of JIT compilation in, for instance,
      | a web browser or the JVM, you could use copy-and-patch as a
      | tier 0 compiler and, once really hot paths are identified,
      | trigger a complete compiler with all the optimizer steps.
      | Some optimizations are more complicated to implement with
      | copy-and-patch, especially if you can't use all the tricks
      | described in the paper (for instance, they use the ghccc
      | calling convention to get much finer register allocation,
      | but from the documentation I don't think it's going to make
      | it into PostgreSQL).
|
| But as you say, yes, this enables people capable of writing
| C and reading assembly (or you have to be perfect and never
| have to go into gdb on your compiled code), and it makes
| the job so much faster and easier... Writing several
| machine code emitters is painful, and having the required
| optimization strategies for each ISA is quickly out of
| reach.
| adzm wrote:
  | I'm still surprised there isn't a query/plan cache for
  | PostgreSQL. I could easily see these two approaches working in
  | harmony once there is one, as frequent queries could end up
  | being cached and more aggressively optimized, with the cache
  | offsetting the compilation cost. Of course, that adds a whole
  | new layer of complexity and trouble.
| pinaraf wrote:
    | Honestly I thought the same as you, then I wrote this, and I
    | now understand it's going to be really hard to do. To make it
    | very simple: there are pointers to query parts "leaking"
    | everywhere across the execution engine. Removing them will
    | require a significant overhaul of the execution engine, the
    | planner, and who knows what else. Even in a single session,
    | two compiled queries will have different compiled code because
    | of that (both LLVM and my copyjit have to inject the addresses
    | of various structs into the asm code).
| adzm wrote:
| Just going to say, I'm blown away by how simple this JIT is
| though. Really quite a beautiful JIT approach.
| pinaraf wrote:
        | Same for me; that's why I did this after finding this
        | research paper. With the proper compiler settings and
        | small tricks you can remove some parts and already end up
        | faster than the interpreter (because you remove some
        | branches and a few memory accesses), and it's even
        | possible to create "super-stencils" covering typical
        | opcode series and optimizing them further. Or the
        | opposite, "sub-stencils", in order to do some loop
        | unrolling, for instance.
| williamdclt wrote:
    | (The article goes a bit over my head, so my apologies if I am
    | a bit off-topic.)
|
| There is a form of query plan caching in PG: for prepared
| statements, if PG determines that the actual value of
| parameters won't affect the query plan much, it uses a "generic
| plan" so that it reuses the same query plan for every execution
| of the prepared statement
| (https://www.postgresql.org/docs/current/sql-prepare.html, see
| "notes")
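As a concrete illustration of the generic-plan behaviour described above (table and parameter names made up; `plan_cache_mode` exists since PostgreSQL 12):

```sql
-- Prepare once; after a few executions PostgreSQL may switch to a
-- cached "generic" plan if custom plans don't seem to pay off.
PREPARE q(int) AS SELECT * FROM t WHERE id = $1;

-- A generic plan shows "$1" in the plan instead of the literal value.
EXPLAIN EXECUTE q(42);

-- Since PostgreSQL 12 the choice can be forced per session:
SET plan_cache_mode = force_generic_plan;  -- or force_custom_plan, auto
```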
| pinaraf wrote:
      | Indeed, and right now it's the only possible way, since it
      | remains within a single session; doing otherwise would be
      | very hard.
| cbsmith wrote:
| Unless you count stored procs...
| SigmundA wrote:
      | Yes, it's manual and per session. DBs like MSSQL have that
      | as well, but it's very rarely used anymore, because MSSQL
      | got automatic plan caching about 20 years ago, which
      | basically eliminates any advantage to manually preparing.
      | It's actually better, since it can be shared across all
      | sessions.
| aeyes wrote:
| The plan cache on Oracle in combination with prepared
| statements where the optimizer can't peek into the parameters
| has been really problematic for me in the past. I usually had
| to go in and either add hints or force a plan.
|
| Even simple queries like SELECT * FROM t WHERE x = TRUE; could
| turn into a nightmare depending on the distribution of the x
| values in the table.
|
| With Postgres I rarely encountered such problems but I must
| admit that I haven't used Postgres with prepared statements.
|
| I have seen some queries with slow planning time (>100ms) where
| a cache could have been useful but I don't remember ever really
| needing to optimize one.
| SigmundA wrote:
      | Is x a parameter? Because it doesn't look like it. MSSQL has
      | parameter sniffing and will make multiple plans based on
      | incoming parameters; I would be surprised if Oracle does not
      | do the same. Sniffing parameters can actually be problematic
      | sometimes, and it can be disabled with a hint!
| aeyes wrote:
| I just made up something simple but yes, I had these
| problems with bind variables in prepared statements.
|
| > SELECT * FROM t WHERE x = :var;
|
| But I haven't used Oracle in years and back in the day,
| there was no bind variable peeking.
| weliveindetail wrote:
| > There is one way to make the LLVM JIT compiler more usable, but
| I fear it's going to take years to be implemented: being able to
| cache and reuse compiled queries.
|
  | Actually, it's been implemented in LLVM for years :)
| https://github.com/llvm/llvm-project/commit/a98546ebcd2a692e...
| pinaraf wrote:
| Yeah, well, sorry, I should have been more explicit here: the
| issue is with PostgreSQL, not LLVM. The JIT compiler has to
| inject direct memory addresses, making the generated code
| specific to your query and process.
| weliveindetail wrote:
| Interesting, because we store relocatable objects. And
| process symbols can be resolved by name if you really want.
| It might be yet another performance trade-off though.
| SigmundA wrote:
    | Since PG uses one process per connection and the LLVM JIT
    | code is process-specific, the code can't be shared amongst
    | all connections to the DB.
    |
    | Plans themselves suffer from this, since they are in-memory
    | data structures not designed to be shared amongst different
    | processes.
    |
    | DBs like MSSQL don't have this issue, since they use a single
    | process with threads, which is also why MSSQL can handle more
    | concurrent connections without an external pooler. MSSQL can
    | also serialize plans to a non-process-specific representation
    | and store them in the DB for things like plan locking.
| hans_castorp wrote:
| > Plans themselves suffer from this since they are in memory
| data structures not designed to be shared amongst different
| processes.
|
| Oracle uses a process-per-connection model as well (at least
| on Linux), and they are able to share execution plans across
| connections. They put all the plans into the "global" shared
| memory.
| SigmundA wrote:
| Looks like you can change that with THREADED_EXECUTION to
| make it act like it does on Windows with a single process
| and threads:
|
| >On UNIX, starting with Oracle Database 12c Release 2
| (12.2), Oracle Database can use an operating system process
| or an operating system thread to implement each background
| task such as database writer (DBW0), log writer (LGWR),
| shared server process dispatchers, and shared servers.
|
| The use of operating system threads instead of processes
| allow resource sharing and reduce resource consumption.
|
| On Windows, each background process is implemented as a
| thread inside a single, large process.
|
| https://docs.oracle.com/en/database/oracle/oracle-
| database/1...
|
        | Processes in Windows are typically much more expensive
        | than on Unix, so using threads has always been preferred
        | to multiple processes; perhaps that's why MSSQL only has
        | that option, with an almost fully recreated internal
        | process model that you can list and kill, etc.
        |
        | Even Oracle says it helps with resource usage, even on
        | Unix/Linux. It also looks like Oracle has had some kind of
        | shared mode for a long time, where it basically has a
        | built-in pooler to keep the actual OS process count down,
        | not 1:1 like PG.
        |
        | Sharing plans can obviously be done using shared memory,
        | but it's not as simple as just creating some C++ object
        | model (which I believe is what PG has internally) for the
        | plan; it must have a process-agnostic data format that is
        | then executed, probably by deserializing it into an
        | executable model from shared memory. Fully JITted code is
        | even trickier vs just a set of logical plan operations.
        | With threads you just share executable code.
| hinkley wrote:
| Way back on Oracle 9i, we had a mystery stall problem. We
| couldn't saturate the network, the CPU, or the fiber channel
| links. We were stuck at ~50% and stumped. Some fuckery was
| going on and we had to call in a professional.
|
| Turned out 9i could only run queries that currently resided
| in the query cache, and some idiot (who was now my boss) had
| fucked up our query builder code so that we were getting too
| many unique queries. Not enough bind variables.
|
| So it's clear Oracle was using a shared cache back then, but
| like other people here, I'm scratching my head how this would
| work with Postgres's flavor of MVCC. Maybe share query plans
| when the transaction completes?
|
| I feel like that would get you 90% of the way but with some
| head of queue nastiness.
| miohtama wrote:
  | Back in the 386 era, there was a concept of self-modifying code
  | (assembly). It was similar to the stencils presented here, but
  | because the code was a singleton, a copy was rarely made.
  |
  | E.g. Doom on DOS used this optimisation technique, because
  | otherwise you could not squeeze enough performance out of the
  | tight rendering loops on old CPUs.
___________________________________________________________________
(page generated 2024-03-18 23:00 UTC)