[HN Gopher] The missing tier for query compilers
___________________________________________________________________
The missing tier for query compilers
Author : jamii
Score : 67 points
Date : 2025-02-10 03:36 UTC (2 days ago)
(HTM) web link (www.scattered-thoughts.net)
(TXT) w3m dump (www.scattered-thoughts.net)
| jamii wrote:
 | The tradeoff in SingleStore is interesting. By default, and
 | unlike e.g. Postgres, parameterized queries are planned while
 | ignoring the values of the parameters. This allows caching the
 | compiled query but prevents adapting the query plan - for the
 | example in the post, SingleStore would pick one plan for both
 | queries.
|
| But you can opt out of this behaviour by wrapping each parameter
| in NOPARAM (https://docs.singlestore.com/cloud/reference/sql-
| reference/c...).
| jsmith45 wrote:
 | If it ignores the parameter values, then how does it estimate
 | cardinality? Does it optimize for the worst-case scenario
 | (which runs the risk of choosing join implementations that
 | require far more data from other tables than makes sense when
 | the parameters are such that only a few rows are returned from
 | the base table), or does it assume a row volume for a more
 | average parameter, which risks long running times or timeouts
 | if the parameter happens to be an outlier?
|
 | Consider that Microsoft SQL Server, for example, does cache
 | query plans while ignoring parameters for the purposes of
 | caching (and will auto-parameterize queries by default), but it
 | uses the parameter values to estimate cardinalities (and thus
 | join types, etc.) when first creating the plan, under the
 | assumption that the first parameters it sees for a query are
 | reasonably representative of future parameters.
|
| That approach can work, but if the first query it sees has
| outlier parameter values, it may cache a plan that is terrible
| for more common values, and users may have preferred the plan
| for the more typical values. Of course, it can be the reverse,
| where users want the plan that handles the outliers well,
| because that specific plan is still good enough with the more
| common values.
| convolvatron wrote:
 | Clustrix wrote a compiler. It was able to call out to C for
 | string intrinsics, communication, and btree operations, so the
 | generated code primarily handled extracting data from serialized
 | formats, simple arithmetic, and constructing serializations for
 | downstream consumers.
|
 | This was to replace a little bytecode VM, and as far as I recall
 | it took a very good programmer about two weeks for the basic
 | version.
|
 | I don't see any reason to even try to bring in something like a
 | general-purpose JIT. There's a huge benefit from just doing the
 | basics, and you can move the line between precompiled functions
 | and dynamically compiled ones on an ongoing basis. It's also
 | cheap enough that I wouldn't get hung up on needing to amortize
 | the cost with prepared statements.
| foobazgt wrote:
| The author pans meta-tracing and shows a graph of TruffleRuby
| having really crazy runtime behavior. However, it looks extremely
| suspicious - like the kind of thing you'd see if there was a
| memory leak and the garbage collector was running constantly.
|
 | Most people seem really excited about the results coming out of
 | Graal, and TruffleRuby seems to have some really impressive
 | results in general, so the graph is surprising. It's also missing
 | a bunch of details, so it's hard to speak to its veracity - for
 | example, what were the versions of the runtimes, what flags were
 | used, and on what hardware?
|
 | As a counterexample, there's a different (admittedly old) result
 | where TruffleRuby beats CRuby on the same benchmark by 4.38x.
| https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yj...
| zX41ZdbW wrote:
| This is an interesting article!
|
 | ClickHouse takes a hybrid approach. It uses a vectorized
 | interpreter plus JIT compilation (with LLVM) of small
 | expressions (which is also vectorized, and generated for the
 | target instruction set by compiling loops). The JIT does not
 | bother to compile the whole query, only simple fragments, and it
 | gives only a marginal improvement over the vectorized
 | interpreter.
|
| Even with this approach, JIT is a huge pile of problems. A few
| details here: https://clickhouse.com/blog/clickhouse-just-in-
| time-compiler...
|
| Overall architecture overview: https://clickhouse.com/blog/first-
| clickhouse-research-paper-...
| t0b1 wrote:
 | The author struck out the part about CedarDB not being available
 | -- which is true -- but Umbra has been available as a docker
 | container[1] for some time now. The "Umbra paper" linked also
 | contains an artifact[2] with a copy of Umbra, as well as some
 | instructions on how to control the back-ends used, etc.
 | (Cranelift is not available as an option in the docker versions,
 | however.)
|
 | I kind of disagree with the assumption that baseline compilers
 | are easy to build (depending on your definition of baseline). A
 | back-end like DirectEmit is not easy to write and even harder to
 | verify. If you have a back-end written for your own IR, you will
 | likely have to write tests in that IR, and it will probably be
 | quite hard to simply port over codegen (or run-) tests from
 | other compilers. Especially in the context of databases, it is
 | not very reassuring to have a back-end that may explode the
 | second you start to generate code slightly differently. We're
 | working on making this a bit more commoditized, but in our
 | opinion you will always need to do some work, since putting
 | another IR (one with defined semantics that someone else could
 | write a code generator for) in front of a back-end is very
 | expensive. In Umbra, translating Umbra-IR to LLVM-IR takes more
 | time than compiling to machine code with DirectEmit.
|
| Also, if it is easy to write them, I would expect to see more
| people write them.
|
| Copy-and-patch was also tried in the context of MLIR[3] but the
| exec-time results were not that convincing and I have been told
| that it is unlikely for register allocation to work sufficiently
| well to make a difference.
|
| [1]: https://hub.docker.com/r/umbradb/umbra
|
| [2]: https://zenodo.org/records/10357363
|
| [3]: https://home.cit.tum.de/~engelke/pubs/2403-cc.pdf
| UncleEntity wrote:
| > We're working on making this a bit more commoditized but in
| our opinion you will always need to do some work since having
| another IR (with defined semantics someone could write a code
| generator for you) for a back-end is very expensive.
|
| I've been nipping at the edges of adapting the vmIDL from vmgen
| and using that to generate the machinery to do some jitting.
| But I'm slow and lazy...
|
| The general idea is to have the hypothetical user define the
| operands of their IR along with the code snippets and use this
| to stitch together a jit compiler library. Or perhaps a wrapper
| around an existing jit library, dunno. Either way it gives me
| some yaks to shave which makes me happy.
| UncleEntity wrote:
 | The real original copy-and-patch paper [0] used GCC's computed
 | gotos (labels as values) to get the addresses of the "patches",
 | and pointers for the operands, which can be replaced at JIT time
 | - as opposed to the referenced paper, which uses LLVM tomfoolery
 | to create a database of code patches as part of the build
 | process.
|
| IIRC, there may be another "original" paper.
|
 | To help with the author's confusion: they use function arguments
 | to manage the registers. If you have a simple operation like
 | 'add', you pass the left and right operands as function
 | arguments, but if you have some values you want to keep in
 | registers, you pass those as additional arguments to the
 | functions and just pass them through to the continuation, hoping
 | the compiler does the right thing (i.e. doesn't spill the
 | arguments during the function call).
|
 | Of course this leads to a combinatorial explosion of functions,
 | as you need one specialized for every register you want to pass
 | through:
 |
 |     void add0(int l, int r, kont c);
 |     void add1(int l, int r, int r0, kont c);
 |     void add2(int l, int r, int r0, int r1, kont c);
|
| and also need to track in your jitifier who's doing what and how
| many intermediate results are needed at which program point &etc.
|
 | I believe both ways share the same method: generating a shitton
 | of functions as a linked library and using that to create a
 | database of code patches at compile time, which the runtime uses
 | to patch it all together. One does it using LLVM tooling and the
 | other uses 'extern int _add_l, _add_r, *add_r0' to get the
 | locations of the operands to replace at runtime with the actual
 | values, because C doesn't care one bit what you do with
 | pointers.
|
| I'm probably wrong on some of the details but that's the basics
| of what's happening.
|
 | __edit__ Reading the part on how extern variables are used, the
 | example function definitions don't match perfectly with what is
 | actually happening, but the concept is the same -- use the
 | locations the pointers point to in order to find the places to
 | replace with actual values at runtime. The function defs are
 | more like what a 'musttail interpreter' would use. Not sure this
 | 'correction' is any clearer...
|
| [0] http://ieeexplore.ieee.org/document/1342540/
___________________________________________________________________
(page generated 2025-02-12 23:01 UTC)