[HN Gopher] The missing tier for query compilers
       ___________________________________________________________________
        
       The missing tier for query compilers
        
       Author : jamii
       Score  : 67 points
       Date   : 2025-02-10 03:36 UTC (2 days ago)
        
 (HTM) web link (www.scattered-thoughts.net)
 (TXT) w3m dump (www.scattered-thoughts.net)
        
       | jamii wrote:
       | The tradeoff in SingleStore is interesting. By default and unlike
       | eg postgres, parameterized queries are planned ignoring the
       | values of the parameters. This allows caching the compiled query
       | but prevents adapting the query plan - for the example in the
       | post SingleStore would pick one plan for both queries.
       | 
       | But you can opt out of this behaviour by wrapping each parameter
       | in NOPARAM (https://docs.singlestore.com/cloud/reference/sql-
       | reference/c...).
        
         | jsmith45 wrote:
         | If it ignores the parameter values, then how does it estimate
          | cardinality? Does it optimize for the worst-case scenario
          | (which runs the risk of choosing join implementations that
          | require far more data from other tables than makes sense if
          | the parameters are such that only a few rows are returned from
          | the base table), or does it assume a row count for a more
          | average parameter, which risks long running times or timeouts
          | if the parameter happens to be an outlier?
         | 
          | Consider that Microsoft SQL Server, for example, does cache
          | query plans keyed without the parameter values (and will
          | auto-parameterize queries by default), but SQL Server uses the
          | parameter values to estimate cardinalities (and thus join
          | types, etc.) when first creating the plan, under the
          | assumption that the first parameters it sees for this query
          | are reasonably representative of future parameters.
         | 
         | That approach can work, but if the first query it sees has
         | outlier parameter values, it may cache a plan that is terrible
         | for more common values, and users may have preferred the plan
         | for the more typical values. Of course, it can be the reverse,
         | where users want the plan that handles the outliers well,
         | because that specific plan is still good enough with the more
         | common values.
        
       | convolvatron wrote:
        | Clustrix wrote a compiler. It was able to call out to C for
        | string intrinsics, communication, and btree operations, so it
        | primarily handled extracting data from serialized formats,
        | simple arithmetic, and constructing serializations from
        | downstream results.
        | 
        | This was to replace a little bytecode VM, and as far as I
        | recall it took a very good programmer about 2 weeks for the
        | basic version.
        | 
        | I don't see any reason to even try to bring in something like a
        | general-purpose JIT. There's a huge benefit from just doing the
        | basics, and you can move the line between precompiled functions
        | and dynamically compiled ones on an ongoing basis. It's also
        | cheap enough that I wouldn't get hung up on needing to amortize
        | that cost with prepared statements.
        
       | foobazgt wrote:
       | The author pans meta-tracing and shows a graph of TruffleRuby
       | having really crazy runtime behavior. However, it looks extremely
       | suspicious - like the kind of thing you'd see if there was a
       | memory leak and the garbage collector was running constantly.
       | 
       | Most people seem really excited about the results coming out of
       | Graal and TruffleRuby seems to have some really impressive
        | results in general, so the graph is surprising. It's also missing
        | a bunch of details, so it's hard to speak to its veracity - for
        | example, what are the different versions of the runtimes, what
        | flags were used, and on what hardware?
        | 
        | As a counterexample, there's a different (admittedly old) result
        | where TruffleRuby beats CRuby on the same benchmark by 4.38x.
        | https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yj...
        
       | zX41ZdbW wrote:
       | This is an interesting article!
       | 
        | ClickHouse takes a hybrid approach. It uses a vectorized
        | interpreter plus JIT compilation (with LLVM) of small
        | expressions (which is also vectorized and generated for the
        | target instruction set by compiling loops). The JIT compilation
        | does not bother to compile the whole query, only simple
        | fragments, and it gives only a marginal improvement over the
        | vectorized interpreter.
       | 
       | Even with this approach, JIT is a huge pile of problems. A few
       | details here: https://clickhouse.com/blog/clickhouse-just-in-
       | time-compiler...
       | 
       | Overall architecture overview: https://clickhouse.com/blog/first-
       | clickhouse-research-paper-...
        
       | t0b1 wrote:
        | The author struck out the part about CedarDB not being available
        | -- which is true -- but Umbra has been available as a docker
        | container[1] for some time now. The "Umbra paper" linked also
        | contains an artifact[2] with a copy of Umbra as well as some
        | instructions on how to control the back-ends used, etc.
        | (Cranelift is not available as an option in the docker
        | versions, however.)
       | 
       | I kind of disagree with the assumption that baseline compilers
       | are easy to build (depending on your definition of baseline). A
       | back-end like DirectEmit is not easy to write and even harder to
       | verify. If you have a back-end written for your own IR you will
       | likely have to write tests in that IR and it will probably be
       | quite hard to simply port over codegen (or run-) tests from other
        | compilers. Especially in the context of databases, it is not
        | very reassuring to have a back-end that may explode the second
        | you start to generate code slightly differently. We're working
        | on making this a bit more commoditized, but in our opinion you
        | will always need to do some work, since maintaining another IR
        | (with defined semantics that someone else could write a code
        | generator against) in front of a back-end is very expensive. In
        | Umbra, translating Umbra-IR to
       | LLVM-IR takes more time than compiling to machine code with
       | DirectEmit.
       | 
        | Also, if it were easy to write them, I would expect to see more
        | people writing them.
       | 
       | Copy-and-patch was also tried in the context of MLIR[3] but the
       | exec-time results were not that convincing and I have been told
       | that it is unlikely for register allocation to work sufficiently
       | well to make a difference.
       | 
       | [1]: https://hub.docker.com/r/umbradb/umbra
       | 
       | [2]: https://zenodo.org/records/10357363
       | 
       | [3]: https://home.cit.tum.de/~engelke/pubs/2403-cc.pdf
        
         | UncleEntity wrote:
         | > We're working on making this a bit more commoditized but in
         | our opinion you will always need to do some work since having
         | another IR (with defined semantics someone could write a code
         | generator for you) for a back-end is very expensive.
         | 
         | I've been nipping at the edges of adapting the vmIDL from vmgen
         | and using that to generate the machinery to do some jitting.
         | But I'm slow and lazy...
         | 
         | The general idea is to have the hypothetical user define the
         | operands of their IR along with the code snippets and use this
         | to stitch together a jit compiler library. Or perhaps a wrapper
         | around an existing jit library, dunno. Either way it gives me
         | some yaks to shave which makes me happy.
        
       | UncleEntity wrote:
        | The real original copy-and-patch paper [0] used GCC's
        | labels-as-values ("addressable gotos") to get the addresses of
        | the "patches", and pointers for the operands which can be
        | replaced at jit-time, as opposed to the referenced paper, which
        | uses LLVM tomfoolery to create a database of code patches as
        | part of the build process.
        | 
        | IIRC, there may be another "original" paper.
       | 
        | To help with the author's confusion: they use function arguments
        | to manage the registers. If you have a simple operation like
        | 'add' you pass the left and right operands as function
        | arguments, but if you have some values you want to keep in
        | registers you pass those as additional arguments to the
        | functions and just pass them through to the continuation,
        | hoping the compiler does the right thing (i.e. doesn't spill
        | the arguments during the function call).
       | 
       | Of course this leads to a combinatorial explosion of functions as
       | you need one specialized for every register you want to pass
        | through:
        | 
        |     void add0(int l, int r, kont c);
        |     void add1(int l, int r, int r0, kont c);
        |     void add2(int l, int r, int r0, int r1, kont c);
       | 
       | and also need to track in your jitifier who's doing what and how
       | many intermediate results are needed at which program point &etc.
       | 
        | I believe both ways share the same method of generating a
        | shitton of functions as a linked library and using that to
        | create a database of code patches at compile time, which the
        | runtime uses to patch it all together. One does it using LLVM
        | tooling and the other uses 'extern int _add_l, _add_r,
        | *add_r0' to get the locations of the operands to replace at
        | runtime with the actual value, because C doesn't care one bit
        | what you do with pointers.
       | 
       | I'm probably wrong on some of the details but that's the basics
       | of what's happening.
       | 
        | __edit__ Re-reading the part on how extern variables are used,
        | the example function definitions above don't match perfectly
        | with what is actually happening, but the concept is the same --
        | use the locations the pointers point to to find the places to
        | replace with actual values at runtime. The function defs are
        | more like what a 'musttail interpreter' would use. Not sure
        | this 'correction' is any clearer...
       | 
       | [0] http://ieeexplore.ieee.org/document/1342540/
        
       ___________________________________________________________________
       (page generated 2025-02-12 23:01 UTC)