[HN Gopher] How do databases execute expressions?
___________________________________________________________________
How do databases execute expressions?
Author : todsacerdoti
Score : 32 points
Date : 2023-09-21 19:24 UTC (3 hours ago)
(HTM) web link (notes.eatonphil.com)
(TXT) w3m dump (notes.eatonphil.com)
| convolvatron wrote:
| idk why you would use a virtual machine. clustrix did for a while
| until the instruction dispatch loop dominated profiling. then one
| particularly clever and energetic engineer just lowered onto x86
| transactional wrote:
| A recent blog post about MongoDB's SBE detailed that the
| purpose behind their VM is that it serves as a way to re-use an
| execution component between two different query languages.
| SQLite's claim was just that the strong separation between
| compilation and execution makes issues easier to debug.
|
| I wouldn't expect VMs to become the default design in
| databases, but it seems like it's getting increasingly common
| as an IR for query compilation. The ability to have a
| (comparatively simpler) interpreter for the VM also means you
| can apply simple fuzzing to great effect: if the results of
| interpretation vs compilation ever diverge, there's a bug.
| anarazel wrote:
| Another related aspect is that the VM approach can allow
| JITing only some of the opcodes, which makes it more feasible
| and maintainable.
| anarazel wrote:
| I did the switch from a tree walk interpreter to a VM in
| postgres - it did lead to substantial speedups.
|
| I've experimented with lowering into x86 "manually" as well,
| and it did provide gains, but the portability aspects are
| concerning. And for much of the desirable speedups you ime want
| more optimization than trivial lowering would give you.
|
| Either way, without the tree walk -> VM move, it'd have been
| much harder to lower into x86 directly. And you'd still need a
| fallback for other platforms.
| cmrdporcupine wrote:
| It would surprise me to see the dispatch loop dominate in a DB?
| given that on the whole databases tend to block a lot on more
| expensive things like materialization, joins, etc?
|
| I do think that a VM could mess up CPU branch prediction so I
| can see that as being a problem?
|
| Some people have gone the route of emitting LLVM IR (or
| similar) for query plans, but this has a cost, too, so only
| makes sense for repeatedly used queries.
| anarazel wrote:
| It really depends on the query. If there's a bunch of math in
| the select list, it's easy for that to be the biggest
| bottleneck, even if there are joins etc. Also you'll
| sometimes have to evaluate expressions to check join
| conditions depending on the join implementation and
| condition.
|
| > I do think that a VM could mess up CPU branch prediction so
| I can see that as being a problem
|
| It definitely happens. It can be partially addressed with
| things like computed gotos, but it's far from perfect. At
| least for postgres, it's just that the tree walk overhead was
| far higher.
|
| The branch prediction handling for opcode dispatch loops
| definitely improved in CPUs - for Intel somewhere around
| Haswell (it's been a while, so this is a guess). I remember
| that, at the time, the gains of using computed goto on my
| aging workstation (nehalem) were far bigger than a few uarchs
| later, even when controlling for cache size etc.
|
| Semi related anecdote: Both opcode dispatch switches and
| computed goto based dispatch are good for stressing
| compilers. I found newly introduced >= quadratic behavior in
| all the major c compilers in the years since.
| convolvatron wrote:
| clearly not in I/O bound cases, but in that world everything
| except the b-tree was done relationally, so yeah joins and
| view and the whole lot. we may have also had a really poorly
| though out VM. but direct instruction generation wasn't
| really a big deal and it helped quite a bit
| adam_oxla wrote:
| Amazing overview. Right now I work on a completely new OLAP
| database: vectorized query execution is must have nowadays if you
| want to have competitive performance. What's interesting is that
| vectorized query engines benefits a lot from SIMD instructions
| (that is obvious). But what is less obvious is that rising
| popularity of ARM CPUs made it more complex to developer
| vectorized query engine because one has to provide support for
| both architectures.
| gavinray wrote:
| Related: _" Everything You Always Wanted to Know About Compiled
| and Vectorized Queries But Were Afraid to Ask"_
|
| https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf
| therein wrote:
| InfluxDb is going back to InfluxQL? Interesting.
___________________________________________________________________
(page generated 2023-09-21 23:01 UTC)