[HN Gopher] Graph DBMSs need new join algorithms: Story of worst...
___________________________________________________________________
Graph DBMSs need new join algorithms: Story of worst-case optimal
joins
Author : semihsalihoglu
Score : 61 points
Date : 2023-02-27 18:13 UTC (4 hours ago)
(HTM) web link (kuzudb.com)
(TXT) w3m dump (kuzudb.com)
| semihsalihoglu wrote:
| I wrote about a (relatively) new class of join algorithms for
| DBMSs that are called "worst-case optimal join algorithms". This
| is a topic that has kept me very busy for the last 5 6 years and
| the post aims to explain the simple ideas behind it and why it is
| the type of algorithms that graph DBMSs should integrate (and we
| have at KuzuDB: https://github.com/kuzudb/kuzu).
|
| If you're curious about how come database researchers/developers
| still work on joins after 50 years, it's because what is possible
| and what is not is actually not that well understood. After many
| decades wcoj algorithms was probably the biggest algorithmic leap
| in the field. Many people are looking into information theory and
| computational geometry to improve our understanding. There is
| even more advanced but currently unpractical algorithms called
| "beyond worst-case optimal" join algorithms, which I briefly
| leave pointers to at the end of the post.
|
| Enjoy reading!
| namibj wrote:
| Tetris doesn't seem impractical for analytical joins, though?
|
| Granted, it needs a good datastructure; I'm waiting for a
| friend to find time to discuss options there.
|
| I'm firmly of the belief that beyond-wcoj is a question of
| implementation, not of theoretical suitability in practical
| scenarios.
| semihsalihoglu wrote:
| If you have any good ideas about how to implement Tetris in a
| practical way, you can make a good impact here. This is the
| only work that I know that tried to implement these:
| https://arxiv.org/pdf/1503.04169.pdf (and excitingly
| published at GRADES workshop at SIGMOD, which I co-chaired
| twice and is very dear to my heart). But I talked to the
| authors and they all agree that despite the tone of the
| paper, they found them quite difficult to implement in a
| performant way.
|
| So here's the problem. The core algorithmic step of "beyond
| wcojs" are "geometric resolutions". The core idea is to work
| with gaps in the space. So for example suppose we are joining
| two relations R(A), S(A) which is an intersection and suppose
| A is an integer domain. Suppose further than R's maximum A
| value is 100, and S's minimum value is 101. Then there is a
| gap of (100, \infty) in R's space and another graph (-\infty,
| 101) in S's space. If you "resolve/join" these gaps, you get
| a gap of (-\infty, \infty), which tells you in one operation
| that the join's output is empty.
|
| On this simple query, this seems to work fine but if you have
| general relations (let alone non-integer data types) doing
| such geometric "resolutions" and finding efficient indices to
| index those "gaps" becomes quite challenging. But any good
| idea here will push the field!
| mamcx wrote:
| I wonder if exist a byte-sized implementation of this
| algorithms.? As far I know all is part of big system...
| semihsalihoglu wrote:
| Can you clarify what you mean by byte-sized implementations?
| There are several systems now that implement these
| algorithms, KuzuDB, Umbra, LogicBlox (the earliest was this)
| are examples I know. I'm sure more will come.
| genman wrote:
| Interesting. Thank you for sharing your work.
| semihsalihoglu wrote:
| You are welcome :)
| carterschonwald wrote:
| These are fun algorithms. I've found that some of my experiments
| in writing efficient linear algebra routines end up with exactly
| the same interfaces you'd want for implementing these algorithms
| semihsalihoglu wrote:
| Interesting. I think there are groups that does quite
| interesting work on linear algebra that intersects with wcoj
| algorithms: This keynote abstract should give some pointers on
| this group's work:
| https://www.vldb.org/pvldb/vol13/p3502-olteanu.pdf.
___________________________________________________________________
(page generated 2023-02-27 23:01 UTC)