[HN Gopher] Why query planning for streaming systems is hard
___________________________________________________________________
Why query planning for streaming systems is hard
Author : simonz05
Score : 61 points
Date : 2021-08-15 18:44 UTC (4 hours ago)
(HTM) web link (scattered-thoughts.net)
(TXT) w3m dump (scattered-thoughts.net)
| eventreduce wrote:
| You should check out the event-reduce algorithm[1]. It scales
| differently because it computes new results from the old
| results plus the event, so it can have benefits over
| materialized views depending on the data size and how many
| clients subscribe to a query.
|
| [1] https://github.com/pubkey/event-reduce
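|
| The core idea in miniature - a hypothetical sketch in Python,
| not the actual event-reduce API: keep the previous result of a
| query and patch it per change event instead of re-running the
| query against the database.
|
|   # Query: SELECT * FROM tasks WHERE NOT done ORDER BY age
|   def apply_event(old_results, event):
|       """New result set = old result set + one change event."""
|       doc = event["doc"]
|       rows = [r for r in old_results if r["id"] != doc["id"]]
|       if event["op"] in ("INSERT", "UPDATE") and not doc["done"]:
|           rows.append(doc)            # row matches the WHERE
|       rows.sort(key=lambda r: r["age"])  # re-apply the ORDER BY
|       return rows
|
|   results = [{"id": 1, "age": 30, "done": False}]
|   results = apply_event(results, {"op": "INSERT",
|       "doc": {"id": 2, "age": 25, "done": False}})
|   # updated results without re-running the query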
| SpicyLemonZest wrote:
| Robust operators in particular seem like an important area of
| research to me. I've never had time to sit down and think about
| it in detail, but I can't count the number of escalations I've
| had that would have been instantly solved if only we had a
| gracefully degrading broadcast join.
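|
| For what it's worth, here is a toy sketch (hypothetical, not any
| real engine's code) of what "gracefully degrading" could mean:
| build the broadcast hash table while the small side fits a
| memory budget, and fall back to a partitioned join instead of
| OOMing when it doesn't.
|
|   MEMORY_BUDGET = 100_000   # max build rows held at once
|   NUM_PARTS = 8
|
|   def broadcast_join(left, right, key):
|       table = {}
|       for n, row in enumerate(right):   # build the small side
|           if n >= MEMORY_BUDGET:        # degrade, don't OOM
|               return partitioned_join(left, right, key)
|           table.setdefault(row[key], []).append(row)
|       return [(l, r) for l in left
|               for r in table.get(l[key], [])]
|
|   def partitioned_join(left, right, key):
|       out = []
|       for p in range(NUM_PARTS):        # one partition at a time
|           part = {}
|           for r in right:
|               if hash(r[key]) % NUM_PARTS == p:
|                   part.setdefault(r[key], []).append(r)
|           out += [(l, r) for l in left
|                   if hash(l[key]) % NUM_PARTS == p
|                   for r in part.get(l[key], [])]
|       return out
|
| In a real system the partitions would spill to disk, but the
| decision point is the same.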
| mathgladiator wrote:
| Underneath this is the hard truth that streaming is hard and
| costly. I've run into many people with optimistic dreams of
| taking SQL or another query language and making it update based
| on data changes rather than on periodic polling.
|
| These systems are always expensive and yield marginal
| improvements compared to a dumb publish-subscribe system. I've
| got notes on publish-subscribe systems, and I am publishing a
| paper next month about the end-result system that I built over
| five years. For a clue about the midpoint, my SRECON talk
| addresses this:
| https://www.usenix.org/conference/srecon17americas/program/p...
|
| A key note is that publish/subscribe over-commits the problem
| space (i.e. the problem space can potentially be quadratic
| depending on use), but if you give up on durability then your
| costs can go way-way-way down.
|
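| To make the quadratic point concrete (a toy Python sketch, under
| my reading that the blow-up is publishers times subscribers): a
| non-durable broker is pure fan-out, so its cost is all delivery
| volume, with no storage at all.
|
|   from collections import defaultdict
|
|   class Broker:                  # no durability: miss = lose
|       def __init__(self):
|           self.subs = defaultdict(list)  # topic -> callbacks
|       def subscribe(self, topic, fn):
|           self.subs[topic].append(fn)
|       def publish(self, topic, msg):
|           for fn in self.subs[topic]:    # M deliveries each
|               fn(msg)
|
|   # N publishers x M subscribers on a topic = N*M deliveries:
|   # that is the potentially quadratic problem space.
|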
| The lack of durability is a problem, BUT it can be leveraged to
| discover non-marginal use-cases that justify further investment.
| Unfortunately, once you invest in the imperfect system to the
| point that it operates at five-plus nines during normal
| operation, very few query-based systems are competitive.
| gopalv wrote:
| > If we build plans from operators that work well across large
| ranges of input distributions and that don't have catastrophic
| failure modes, then we won't be as reliant on accurate
| statistics.
|
| (TL;DR: there are never enough statistics, except what you
| collect while running the query itself)
|
| The problem with a lot of these approaches is that they suck at
| small scale - the sort-merge join is very safe across different
| data sizes, but a hash join will blow past it so fast that it
| wouldn't make sense to always opt for the safe choice.
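|
| The shape of that tradeoff in toy Python (a sketch, not Hive's
| code): the merge join never holds a whole input in memory but
| pays for sorting; the hash join is far faster until the build
| side stops fitting.
|
|   def sort_merge_join(left, right, key):
|       left = sorted(left, key=key)
|       right = sorted(right, key=key)
|       out, i, j = [], 0, 0
|       while i < len(left) and j < len(right):
|           if key(left[i]) < key(right[j]):
|               i += 1
|           elif key(left[i]) > key(right[j]):
|               j += 1
|           else:               # pair left[i] with the equal run
|               k = j
|               while (k < len(right)
|                      and key(right[k]) == key(left[i])):
|                   out.append((left[i], right[k]))
|                   k += 1
|               i += 1
|       return out
|
|   def hash_join(left, right, key):
|       table = {}              # entire right side in memory
|       for r in right:
|           table.setdefault(key(r), []).append(r)
|       return [(l, r) for l in left
|               for r in table.get(key(l), [])]
|
| At small scale hash_join wins by a mile; sort_merge_join is the
| one that never falls over.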
|
| When I started working on Hive, it was entirely full of "query
| won't fail, so what if it is slow" joins, plus conditional
| tasks: "try to build a hash; if that fails, fall back to a
| sort-merge" encoded into the plan. That's great when it rarely
| fails, and the config option still bears that scar in
| hive.auto.convert.join.noconditionaltask=true. There was another
| pass at this with the Grace hash-join[1] (plus a bloom filter
| while spilling) to make the hash join safe against overfilling.
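|
| Roughly what that looks like (a toy sketch of the Grace idea in
| Python, not Hive's actual implementation): partition the build
| side, and while "spilling" record every build key in a bloom
| filter so probe rows that cannot match are dropped instead of
| being spilled too.
|
|   NUM_PARTS, BLOOM_BITS, SEEDS = 4, 1 << 16, (0, 1, 2)
|
|   def bloom_add(bits, key):
|       for s in SEEDS:
|           bits.add(hash((s, key)) % BLOOM_BITS)
|
|   def bloom_may_contain(bits, key):
|       return all(hash((s, key)) % BLOOM_BITS in bits
|                  for s in SEEDS)
|
|   def grace_hash_join(build, probe, key):
|       # dicts stand in for per-partition spill files on disk
|       parts = [dict() for _ in range(NUM_PARTS)]
|       bloom = set()           # a set of ints as the bit array
|       for row in build:
|           part = parts[hash(row[key]) % NUM_PARTS]
|           part.setdefault(row[key], []).append(row)
|           bloom_add(bloom, row[key])
|       out = []
|       for row in probe:
|           if not bloom_may_contain(bloom, row[key]):
|               continue        # dropped early, never spilled
|           part = parts[hash(row[key]) % NUM_PARTS]
|           out += [(row, b) for b in part.get(row[key], [])]
|       return out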
|
| The problem was that most situations demanded things the other
| way around: the choice of tech was made by running a smaller
| prototype and seeing how fast it went, on the assumption that
| larger scale would simply mean throwing more hardware at it. So
| people who tried the product concluded it was slow and assumed
| latency would keep growing linearly with scale (rather than
| staying roughly flat from 1 GB all the way up to 100 GB), so it
| wouldn't "sell" in the market.
|
| Eventually, we did this the least optimal way: just record
| statistics on the query's operators through runtime counters,
| and if the query fails, feed the row statistics collected at
| runtime back into the plan, re-execute the query with a plan
| that includes the updated statistics, and take another go at it.
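|
| The loop is roughly this (a sketch with a hypothetical planner
| interface, not Hive's actual code):
|
|   class PlanFailure(Exception):
|       def __init__(self, runtime_counters):
|           self.runtime_counters = runtime_counters
|
|   def execute_with_retry(query, planner, max_attempts=2):
|       stats = {}            # operator -> observed row counts
|       for _ in range(max_attempts):
|           # observed stats override the catalog estimates
|           plan = planner.plan(query, stats)
|           try:
|               return plan.run()
|           except PlanFailure as e:
|               stats.update(e.runtime_counters)
|       raise RuntimeError("failed even with updated statistics")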
|
| This worked better for filter selectivity, particularly where we
| got the NOT IN ('FAILED', 'ABANDONED') sort of estimate very
| wrong (a histogram would help, but an HLL+ is useless).
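|
| The arithmetic of why that goes wrong (numbers invented for
| illustration): a distinct count alone assumes the values are
| evenly spread, while a histogram knows the skew.
|
|   total, ndv = 1_000_000, 10  # rows, distinct status values
|   # NDV-only estimate: NOT IN two of ten values keeps 8/10
|   ndv_estimate = total * (ndv - 2) / ndv       # -> 800,000
|
|   # but terminal states dominate the actual table:
|   hist = {"FAILED": 600_000, "ABANDONED": 390_000}
|   hist_estimate = total - sum(hist.values())   # ->  10,000
|   # the plan was sized for 80x more rows than survive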
|
| Even though this puts a huge penalty on a bad plan, the bad plans
| are not common enough for the always-robust sort of mechanism to
| be needed - so by sacrificing the P99 at scale, you get your P95
| latencies way down.
|
| And overall, being optimistic has worked out better for the whole
| cluster utilization, because there's a small fraction of giant
| queries & everyone else with Tableau is happy that everything
| returns fast.
|
| [1] - https://en.wikipedia.org/wiki/Hash_join#Grace_hash_join
___________________________________________________________________
(page generated 2021-08-15 23:00 UTC)