[HN Gopher] Query serving systems: An emerging category of data ...
___________________________________________________________________
Query serving systems: An emerging category of data systems
Author : KraftyOne
Score : 40 points
Date : 2022-05-10 17:14 UTC (3 days ago)
(HTM) web link (petereliaskraft.net)
(TXT) w3m dump (petereliaskraft.net)
| FridgeSeal wrote:
| These
|
| > OLAP systems like Druid and Clickhouse
|
| And these
|
| > data warehouses like Snowflake and Redshift
|
| Are fundamentally the same, and I'm yet to see any reason other
| than "marketing shenanigans" and "avoiding benchmarks" as to why
| they should be given their own special category. Call them all
| modern OLAP, or call them all data warehouses; it doesn't matter.
|
| > general-purpose data placement algorithm for query serving
| systems that improves latency by maximizing query parallelism,
| spreading out shards that are frequently queried together.
|
| This is cool. It will be interesting to know whether the added
| parallelism wins over the network overhead and added coordination
| it requires. Maybe there are ways to shift where that line lies as
| well?
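The quoted idea (spread out shards that are frequently queried together so one query can fan out across more servers) can be illustrated with a small greedy heuristic. This is only a sketch under stated assumptions, not the algorithm from the linked paper: the function name `place_shards`, the query-log format (one list of shard IDs per query), and the equal-capacity rule are all hypothetical, and the sketch ignores the network and coordination costs raised above.

```python
from collections import defaultdict
from itertools import combinations


def place_shards(query_log, num_servers):
    """Greedily assign shards to servers, keeping co-queried shards apart.

    query_log: iterable of shard-ID lists, one list per observed query
    (a hypothetical input format chosen for this sketch).
    Returns a list of servers, each a list of shard IDs.
    """
    # Count how often each pair of shards appears in the same query.
    co_access = defaultdict(int)
    shards = set()
    for shard_set in query_log:
        shards.update(shard_set)
        for a, b in combinations(sorted(set(shard_set)), 2):
            co_access[(a, b)] += 1

    def weight(a, b):
        key = (a, b) if a < b else (b, a)
        return co_access.get(key, 0)

    # Assumption: every server holds at most ceil(n / num_servers) shards.
    capacity = -(-len(shards) // num_servers)
    servers = [[] for _ in range(num_servers)]

    # Place the most heavily co-queried shards first.
    heat = {s: sum(weight(s, t) for t in shards if t != s) for s in shards}
    for shard in sorted(shards, key=lambda s: (-heat[s], s)):
        candidates = [i for i in range(num_servers) if len(servers[i]) < capacity]
        # Pick the server whose current shards are least co-queried with this one.
        best = min(
            candidates,
            key=lambda i: (sum(weight(shard, t) for t in servers[i]), len(servers[i]), i),
        )
        servers[best].append(shard)
    return servers


if __name__ == "__main__":
    log = [["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"], ["a", "c"]]
    print(place_shards(log, num_servers=2))
```

Whether this kind of spreading pays off depends on exactly the trade-off questioned above: each extra server touched by a query adds a network round trip and merge work at the coordinator.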
| richieartoul wrote:
| I would say one major difference between those two categories
| of systems is that Druid/Clickhouse are designed to be deployed
| in "user-facing" settings directly where you can put queries to
| them in the critical path of your app, whereas I've never heard
| of anyone doing that for Snowflake/Redshift. I'm sure you
| could, but I bet the cost would be prohibitive and I'm not sure
| how well they'd handle the concurrency without a lot of
| safeguards in your application.
| FridgeSeal wrote:
| I've been on a project where we _experimented_ putting
| Snowflake on a user-facing path. It was expensive and
| ineffectual.
|
| Given that the likes of ClickHouse and Druid can be made user
| facing, _and_ support backend analytics workloads, doesn't
| that just imply that Snowflake/Redshift are just outright
| less capable?
| AdamProut wrote:
| SingleStoreDB is heavily used for this type of app. We used
| to call this use case real-time analytics (though it has
| many other names today).
|
| [1] https://www.singlestore.com/blog/the-technical-
| capabilities-...
|
| (Disclosure: SingleStoreDB cofounder)
| richieartoul wrote:
| Not really. Clickhouse is amazing, but if you want to run
| it at massive scale you'll have to invest a lot into
| sharding and clustering and all that. Druid is more
| distributed by default, but doesn't support queries as
| sophisticated as Clickhouse does.
|
| Neither Clickhouse nor Druid can hold a candle to what
| Snowflake can do in terms of query capabilities, as well as
| the flexibility and richness of their product.
|
| That's just scratching the surface. They're completely
| different product categories IMO, although they have a lot
| of technical / architectural overlap depending on how much
| you squint.
|
| Devil is in the details basically.
| FridgeSeal wrote:
| > Neither Clickhouse nor Druid can hold a candle to what
| Snowflake can do in terms of query capabilities, as well
| as the flexibility and richness of their product.
|
| Do you have something specific in mind?
|
| My previous experience with Snowflake was that the query
| functionality was lacking, performance was subpar (at
| best), and half the purported features were a joke
| (looking at you, "Kafka integration") or just gimmicky
| (the time travel feature).
| AdamProut wrote:
| Clickhouse and Druid are not very good at complex OLAP
| queries. Clickhouse is pretty upfront about needing to
| denormalize your schema to avoid distributed joins.
| Neither is anywhere close to the performance of top DWs
| on analytical benchmarks like TPC-H or TPC-DS.
| rmbyrro wrote:
| I don't see the purpose.
|
| We can always group stuff in a higher level category.
|
| There's no difference between backend, frontend, gaming, embedded,
| etc.; essentially they're all _bit manipulators_.
|
| But... What's the purpose here?
| [deleted]
| latenightcoding wrote:
| CockroachDB is definitely not the first db that comes to mind
| when I think OLTP.
| richieartoul wrote:
| The referenced paper on uniserve:
| https://petereliaskraft.net/res/uniserve.pdf is interesting, but
| it seems to focus on systems where storage and compute are
| colocated. It doesn't discuss (or maybe I skimmed too
| quickly) more modern architectures where compute and storage are
| separated (usually with a caching layer built into the compute
| nodes). In those architectures, most concerns about shifting data
| around at query time are moot.
|
| Also, in my experience, building the scatter-gather query
| functionality and re-aggregation is usually the easiest part. The
| hard part is figuring out how to build fair multi-tenancy and QoS
| into what is essentially a massively parallel user facing real-
| time data lake.
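For readers unfamiliar with the scatter-gather and re-aggregation pattern mentioned above, here is a minimal sketch. It assumes in-memory shards represented as lists of `(key, value)` rows and uses a thread pool to stand in for remote workers. The function names are hypothetical, and none of the multi-tenancy or QoS concerns described as the hard part are addressed.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Runs on each shard: compute a per-key (sum, count) partial result."""
    partial = {}
    for key, value in rows:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def scatter_gather_avg(shards):
    """Coordinator: fan the query out, then re-aggregate the partials.

    Averages cannot be merged directly, so each shard returns sums and
    counts and the final division happens only at the coordinator.
    """
    # Scatter: one task per shard (threads stand in for remote nodes here).
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_aggregate, shards))

    # Gather: merge the per-shard partial sums and counts by key.
    merged = {}
    for partial in partials:
        for key, (total, count) in partial.items():
            m_total, m_count = merged.get(key, (0.0, 0))
            merged[key] = (m_total + total, m_count + count)

    return {key: total / count for key, (total, count) in merged.items()}


if __name__ == "__main__":
    shards = [
        [("us", 10), ("eu", 4)],
        [("us", 20)],
        [("eu", 6), ("eu", 8)],
    ]
    print(scatter_gather_avg(shards))
```

The design point worth noting is that each shard returns a mergeable partial state (sum and count) rather than a finished answer, which is what makes the coordinator's re-aggregation step correct.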
| KraftyOne wrote:
| That's a great point, and I definitely agree that supporting
| disaggregated architectures is important and a potential next
| step for the project. It raises new challenges--systems like
| Snowflake need to know a lot about how data is represented on
| disk in order to efficiently move it around--but it ought to be
| possible to define new abstractions for those representations
| (or reuse existing ones) in a way that cuts across a lot of
| systems.
| hestefisk wrote:
| How is it different to OLAP? It's exactly what a data mart does.
___________________________________________________________________
(page generated 2022-05-13 23:01 UTC)