[HN Gopher] Query serving systems: An emerging category of data ...
       ___________________________________________________________________
        
       Query serving systems: An emerging category of data systems
        
       Author : KraftyOne
       Score  : 40 points
       Date   : 2022-05-10 17:14 UTC (3 days ago)
        
 (HTM) web link (petereliaskraft.net)
 (TXT) w3m dump (petereliaskraft.net)
        
       | FridgeSeal wrote:
       | These
       | 
       | > OLAP systems like Druid and Clickhouse
       | 
       | And these
       | 
       | > data warehouses like Snowflake and Redshift
       | 
       | Are fundamentally the same, and I'm yet to see any reason other
       | than "marketing shenanigans" and "avoiding benchmarks" as to why
       | they should be given their own special category. Call them all
       | modern OLAP, or call them all data warehouses; it doesn't matter.
       | 
       | > general-purpose data placement algorithm for query serving
       | systems that improves latency by maximizing query parallelism,
       | spreading out shards that are frequently queried together.
       | 
       | This is cool. It will be interesting to see whether the added
       | parallelism outweighs the network overhead and extra coordination
       | it requires. Maybe there are ways to shift where that line lies as
       | well?
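
To make the quoted placement idea concrete, here is a minimal sketch of a greedy heuristic that spreads co-queried shards across servers. It is not the algorithm from the article or the Uniserve paper; the function name `place_shards`, the query-log format, and the tie-breaking rules are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def place_shards(queries, num_servers):
    """Greedily assign shards to servers so that shards queried together
    tend to land on different servers (hypothetical helper, not a real API)."""
    co_access = defaultdict(int)  # (shard_a, shard_b) -> times queried together
    frequency = defaultdict(int)  # shard -> number of queries touching it
    for shards in queries:
        unique = sorted(set(shards))
        for shard in unique:
            frequency[shard] += 1
        for a, b in combinations(unique, 2):
            co_access[(a, b)] += 1

    placement = {}                              # shard -> server index
    members = [[] for _ in range(num_servers)]  # server index -> shards on it

    # Place the most frequently queried shards first; break ties by name.
    for shard in sorted(frequency, key=lambda s: (-frequency[s], s)):
        def cost(server):
            # Total co-access weight with shards already on this server,
            # then server load, then server index as deterministic tie-breaks.
            conflict = sum(
                co_access[tuple(sorted((shard, other)))]
                for other in members[server]
            )
            return (conflict, len(members[server]), server)

        best = min(range(num_servers), key=cost)
        placement[shard] = best
        members[best].append(shard)
    return placement


# Assumed query log: each entry lists the shards one query touched.
log = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "c"]]
print(place_shards(log, num_servers=2))
```

The sketch only counts co-access; it ignores the network and coordination costs raised in the comment above, which is exactly the trade-off a real placement policy would have to weigh.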
        
         | richieartoul wrote:
         | I would say one major difference between those two categories
         | of systems is that Druid/Clickhouse are designed to be deployed
         | in "user-facing" settings directly where you can put queries to
         | them in the critical path of your app, whereas I've never heard
         | of anyone doing that for Snowflake/Redshift. I'm sure you
         | could, but I bet the cost would be prohibitive, and I'm not sure
         | how well they'd handle the concurrency without a lot of
         | safeguards in your application.
        
           | FridgeSeal wrote:
           | I've been on a project where we _experimented_ with putting
           | Snowflake on a user-facing path. It was expensive and
           | ineffectual.
           | 
           | Given that the likes of ClickHouse and Druid can be made user
           | facing, _and_ support backend analytics workloads, doesn't
           | that just imply that Snowflake/Redshift are just outright
           | less capable?
        
             | AdamProut wrote:
             | SingleStoreDB is heavily used for this type of app. We used
             | to call this use case real-time analytics (though it has
             | many other names today)
             | 
             | [1] https://www.singlestore.com/blog/the-technical-
             | capabilities-...
             | 
             | (Disclosure: SingleStoreDB cofounder)
        
             | richieartoul wrote:
             | Not really. Clickhouse is amazing, but if you want to run
             | it at massive scale you'll have to invest a lot into
             | sharding and clustering and all that. Druid is more
             | distributed by default, but doesn't support queries as
             | sophisticated as Clickhouse does.
             | 
             | Neither Clickhouse nor Druid can hold a candle to what
             | Snowflake can do in terms of query capabilities, as well as
             | the flexibility and richness of their product.
             | 
             | That's just scratching the surface. They're completely
             | different product categories IMO, although they have a lot
             | of technical / architectural overlap depending on how much
             | you squint.
             | 
             | Devil is in the details basically.
        
               | FridgeSeal wrote:
               | > Neither Clickhouse nor Druid can hold a candle to what
               | Snowflake can do in terms of query capabilities, as well
               | as the flexibility and richness of their product.
               | 
               | Do you have something specific in mind?
               | 
               | My previous experience with Snowflake was that the query
               | functionality was lacking, performance was subpar (at
               | best), and half the purported features were a joke
               | (looking at you, "Kafka integration") or just gimmicky
               | (the time travel feature).
        
               | AdamProut wrote:
               | Clickhouse and Druid are not very good at complex OLAP
               | queries. Clickhouse is pretty upfront about needing to
               | denormalize your schema to avoid distributed joins.
               | Neither is anywhere close to the performance of top DWs
               | on analytical benchmarks like TPC-H or TPC-DS.
        
       | rmbyrro wrote:
       | I don't see the purpose.
       | 
       | We can always group stuff in a higher level category.
       | 
       | There's no difference between backend, frontend, gaming, embedded,
       | etc.; essentially they're all _bit manipulators_.
       | 
       | But... What's the purpose here?
        
       | latenightcoding wrote:
       | CockroachDB is definitely not the first DB that comes to mind
       | when I think OLTP.
        
       | richieartoul wrote:
       | The referenced paper on Uniserve
       | (https://petereliaskraft.net/res/uniserve.pdf) is interesting, but
       | it seems to focus on systems where storage and compute are
       | colocated. It doesn't discuss (or maybe I skimmed too quickly)
       | more modern architectures where compute and storage are
       | separated (usually with a caching layer built into the compute
       | nodes). In those architectures, most concerns about shifting data
       | around at query time are moot.
       | 
       | Also, in my experience, building the scatter-gather query
       | functionality and re-aggregation is usually the easiest part. The
       | hard part is figuring out how to build fair multi-tenancy and QoS
       | into what is essentially a massively parallel, user-facing,
       | real-time data lake.
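
For readers unfamiliar with the pattern, the following is a rough sketch of scatter-gather with re-aggregation, assuming in-memory shards of `(key, value)` rows and a thread pool standing in for remote servers. The names `partial_aggregate` and `scatter_gather_mean` are hypothetical and do not come from any of the systems discussed.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows, predicate):
    """Runs per shard: return a (sum, count) pair for the matching rows."""
    matching = [value for key, value in rows if predicate(key)]
    return sum(matching), len(matching)


def scatter_gather_mean(shards, predicate):
    """Scatter the query to every shard, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(
            pool.map(lambda rows: partial_aggregate(rows, predicate), shards)
        )
    # Re-aggregate: combine sums and counts rather than averaging the
    # per-shard means, which would be wrong when shards differ in size.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Assumed toy data: three shards of (region, latency_ms) rows.
shards = [
    [("us", 10), ("eu", 20)],
    [("us", 30)],
    [("eu", 40), ("us", 50)],
]
print(scatter_gather_mean(shards, lambda region: region == "us"))
```

This part is indeed short. Nothing here addresses the harder problem named above: the sketch has no notion of tenants, admission control, or per-query resource limits.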
        
         | KraftyOne wrote:
         | That's a great point, and I definitely agree that supporting
         | disaggregated architectures is important and a potential next
         | step for the project. It raises new challenges--systems like
         | Snowflake need to know a lot about how data is represented on
         | disk in order to efficiently move it around--but it ought to be
         | possible to define new abstractions for those representations
         | (or reuse existing ones) in a way that cuts across a lot of
         | systems.
        
       | hestefisk wrote:
       | How is it different to OLAP? It's exactly what a data mart does.
        
       ___________________________________________________________________
       (page generated 2022-05-13 23:01 UTC)