[HN Gopher] What's New in ClickHouse 21.12
       ___________________________________________________________________
        
       What's New in ClickHouse 21.12
        
       Author : nnx
       Score  : 111 points
       Date   : 2021-12-16 13:17 UTC (9 hours ago)
        
 (HTM) web link (clickhouse.com)
 (TXT) w3m dump (clickhouse.com)
        
       | tonyhb wrote:
       | I can't speak highly enough of Clickhouse. Since 20.x, which was
       | already one of (the?) fastest column store databases around, it's
       | gotten even better:
       | 
       | - Built-in replication without the need to run Zookeeper, with
       | multi-tier sharding support. They rewrote the ZK protocol using
       | multi-paxos within CH itself. It's great.
       | 
       | - Built-in batch inserts via the HTTP protocol... sorry, TCP :(.
       | Previously you'd have to batch using buffer tables, proxies, or
       | in-memory buffering within your client apps. This is no longer
       | needed!
       | 
       | - Better support for external data formats (avro, parquet)
       | 
       | It's just... so good.
        
         | MrBuddyCasino wrote:
         | > Built-in batch inserts via the HTTP protocol
         | 
         | Very welcome. We used to do that with a dedicated app.
         | 
         | > It's just... so good.
         | 
         | What Postgres is for RDBMS and SQLite for embedded, ClickHouse
         | is for time series. Tastefully designed and driven by
         | engineering excellence. I wish them all the best.
        
         | dunkelheit wrote:
         | > Built-in replication without the need to run Zookeeper, with
         | multi-tier sharding support. They rewrote the ZK protocol using
         | multi-paxos within CH itself. It's great.
         | 
         | AFAIK this description is kind of misleading. When they say
         | that they got rid of Zookeeper people expect that they can just
         | connect clickhouse nodes to each other and the replication will
         | work. But that is not how things work - you still have to run
         | an external service called clickhouse-keeper. Basically what they
         | did is they rewrote Zookeeper in C++.
        
           | mr-karan wrote:
           | Fwiw, Clickhouse Keeper _can_ be run as an external daemon if
           | you wish so. But it's packaged with the server binary itself,
           | so once you add <keeper_config> to all your server nodes,
           | you're good to go, without running anything else.
        
           | tonyhb wrote:
           | You can run clickhouse-keeper embedded in the server, though.
           | That way, each "primary" handles incoming SQL connections
           | _and_ paxos communication without additional infra.
        
         | karterk wrote:
         | Do you have to do anything special to opt-in for the built-in
         | batch inserts? Earlier, I was forced to use the buffer tables
         | approach: how would one ditch that now?
        
           | tonyhb wrote:
           | If you're using the HTTP protocol, add `async_insert=1` to
           | your connection string. You can tune the batching here:
           | https://clickhouse.com/docs/en/operations/settings/settings/...
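
A minimal sketch of what the reply above describes, assuming the ClickHouse HTTP interface on its default port 8123 and a hypothetical table named `events`. The helper name `build_async_insert_url` is made up for illustration. It only builds the URL; sending the rows (as the POST body) is left to whatever HTTP client you already use. `wait_for_async_insert` is a related server setting that controls whether the request waits until the batch is flushed.

```python
from urllib.parse import urlencode


def build_async_insert_url(base_url: str, table: str, wait: bool = True) -> str:
    """Build an HTTP-interface URL that asks the server to batch this insert.

    The table name and base URL are placeholders, not values from the thread.
    """
    params = {
        # The INSERT statement goes in the query string; rows go in the POST body.
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        # Opt in to server-side batching, as suggested in the comment above.
        "async_insert": 1,
        # 1 = acknowledge only after the batch has been written.
        "wait_for_async_insert": 1 if wait else 0,
    }
    return f"{base_url.rstrip('/')}/?{urlencode(params)}"


url = build_async_insert_url("http://localhost:8123", "events")
print(url)
```

With this approach the client can keep sending small inserts and leave the batching to the server, which is what previously required buffer tables or a proxy.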
        
       | nojito wrote:
       | Coolest part of clickhouse is its ability to do ETL
       | automagically.
       | 
       | It really is a super power.
        
         | polskibus wrote:
         | What do you mean? Does it have an etl engine like MS SSIS or
         | scheduler like airflow built-in?
        
           | nojito wrote:
           | I essentially load in two columns one called timestamp and
           | another with a json blob.
           | 
           | I then use this to kick off materialized views to
           | automagically pluck out relevant JSON fields into views
           | 
           | Similar to this
           | 
           | https://eng.uber.com/logging/
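
The pattern described above could look roughly like the following sketch. It generates the DDL for a materialized view that extracts JSON fields from a raw two-column table into a typed target table. The table names (`raw_logs`, `logs`, `logs_mv`), the blob column name `payload`, and the field names are all hypothetical; `JSONExtractString` is a ClickHouse function, but treating every field as a string is a simplification made here, not something the commenter stated.

```python
from typing import List


def json_field_view(view: str, source: str, target: str, fields: List[str]) -> str:
    """Return DDL for a materialized view that plucks JSON fields into columns.

    Assumes the source table has a `timestamp` column and a JSON string
    column named `payload` (both names are illustrative).
    """
    extracted = ",\n    ".join(
        f"JSONExtractString(payload, '{name}') AS {name}" for name in fields
    )
    return (
        f"CREATE MATERIALIZED VIEW {view} TO {target} AS\n"
        f"SELECT\n    timestamp,\n    {extracted}\nFROM {source}"
    )


ddl = json_field_view("logs_mv", "raw_logs", "logs", ["level", "service"])
print(ddl)
```

Each insert into the raw table then triggers the view, so the extraction happens at write time rather than at query time.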
        
             | mr-karan wrote:
             | That's an interesting way to load data and then use with
             | Materialized Views. However, I am curious how do you make
             | efficient use of compression codecs[0] that Clickhouse
             | provides, or some neat features like TTL policies [1] using
             | this method?
             | 
             | [0]: https://clickhouse.com/docs/en/sql-reference/data-types/lowc...
             | 
             | [1]: https://clickhouse.com/docs/en/sql-reference/statements/alte...
        
               | Redsquare wrote:
               | Materialized views have a backing table where you can use
               | the codecs. You can add a TTL to the ingestion table.
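
As a rough illustration of that answer: the TTL goes on the raw ingestion table, and the codecs go on the table backing the materialized view. The table names, column names, codec choices, and retention period below are assumptions chosen for the example, not settings anyone in the thread reported using.

```python
def ingestion_table_ddl(table: str, retention_days: int) -> str:
    """DDL for the raw two-column table, with rows expiring after a TTL."""
    return (
        f"CREATE TABLE {table} (\n"
        "    timestamp DateTime,\n"
        "    payload String\n"
        ") ENGINE = MergeTree\n"
        "ORDER BY timestamp\n"
        f"TTL timestamp + INTERVAL {retention_days} DAY"
    )


def backing_table_ddl(table: str) -> str:
    """DDL for the view's backing table, where per-column codecs are declared."""
    return (
        f"CREATE TABLE {table} (\n"
        "    timestamp DateTime CODEC(Delta, ZSTD),\n"
        "    level LowCardinality(String),\n"
        "    service LowCardinality(String)\n"
        ") ENGINE = MergeTree\n"
        "ORDER BY (service, timestamp)"
    )


raw_ddl = ingestion_table_ddl("raw_logs", retention_days=7)
typed_ddl = backing_table_ddl("logs")
print(raw_ddl)
print(typed_ddl)
```

Keeping the raw blobs short-lived while the extracted columns are compressed and retained is one way to reconcile the JSON-blob ingestion style with codecs and TTL policies.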
        
             | flurly wrote:
             | Great minds think alike :)
             | 
             | We do the exact same thing at GraphJSON
             | https://www.graphjson.com/guides/about
        
       | polskibus wrote:
       | Does anyone know if clickhouse keeper can replace Zookeeper in
       | non-clickhouse scenarios?
        
         | e12e wrote:
         | According to:
         | https://clickhouse.com/docs/en/operations/clickhouse-keeper/
         | 
         | > By default, ClickHouse Keeper provides the same guarantees as
         | ZooKeeper (linearizable writes, non-linearizable reads). It has
         | a compatible client-server protocol, so any standard ZooKeeper
         | client can be used to interact with ClickHouse Keeper.
         | Snapshots and logs have an incompatible format with ZooKeeper,
         | but clickhouse-keeper-converter tool allows to convert
         | ZooKeeper data to ClickHouse Keeper snapshot. Interserver
         | protocol in ClickHouse Keeper is also incompatible with
         | ZooKeeper so mixed ZooKeeper / ClickHouse Keeper cluster is
         | impossible.
         | 
         | So I guess yes?
        
       | michelb wrote:
       | Has anyone here used MonetDB? I wonder how it holds up against
       | other column-oriented databases.
        
       | mxstbr wrote:
       | We're big fans of ClickHouse here at GraphCDN, our entire
       | analytics stack is based on it[0] and it's been scaling well from
       | tens of thousands of events to now billions of events!
       | 
       | [0]: https://altinity.com/blog/delivering-insight-on-graphql-apis...
        
       | gigatexal wrote:
       | What's the K8s story? I doubt where I am now I can request
       | physical servers and dedicated fast disks.
        
       | yamrzou wrote:
       | How does Clickhouse compare to Snowflake?
        
         | benjaminwootton wrote:
         | A few differences:
         | 
         | - Snowflake is SaaS, Clickhouse isn't yet
         | 
         | - Clickhouse is open source, Snowflake is proprietary
         | 
         | - Snowflake has the virtual warehouse concept and ability to
         | scale compute up and down with a single SQL statement.
         | Clickhouse is a bit more traditional in architecture.
         | 
         | - Snowflake is hella expensive
         | 
         | - Snowflake is a bit more of a traditional data warehouse,
         | whereas Clickhouse is philosophically about powering through big
         | datasets such as denormalised click stream or logs
         | 
         | Both great products for their respective use cases
        
       | tobykeef wrote:
       | Not to be a stick in the mud here. We recently moved from
       | Clickhouse to Druid due to issues we were having when scaling and
       | rebalancing the cluster. How does removing ZK help?
        
         | benjaminwootton wrote:
         | Interesting. We moved from Druid to Clickhouse for exactly the
         | same reason :-)
         | 
         | https://timeflow.systems/why-we-moved-from-druid-to-clickhou...
         | 
         | Clickhouse is significantly easier to operate than Druid in my
         | experience.
        
         | log4shell wrote:
         | Druid has quite some intelligence baked in to handle the
         | scaling by default. I am curious how clickhouse is doing in all
         | those aspects.
         | 
         | When we did a PoC, the operational aspect of clickhouse and
         | performance was severely lacking as compared to druid.
         | Clickhouse had bigger resources at its disposal than druid
         | during this PoC.
         | 
         | If they could improve the operational aspect and introduce
         | sensible defaults so that the users don't have to go through
         | 10000 configuration options to work with data in clickhouse, I am sure
         | I will give it a go for some other usecase. It is simple on
         | surface but devil is in the details. Druid is much simpler and
         | sane at the scale I need to operate.
        
         | dreyfan wrote:
         | Because ZK is garbage and complicates every clustered
         | application that relies on it? Kafka is ditching ZK too.
         | 
         | Clickhouse cluster quite simply doesn't support elastic
         | rebalancing. Avoid CH if that is a hard requirement for your
         | setup.
        
           | StreamBright wrote:
           | I used to have a "wipe this ZK node clean and rejoin the
           | cluster" script to deal with ZK node outages that nobody could
           | explain.
        
       | monstrado wrote:
       | The introduction of parallelized Parquet reads coupled with
       | s3Cluster is really awesome. I feel ClickHouse is one step closer
       | to unlocking the ephemeral SQL compute cluster (e.g. Presto,
       | Hive) use case. I could imagine one day it having a HiveMetaStore
       | read-only database option for querying existing data in
       | companies...very fast, I might add.
        
       | gildedage77 wrote:
       | How does Clickhouse stack up against Druid? We're trying to make
       | a decision on the two technologies, and found this recent article
       | that shows Druid 8x faster than Clickhouse -
       | https://imply.io/post/druid-nails-cost-efficiency-challenge-...
        
       | fasteo wrote:
       | >>> SELECT * FROM system.contributors
       | 
       | Genius
        
       | aliswe wrote:
       | Valued at $2B. What's their business model? Corporate support?
        
         | nemo44x wrote:
         | They're going to build a cloud and rival Snowflake. Plus
         | support/service open source users.
        
         | goodpoint wrote:
         | Company valuations are Monopoly money.
        
       | mdasen wrote:
       | I'm a little surprised that Amazon hasn't created ClickHouse as a
       | Service. They've done it for Elasticsearch, Presto (Athena),
       | Hadoop, Kafka, MySQL, PostgreSQL. ClickHouse seems like it would
       | fit with their strategy of offering a datastore as a service. While
       | AWS does have things like Redshift and Timestream, it seems like
       | ClickHouse offers a lot of potential untapped value that Amazon
       | could capture.
        
         | ucarion wrote:
         | They may be working on it already. Such a product would take
         | a while to create. It was only in September of this year that
         | Clickhouse was spun out of Yandex and quickly raised a Series A
         | + B.
        
           | stingraycharles wrote:
           | But Clickhouse spinning off from Yandex is not a requirement
           | for an AWS offering, right?
        
           | tmp_anon_22 wrote:
           | At this point product strategy for these companies must be
           | built with Amazon in mind, with a goal of outpacing them at
           | the juncture that AWS would be able to take a bite of the
           | market share easily.
           | 
           | As much as it sucks to have been an OSS-ish product Amazon
           | has taken a chunk out of, the game is now known and can be
           | proactively neutered with good planning.
        
           | higeorge13 wrote:
           | I am wondering whether it would be a rival project to Redshift?
        
             | gildedage77 wrote:
             | You mean ParAccel? :P
        
           | ksec wrote:
           | Well Clickhouse has been open source for quite some time. The
           | reason I could think of Amazon not doing it is because the
           | demand is low. Clickhouse still isn't mainstream yet.
        
         | bdcravens wrote:
         | Many of those were introduced sometime after they became
         | mainstream.
        
       ___________________________________________________________________
       (page generated 2021-12-16 23:01 UTC)