[HN Gopher] What's New in ClickHouse 21.12
___________________________________________________________________
What's New in ClickHouse 21.12
Author : nnx
Score : 111 points
Date : 2021-12-16 13:17 UTC (9 hours ago)
(HTM) web link (clickhouse.com)
(TXT) w3m dump (clickhouse.com)
| tonyhb wrote:
| I can't speak highly enough of Clickhouse. Since 20.x, which was
| already one of (the?) fastest column store databases around, it's
| gotten even better:
|
| - Built-in replication without the need to run Zookeeper, with
| multi-tier sharding support. They rewrote the ZK protocol using
| Raft within CH itself. It's great.
|
| - Built-in batch inserts via the HTTP protocol... sorry, TCP :(.
| Previously you'd have to batch using buffer tables, proxies, or
| in-memory buffering within your client apps. This is no longer
| needed!
|
| - Better support for external data formats (avro, parquet)
|
| It's just... so good.
| MrBuddyCasino wrote:
| > Built-in batch inserts via the HTTP protocol
|
| Very welcome. We used to do that with a dedicated app.
|
| > It's just... so good.
|
| What Postgres is for RDBMSs and SQLite for embedded, ClickHouse
| is for time series. Tastefully designed and driven by
| engineering excellence. I wish them all the best.
| dunkelheit wrote:
| > Built-in replication without the need to run Zookeeper, with
| multi-tier sharding support. They rewrote the ZK protocol using
| Raft within CH itself. It's great.
|
| AFAIK this description is kind of misleading. When they say
| that they got rid of Zookeeper people expect that they can just
| connect clickhouse nodes to each other and the replication will
| work. But that is not how things work - you still have to run
| an external service called clickhouse-keeper. Basically what they
| did is they rewrote Zookeeper in C++.
| mr-karan wrote:
| Fwiw, Clickhouse Keeper _can_ be run as an external daemon if
| you wish so. But it's packaged with the server binary itself,
| so once you add <keeper_server> to all your server nodes,
| you're good to go, without running anything else.
| tonyhb wrote:
| You can run clickhouse-keeper embedded in the server, though.
| That way, each "primary" handles incoming SQL connections
| _and_ Raft communication without additional infra.
| karterk wrote:
| Do you have to do anything special to opt-in for the built-in
| batch inserts? Earlier, I was forced to use the buffer tables
| approach: how would one ditch that now?
| tonyhb wrote:
| If you're using the HTTP protocol, add `async_insert=1` to
| your connection string. You can tune the batching here:
| https://clickhouse.com/docs/en/operations/settings/settings/...
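A minimal sketch of the reply above, assuming the HTTP interface on its default port 8123. The host, the `events` table, and the helper name `build_async_insert_url` are hypothetical. `async_insert` and `wait_for_async_insert` are server settings passed as URL parameters, and further tuning settings can be passed the same way through `extra_settings`.

```python
from urllib.parse import urlencode


def build_async_insert_url(host, table, port=8123, wait=True, extra_settings=None):
    """Build an HTTP-interface URL that asks the server to batch this insert."""
    params = {
        # The rows themselves go in the POST body, one JSON object per line.
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        # Let the server buffer small inserts and flush them as one batch.
        "async_insert": 1,
        # 1 = respond only after the buffered data has been flushed.
        "wait_for_async_insert": 1 if wait else 0,
    }
    params.update(extra_settings or {})
    return f"http://{host}:{port}/?{urlencode(params)}"


url = build_async_insert_url("localhost", "events")
print(url)
```

Each client can then POST small payloads to this URL without a buffer table or proxy in front of the server.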
| nojito wrote:
| Coolest part of clickhouse is its ability to do ETL
| automagically.
|
| It really is a super power.
| polskibus wrote:
| What do you mean? Does it have an etl engine like MS SSIS or
| scheduler like airflow built-in?
| nojito wrote:
| I essentially load in two columns, one called timestamp and
| another with a JSON blob.
|
| I then use this to kick off materialized views that
| automagically pluck out the relevant JSON fields.
|
| Similar to this
|
| https://eng.uber.com/logging/
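The pattern described above can be sketched roughly as follows. The names `raw_events`, `requests_mv`, and `payload`, and the JSON keys, are hypothetical and not taken from the comment. The helper only generates the DDL text; `JSONExtractInt` and `JSONExtractString` are ClickHouse JSON functions.

```python
def materialized_view_ddl(view, source, fields, json_column="payload"):
    """Return a CREATE MATERIALIZED VIEW statement that plucks JSON fields.

    `fields` maps an output column name to a ClickHouse JSONExtract* function.
    """
    columns = ["timestamp"]
    for name, extractor in fields.items():
        columns.append(f"{extractor}({json_column}, '{name}') AS {name}")
    select_list = ",\n    ".join(columns)
    return (
        f"CREATE MATERIALIZED VIEW {view}\n"
        f"ENGINE = MergeTree ORDER BY timestamp AS\n"
        f"SELECT\n    {select_list}\n"
        f"FROM {source}"
    )


ddl = materialized_view_ddl(
    "requests_mv",
    "raw_events",
    {"status": "JSONExtractInt", "path": "JSONExtractString"},
)
print(ddl)
```

Once such a view exists, every insert into the source table is transformed and written to the view's storage as part of the insert, which is what makes the ETL feel automatic.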
| mr-karan wrote:
| That's an interesting way to load data and then use with
| Materialized Views. However, I am curious how you make
| efficient use of compression codecs[0] that Clickhouse
| provides, or some neat features like TTL policies [1] using
| this method?
|
| [0]: https://clickhouse.com/docs/en/sql-reference/data-types/lowc...
|
| [1]: https://clickhouse.com/docs/en/sql-reference/statements/alte...
| Redsquare wrote:
| Materialized views have a backing table where you can use
| the codecs. You can add a TTL to the ingestion table.
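One way to read the reply above, as a sketch only: the raw ingestion table carries the TTL, an explicit backing table carries the codecs, and the materialized view writes into it with `TO`. All table names, column names, and the 7-day retention are assumptions chosen for illustration.

```python
# Raw ingestion table: short retention, since the parsed copy lives elsewhere.
INGEST_TABLE = """
CREATE TABLE raw_events (
    timestamp DateTime,
    payload String
) ENGINE = MergeTree
ORDER BY timestamp
TTL timestamp + INTERVAL 7 DAY
""".strip()

# Backing table for the view: typed columns with per-column codecs.
BACKING_TABLE = """
CREATE TABLE requests (
    timestamp DateTime CODEC(Delta, ZSTD),
    path LowCardinality(String),
    status UInt16 CODEC(ZSTD)
) ENGINE = MergeTree
ORDER BY timestamp
""".strip()

# The view only transforms rows; storage settings come from `requests`.
VIEW = """
CREATE MATERIALIZED VIEW requests_mv TO requests AS
SELECT
    timestamp,
    JSONExtractString(payload, 'path') AS path,
    JSONExtractUInt(payload, 'status') AS status
FROM raw_events
""".strip()


def statements():
    """Return the DDL in the order it has to be executed."""
    return [INGEST_TABLE, BACKING_TABLE, VIEW]


for statement in statements():
    print(statement + ";\n")
```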
| flurly wrote:
| Great minds think alike :)
|
| We do the exact same thing at GraphJSON
| https://www.graphjson.com/guides/about
| polskibus wrote:
| Does anyone know if ClickHouse Keeper can replace ZooKeeper in
| non-ClickHouse scenarios?
| e12e wrote:
| According to:
| https://clickhouse.com/docs/en/operations/clickhouse-keeper/
|
| > By default, ClickHouse Keeper provides the same guarantees as
| ZooKeeper (linearizable writes, non-linearizable reads). It has
| a compatible client-server protocol, so any standard ZooKeeper
| client can be used to interact with ClickHouse Keeper.
| Snapshots and logs have an incompatible format with ZooKeeper,
| but clickhouse-keeper-converter tool allows to convert
| ZooKeeper data to ClickHouse Keeper snapshot. Interserver
| protocol in ClickHouse Keeper is also incompatible with
| ZooKeeper so mixed ZooKeeper / ClickHouse Keeper cluster is
| impossible.
|
| So I guess yes?
| michelb wrote:
| Has anyone here used MonetDB? I wonder how it holds up against
| other column-oriented databases.
| mxstbr wrote:
| We're big fans of ClickHouse here at GraphCDN, our entire
| analytics stack is based on it[0] and it's been scaling well from
| tens of thousands of events to now billions of events!
|
| [0]: https://altinity.com/blog/delivering-insight-on-graphql-apis...
| gigatexal wrote:
| What's the K8s story? I doubt where I am now I can request
| physical servers and dedicated fast disks.
| yamrzou wrote:
| How does Clickhouse compare to Snowflake?
| benjaminwootton wrote:
| A few differences:
|
| - Snowflake is SaaS, Clickhouse isn't yet
|
| - Clickhouse is open source, Snowflake is proprietary
|
| - Snowflake has the virtual warehouse concept and ability to
| scale compute up and down with a single SQL statement.
| Clickhouse is a bit more traditional in architecture.
|
| - Snowflake is hella expensive
|
| - Snowflake is a bit more of a traditional data warehouse,
| whereas Clickhouse is philosophically about powering through
| big datasets such as denormalised click stream or logs
|
| Both great products for their respective use cases
| tobykeef wrote:
| Not to be a stick in the mud here. We recently moved from
| Clickhouse to Druid due to issues we were having when scaling and
| rebalancing the cluster. How does removing ZK help?
| benjaminwootton wrote:
| Interesting. We moved from Druid to Clickhouse for exactly the
| same reason :-)
|
| https://timeflow.systems/why-we-moved-from-druid-to-clickhou...
|
| Clickhouse is significantly easier to operate than Druid in my
| experience.
| log4shell wrote:
| Druid has quite some intelligence baked in to handle the
| scaling by default. I am curious how clickhouse is doing in all
| those aspects.
|
| When we did a PoC, the operational aspect of clickhouse and its
| performance were severely lacking compared to druid.
| Clickhouse had bigger resources at its disposal than druid
| during this PoC.
|
| If they could improve the operational aspect and introduce
| sensible defaults so that users don't have to go through
| 10000 configuration options to work with data in clickhouse, I
| am sure I will give it a go for some other use case. It is simple
| on the surface but the devil is in the details. Druid is much
| simpler and saner at the scale I need to operate.
| dreyfan wrote:
| Because ZK is garbage and complicates every clustered
| application that relies on it? Kafka is ditching ZK too.
|
| Clickhouse cluster quite simply doesn't support elastic
| rebalancing. Avoid CH if that is a hard requirement for your
| setup.
| StreamBright wrote:
| I used to have a "wipe this ZK node clean and rejoin the
| cluster" script to deal with ZK node outages that nobody could
| explain.
| monstrado wrote:
| The introduction of parallelized Parquet reads coupled with
| s3Cluster is really awesome. I feel ClickHouse is one step closer
| to unlocking the ephemeral SQL compute cluster (e.g. Presto,
| Hive) use case. I could imagine one day it having a HiveMetaStore
| read-only database option for querying existing data in
| companies...very fast, I might add.
| gildedage77 wrote:
| How does Clickhouse stack up against Druid? We're trying to make
| a decision on the two technologies, and found this recent article
| that shows Druid 8x faster than Clickhouse -
| https://imply.io/post/druid-nails-cost-efficiency-challenge-...
| fasteo wrote:
| >>> SELECT * FROM system.contributors
|
| Genius
| aliswe wrote:
| valued at $2B. what's their business model? corporate support?
| nemo44x wrote:
| They're going to build a cloud and rival Snowflake. Plus
| support/service open source users.
| goodpoint wrote:
| Company valuations are Monopoly money.
| mdasen wrote:
| I'm a little surprised that Amazon hasn't created ClickHouse as a
| Service. They've done it for Elasticsearch, Presto (Athena),
| Hadoop, Kafka, MySQL, PostgreSQL. ClickHouse seems like it would
| fit with their strategy of offering a datastore as a service. While
| AWS does have things like Redshift and Timestream, it seems like
| ClickHouse offers a lot of potential untapped value that Amazon
| could capture.
| ucarion wrote:
| They may be working on it already. Such a product would take
| a while to create. It was only in September of this year that
| Clickhouse was spun out of Yandex and quickly raised a Series A
| + B.
| stingraycharles wrote:
| But Clickhouse spinning off from Yandex is not a requirement
| for an AWS offering, right?
| tmp_anon_22 wrote:
| At this point product strategy for these companies must be
| built with Amazon in mind, with a goal of outpacing them at
| the juncture that AWS would be able to take a bite of the
| market share easily.
|
| As much as it sucks to have been an OSS-ish product Amazon
| has taken a chunk out of, the game is now known and can be
| proactively neutered with good planning.
| higeorge13 wrote:
| I am wondering whether it would be a rival project to Redshift?
| gildedage77 wrote:
| You mean ParAccel? :P
| ksec wrote:
| Well Clickhouse has been open source for quite some time. The
| reason I could think of for Amazon not doing it is that
| demand is low. Clickhouse isn't mainstream yet.
| bdcravens wrote:
| Many of those were introduced sometime after they became
| mainstream.
___________________________________________________________________
(page generated 2021-12-16 23:01 UTC)