[HN Gopher] What is ClickHouse how it compares to PostgreSQL and...
___________________________________________________________________
What is ClickHouse how it compares to PostgreSQL and TimescaleDB
for time series
Author : LoriP
Score : 202 points
Date : 2021-10-21 15:31 UTC (7 hours ago)
(HTM) web link (blog.timescale.com)
(TXT) w3m dump (blog.timescale.com)
| qaq wrote:
| ClickHouse shines at scales that timescale has no hope of ever
| supporting. Hence the choice of workloads in the test. Cloudflare
| was ingesting 6,000,000 rows per second into a 36-node (dual
| E5-2630) ClickHouse cluster (2018), which works out to something
| like 20PB of data per year.
| akulkarni wrote:
| (TimescaleDB co-founder)
|
| 6,000,000 rows inserted per second is great! And if you need
| that for your workload, then you probably should choose
| ClickHouse over TimescaleDB (well at least, for now ;-)
|
| The reason we don't include that in the benchmark is that most
| developers _don't need_ 6,000,000 rows inserted per second.
|
| And also - that performance doesn't come for free, but requires
| giving up a lot of things that most developers may need: e.g.,
| no transactions, immutable tables (can't easily update / delete
| data), SQL-like but not quite SQL query language, inefficient
| joins, inefficient for point queries retrieving single rows by
| their keys, etc. (We go into much more detail in the blog
| post.)
|
| So it comes down to the fundamental analogy used in the post:
| Do you need a car (versatility) or a bulldozer (narrow
| specialization)?
|
| If the answer is that you need to support 6,000,000 rows
| inserted per second, then by all means, choose the bulldozer.
|
| > ClickHouse shines at scales that timescale has no hope of
| ever supporting.
|
| I'm not sure if this was a throwaway line, or if it was the
| result of a detailed analysis of TimescaleDB's architecture,
| but if you don't mind, I'll share this: with TimescaleDB multi-
| node [0] we are getting close to that performance, and the
| product keeps getting better.
|
| [0] https://blog.timescale.com/blog/building-a-distributed-
| time-...
| ryanbooz wrote:
| (post author)
|
| Those are great, impressive numbers. We certainly don't claim
| to be all things to all people, but the benchmark was run using
| single instances mostly because that is what most other
| benchmarks published by others have done.
|
| With a multi-node TimescaleDB cluster, ingest does literally
| scale to millions of rows/second and we have numerous users
| achieving great numbers. One Fortune 100 company has a 25+ node
| TimescaleDB cluster backing their network monitoring stack and
| loves it for their use case.
|
| At some point, when we can, I'm sure we'll start to do more
| with multi-node benchmarking too to give some input to the
| conversation.
| akulkarni wrote:
| Minor correction: It is actually a 40+ node cluster :-)
| qaq wrote:
| and then it will come down to spec of nodes actual fields and
| so on etc. Also batch size obviously plays a big role here as
| CH is optimized for very large batch sizes and benchmark is
| not really using that kind of batch size. BTW. I am not
| involved with CH but any kind of vendor benchmarking their
| wears will always select params that will make their offering
| look good
| ryanbooz wrote:
| Sure, these tests were not using really large batch sizes
| because of the other benchmarks we were trying to replicate
| (but with more detail). Honestly, for this single instance
| setup, we saw improvement in CH when we went from (say) 5k to
| 10k or 20k batches. But it was a few percentage points at
| a time, not an order of magnitude. I'm sure things change
| with a cluster setup too, that just wasn't the focus of
| this post.
|
| Interestingly, we were just testing a multi-node
| TimescaleDB cluster the other day and found that 75k
| rows/batch was the optimal size as nodes increased.
|
| So you're completely correct. I tried to be very clear that
| we were not intentionally "cooking the books" and there's
| surely other optimizations we could have made. Most of the
| suggestions so far, however, require further setup of CH
| features that haven't been used in other benchmarks, so we
| tried to over communicate our strategy and process.
|
| We also fully acknowledged in the post that a siloed
| "insert", wait, then "query" test is not real world. But,
| it's the current way TSBS has been used, and other DB
| engines have come along and used the methodology for now.
| Maybe that process will change in time to come with other
| contributions.
|
| BTW, we'll discuss some of this next week during the live-
| stream and the video will be available after.
| mrwnmonm wrote:
| Any plans to do this kind of detailed comparison with Druid too?
| ryanbooz wrote:
| There's a long list of DBs users would like to see. Druid is on
| the list but probably not happening in the near-term without
| some community help.
|
| Remember, TSBS is open-source and we've had some great
| contributions from many teams/databases. :-)
| pradeepchhetri wrote:
| How can I replicate the results of the benchmarks? I am
| interested in looking at the CH table schema you used.
| carlotasoto wrote:
| (Timescale team member here)
|
| We used the Time Series Benchmark Suite for all these tests
| https://github.com/timescale/tsbs. Also, Ryan (post author)
| will be giving all the config details in a Twitch stream
| happening next Wednesday. We'll be uploading the video to
| Youtube immediately afterwards too >>
|
| twitch.tv/timescaledb youtube.com/timescaledb
| ryanbooz wrote:
| (Post author)
|
| Howdy! All of the details about our TSBS settings are in the
| performance section of the docs. Also, we'll be streaming a
| sample benchmark of the two databases next Wednesday at 10AM
| ET/4PM CET.
|
| https://blog.timescale.com/blog/what-is-clickhouse-how-does-...
|
| twitch.tv/timescaledb
| pradeepchhetri wrote:
| Few comments:
|
| - The CH table schema generated by TSBS isn't optimized for
| the queries. First of all, it doesn't use CODECs
| (https://altinity.com/blog/2019/7/new-encodings-to-improve-
| cl...) or many other optimizations CH has.
|
| > We tried multiple batch sizes and found that in most cases
| there was little difference in overall insert efficiency
|
| This is wrong in the CH world, where batch size matters a lot.
| I would recommend making this even higher, around 10x the
| current value.
|
| Humble suggestion: there are many things about CH that aren't
| quite properly interpreted, and reading through the blog it
| seems like you're focusing more on areas where CH is
| lacking/missing. Please don't do these things.
| PeterZaitsev wrote:
| This is what tends to make all vendor benchmarks
| "benchmarketing" - while many of us fully intend to give a
| fair shot to other technologies, we tend to know the best
| practices for our own software better than the "competition"
| ryanbooz wrote:
| Two quick responses:
|
| - The code that TSBS uses was contributed by Altinity[1].
| If there is a better setup, please feel free to submit a
| PR. As stated elsewhere, we did have a former CH engineer
| review and even updated ClickHouse to the newest version
| __yesterday__ based on his suggestion to ensure we had the
| best numbers. (and some queries did improve after
| upgrading, which are the numbers we presented)
|
| - It seems like you read the article (great job - it was
| long!!), so I'm sure you understand that we were trying to
| answer performance and feature questions at a deeper level
| than almost any benchmark we've seen to date. Many just
| show a few graphs and walk away. We fully acknowledged that
| smaller batches are not recommended by CH, but they are
| something many (normally OLTP) users would probably have. It
| matters, and nobody (that we know of) has shown those numbers
| before. And in our test, larger batch sizes do work well, but
| not by some great magnitude in this one server setup.
| Did 10k or 20k rows maybe go a little faster for CH?
| Sometimes yes, sometimes negligible. The illustration was
| that we literally spent months and hundreds of benchmark
| cycles trying to understand the nuances.
|
| I think we're pretty clear in the post that CH is a great
| database for the intended cases, but it has shortcomings
| just like TimescaleDB does and we tried to faithfully
| explore each side.
|
| [1]: https://github.com/timescale/tsbs/pull/26
| jonatasdp wrote:
| You can run tsbs by yourself. Just check the options here:
| https://github.com/timescale/tsbs/blob/master/docs/clickhous...
|
| The blog post benchmarks used `--use-case=cpu-only` case for
| data ingestion. You can see the table definition here:
| https://github.com/timescale/tsbs/blob/1eb7705ff921fd31784c0...
| coming from here:
| https://github.com/timescale/tsbs/blob/master/pkg/targets/cl...
| mrwnmonm wrote:
| If you have thousands of clients writing individual rows to the
| database, one per request, and thousands of clients making
| queries (some of them complex, some not), does ClickHouse
| even get used in this scenario?
| PeterZaitsev wrote:
| I think it is worth noting that while ClickHouse is often used as
| a time series store, it is not particularly designed for this use
| case, but more for storing logs, events and similar data.
| VictoriaMetrics would be an interesting comparable, which is
| inspired by the ClickHouse design but optimized for time series
| storage in particular: https://victoriametrics.com/
| dmw_ng wrote:
| More war stories: found Timescale easier to set up (maybe just
| because it's more familiar), but raw query perf is not something you
| just magically get for free. Timescale requires a lot of
| investment in planning. In one project we had simple time range
| scan queries against a less-than-RAM-sized table taking tens of
| seconds to complete.
|
| ClickHouse has a bit more ops overhead, but requires very little
| in the way of pre-planning. Just throw whatever you want at it
| and it seems to sing by default.
|
| Regarding ops overhead, ClickHouse also has a "local" mode where
| you can query a huge range of compressed formats without ever
| performing any kind of ETL step. That means queries can run e.g.
| directly against S3 storage. For batch logs analysis, IMHO local
| is a gamechanger. Most bulk logging systems produce massive
| amounts of S3 objects, and ClickHouse lets you tear through these
| infrequently (but at high speed) when desired without any
| bulky/opsey ETL step, and no cluster running idle just waiting
| for a handful of daily jobs to arrive.
|
| (I love this style of system in general, but the clear
| engineering work behind ClickHouse completely won me over, I'd
| apply it anywhere I could)
| akulkarni wrote:
| (TimescaleDB co-founder)
|
| Thank you for the feedback - it is conversations like this that
| enable us to understand how we can continue to make TimescaleDB
| better.
|
| And some of the ideas you are discussing are on our roadmap -
| if anyone wants to help, we are hiring :-)
|
| https://www.timescale.com/careers
| CodesInChaos wrote:
| Can somebody recommend a database suitable for an event sourced
| application:
|
| * One series of events per user
|
| * Each series grows at about 10 events/minute while the user is
| active
|
| * Fancy queries are not required, typically a user's event series
| is consumed in order to update aggregate state for that user
|
| * Either used online, adding events one at a time and needing to
| immediately update the aggregate state
|
| * Or used offline, syncing a batch of hours or days at once. When
| syncing a large time interval, eventually consistent state
| updates are acceptable
|
| * It must be possible to delete a user's data, regardless of how
| old it is (a nightly batch job deleting multiple users at once is
| fine, if it helps performance)
|
| * Migrating old data should be possible with reasonable
| performance and without consuming excessive temporary memory
|
| * Compact storage is important (simple zstd compression should
| suffice, though columnar compression might be slightly better)
|
| * Being able to use a cheaper object store like S3 for old data
| would be nice
|
| At a glance timescale community appears to meet most
| requirements. The userid can be used as `segmentby` key, and the
| data compressed via columnar compression. But it seems to have
| limitations with migration (sounds like it requires me to
| manually decompress and recompress chunks, instead of simply
| transforming one (chunk, segment) piece at a time) and deletion
| (I need to delete everything with a specific `segmentby` key).
|
| Alternatively there is the DIY approach, of serializing each
| entry in a compact format, one file per user, and then once data
| is old enough compress it (e.g. with zstd) and upload it to S3.
| ryanbooz wrote:
| Looks like you edited this with some more detail, so I'll
| answer higher.
|
| Compression in TimescaleDB used to mean all compressed data was
| immutable and the table schema couldn't be altered. Since
| TimescaleDB 2.1 and 2.3, that has changed in a few ways:
|
| - The schema can now have columns added or renamed
|
| - Compressed chunks can now have rows inserted into them
| (partially compressed, the background job will fully compress
| when it runs)
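|
| For reference, enabling that columnar compression with a
| segmentby column looks roughly like this (a minimal sketch
| against a hypothetical "events" hypertable, not your exact
| schema):
|
|     CREATE TABLE events (
|         time    TIMESTAMPTZ NOT NULL,
|         user_id BIGINT      NOT NULL,
|         payload JSONB
|     );
|     SELECT create_hypertable('events', 'time');
|
|     -- compress by user, so each user's rows form column segments
|     ALTER TABLE events SET (
|         timescaledb.compress,
|         timescaledb.compress_segmentby = 'user_id'
|     );
|
|     -- compress chunks once they are older than a week
|     SELECT add_compression_policy('events', INTERVAL '7 days');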
|
| Row deletion is not possible yet, but I've personally been
| having some internal conversations around ways to do exactly as
| you're suggesting in the near-term: deleting rows based on a
| "segmentby" column. I have some testing to do... but my focus
| has been taken up by a certain 33-minute-read blog post.
|
| Feel free to join our Slack and DM me if you want to talk about
| it further.
|
| slack.timescale.com
| ryanbooz wrote:
| (post author and Timescaler)
|
| What do you mean by "migrating old data"? Don't want to make
| assumptions before answering further.
| CodesInChaos wrote:
| Updating many/all rows as a rare maintenance task, typically
| as part of deploying a new version of the application.
|
| I know timescale has native support for the most common cases
| (adding nullable columns/renaming columns). But sometimes the
| transformation is more complex. Sometimes an SQL update
| statement would suffice; sometimes streaming each segment in
| chronological order through the application, which returns the
| updated rows, might be required.
| smarx007 wrote:
| 10 events/minute - I would guess SQLite/DuckDB can fit the bill
| for many years?
| [deleted]
| CodesInChaos wrote:
| I don't think performance is the issue, but I'd like to keep
| the storage small/cheap via compression, since this is a
| hobby project. (though admittedly it's probably premature
| optimization)
| andrejserafim wrote:
| Our anecdata: we store telemetry per thing. After loading a
| month's worth of data, TimescaleDB as hosted by their cloud ran a
| difference aggregation in seconds. ClickHouse routinely did it in
| 20 millis.
|
| Simple avg, etc. were better, but ClickHouse was always an order
| of magnitude faster than Timescale. We didn't invest a whole
| bunch into optimization other than trying some indexing
| strategies in TimescaleDB.
|
| So for our use case the choice is clear.
| claytonjy wrote:
| Was this for your primary source-of-truth, or more of a
| downstream data warehouse, or something else?
|
| I'm struggling to imagine a case where these are the two things
| being considered; Timescale is the obvious choice for a primary
| database, Clickhouse the obvious choice for a warehouse. I
| wouldn't let my user-facing app write to Clickhouse, and while
| I could potentially get away with a read-only Timescale replica
| for internal-facing reports I would expect to eventually
| outgrow that and reach for Clickhouse/Snowflake/Redshift.
| dominotw wrote:
| > Clickhouse the obvious choice for a warehouse
|
| > Clickhouse/Snowflake/Redshift
|
| But ClickHouse is very unlike the other two. When I think of
| a warehouse I think star schema, data modeling, etc., not
| something that hates joins.
| claytonjy wrote:
| Agreed, I wouldn't use Clickhouse for usual warehouse stuff
| either, mostly because I can't imagine it plays well with
| dbt which is a non-starter these days.
|
| I'd still argue Clickhouse is closer to Snowflake/Redshift
| than anything OLTP, and their name is intentionally chosen
| to evoke warehouse-like scenarios.
| FridgeSeal wrote:
| What makes you think CH doesn't like joins?
|
| Having used Redshift, Snowflake and CH for similar
| workloads, I'd much prefer ClickHouse to the other 2.
|
| Snowflake is hideously expensive for the subpar perf it
| offers in my experience and Redshift is mediocre at best
| in general.
| hodgesrm wrote:
| Is your comment on ClickHouse and DBT based on using the
| DBT ClickHouse plugin? [0] If so I would be very
| interested in understanding what you or others see as
| deficiencies.
|
| [0] https://github.com/silentsokolov/dbt-clickhouse
| encoderer wrote:
| > I wouldn't let my user-facing app write to Clickhouse
|
| I've been thinking of doing exactly that. What are your
| concerns?
| claytonjy wrote:
| I suppose it depends what you're going to let your user do,
| but OLAPs in general and Clickhouse in particular don't do
| well under row-oriented workloads, as described in the post
| here. I'm imagining users primarily operating on small
| numbers of rows and sometimes making updates to or deleting
| them, a worst-case scenario for Clickhouse but best-case
| for an OLTP like Postgres.
| encoderer wrote:
| Ah totally. Thanks for sharing your thoughts! In my case
| I'm evaluating clickhouse as a source of truth for
| customer telemetry data. Totally agree about the OLTP
| limitations.
| willvarfar wrote:
| (Remember that clickhouse is not reliable. It doesn't
| pretend to be.
|
| Clickhouse is great for lots of common query workloads,
| but if losing your data would be a big deal then it makes
| a lot of sense to have your data in a reliable and backed
| up place (eg timescale or just s3 files or whatever) too.
|
| Of course lots of times people chuck stuff into
| clickhouse and it's fine if they lose a bit sometimes.
| YMMV.)
| dreyfan wrote:
| https://blog.cloudflare.com/http-analytics-
| for-6m-requests-p...
|
| has some good thoughts. The main thing you'll likely need
| is some sort of a buffer layer so you can do bulk inserts.
| Do not write a high-volume of single-row inserts into
| Clickhouse.
| encoderer wrote:
| Thanks for sharing the link! I've heard the bulk insert
| thing before and to be honest I've always thought that
| RDBMSs don't love single row inserts either. Seems
| clickhouse takes that to a new level.
|
| In our case we are using sqs and usually insert 20-100
| rows into the db at a time so I'm going to benchmark how
| that does in clickhouse.
| [deleted]
| zepearl wrote:
| With Clickhouse you can use a "buffer table", which uses
| just RAM and sits on top of a normal table:
| https://clickhouse.com/docs/en/engines/table-
| engines/special...
|
| Rows inserted into the buffer table are then flushed to
| the normal/base table when one of the limits (defined
| when the buffer table is created) is reached (limits are
| max rows, max bytes, max time since the last flush), or
| when you drop the buffer table.
|
| I'm using it and it works (performance difference can be
| huge compared to perform single inserts directly into a
| real/normal table), but be careful - the flushed rows
| don't give a guarantee of which row is flushed in which
| sequence, so using a buffer table is a very bad idea if
| your base table is something which relies on the correct
| sequences of rows that it receives.
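|
| A rough sketch of what that looks like (database and table
| names are made up; the nine arguments are num_layers plus the
| min/max seconds, rows, and bytes flush thresholds):
|
|     CREATE TABLE mydb.metrics_buffer AS mydb.metrics
|         ENGINE = Buffer(mydb, metrics, 16,
|                         10, 100,               -- min/max seconds
|                         10000, 1000000,        -- min/max rows
|                         10000000, 100000000);  -- min/max bytes
|
| If I remember correctly, SELECTs against the buffer table read
| both the buffer and the base table, so not-yet-flushed rows are
| still visible.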
| ants_a wrote:
| From my experience of benchmarking these databases on
| scientific data (highly regular timeseries) and looking at the
| internals of both, these kinds of numbers happen when
| answering the query requires crunching through many rows, but
| the output has few, i.e. the queries are filtering and/or
| aggregating a ton of input rows that can't be excluded by
| indexes or answered from preaggregations.
|
| From what I can tell it comes down to execution engine
| differences. Timescale, even with compressed tables, uses a
| row-by-row execution engine architecturally resembling IE6-era
| JS engines. ClickHouse uses a batched and vectorized execution
| engine utilizing SIMD. The difference is one to two orders of
| magnitude of throughput in terms of the raw number of rows per
| core pushed through the execution engine.
|
| Postgres/TimeScale could certainly also implement a similar
| model of execution, but to call it an undertaking would be an
| understatement considering the breadth and extensibility of
| features that the execution engine would need to support. To my
| knowledge no one is seriously working on this outside of
| limited capability hacks like vops or PG-Strom extensions.
| ryanbooz wrote:
| (post author)
|
| You do a great job summarizing some of the benefits of
| ClickHouse we mentioned in the post, including the vectorized
| engine!
|
| That said, I'm not sure I'd refer to PostgreSQL/TimescaleDB
| engine architecture as resembling IE6 JS support. Obviously
| YMMV, but every release of PG and TimescaleDB bring new
| advancements to query optimizations for the architecture they
| are designed for, which was the focus of the post.
|
| I'm personally still impressed, after 20+ years of working
| with SQL and relational databases, when any optimization
| engine can use statistics to find the "best" plan among
| (potentially) thousands in a few ms. Maybe I'm too easily
| impressed. :-D
| ants_a wrote:
| The optimization engine is of course great (despite
| occasionally missing hard), but I am not referring to it. I
| am referring to the way that PostgreSQL executes query
| plans: the way rows are pulled up the execution tree is
| very similar to the first iterations of JavaScript engines
| - a tree-based interpreter. Picking out columns from rows
| and evaluating expressions used to work the same way until
| PG11, where we got a bytecode-based interpreter and a JIT
| for those. But so far rows still work the same way, and it
| hurts pretty badly when row lookup is cheap and the rows
| end up either thrown away or aggregated together with basic
| math.
| mfreed wrote:
| With TimescaleDB compression, 1000 rows of uncompressed
| data are compressed into column segments, moved to
| external TOAST pages, and then pointers to these column
| segments are stored in the table's "row" (along with
| other statistics, including some common aggregates).
|
| So while the query processor might still be "row-by-row",
| each "row" it processes actually corresponds to a column
| segment for which parallelization/vectorization is
| possible. And because these column segments are TOASTed,
| the row itself is just pointers, and you only need to
| read in those compressed column segments that you are
| actually SELECTing.
|
| Anyway, you might have known this, just wanted to clarify.
| Thanks for the discussion!
| GordonS wrote:
| How many data points were those aggregations being computed
| over? How much memory does your Postgres server have, and are
| you using SSD storage (with associated postgres config tweaks)?
| ryanbooz wrote:
| (Post author)
|
| Howdy! We provided all of those details in the post and
| you're welcome to join us next week when we live-stream our
| setup and test!
|
| https://blog.timescale.com/blog/what-is-clickhouse-how-
| does-...
| GordonS wrote:
| I was responding to @andrejserafim, asking about their
| scenario, not the article.
| ryanbooz wrote:
| Gotcha! My apologies for not seeing the thread nature. HN
| threads get me sometimes. :-)
| ryanbooz wrote:
| (N.B. post author)
|
| Thanks for the feedback. Without knowing your situation, one of
| the things we show in the blog post is that TimescaleDB
| compression often changes the game on those kinds of queries
| (data is transformed to columnar storage when you compress).
| You don't mention if you did that or not, but it's something
| we've seen/noticed in every other benchmark at this point -
| that folks don't enable it for the benchmark.
|
| And the second point of the article is that you have lots of
| options for whatever works in your specific situation. But
| make sure you're using the chosen database's features before
| counting it out. :-)
| maxmcd wrote:
| I wonder if it's worth taking a page out of the MongoDB book
| and enabling these kinds of benchmark altering settings by
| default. We certainly selected ClickHouse over Timescale
| internally because of major performance differences in our
| internal testing that might have gone the other way had we
| "known better".
| ryanbooz wrote:
| Indeed. Lots of discussion over this in the last few
| months. There are nuances, but I think you'll see some
| progress in this area over the next year.
| brightball wrote:
| That's a really thorough comparison. Much more detailed than I
| expected.
|
| From what I see, the trade off in disk space usage would point me
| toward Timescale for most of my workloads. The insert performance
| tradeoff just wouldn't justify the difference for me.
| ericb wrote:
| A licensing comparison would be a good addition.
| PeterZaitsev wrote:
| Good question. Was it the open source version of TimescaleDB
| that was compared, or the source-available one?
| ryanbooz wrote:
| Sure. As we shared in the blog post it was tested (like
| other benchmarks) on dedicated EC2 instances using the
| freely available Community version.
| PeterZaitsev wrote:
| This does not answer the question - is it Open Source
| License or Source Available (TSL)
| https://www.timescale.com/legal/licenses
| CodesInChaos wrote:
| the original article and the parent poster say that the
| community edition was used:
|
| > Versions: TimescaleDB version 2.4.0, community edition,
| with PostgreSQL 13
|
| and the link you posted explains that it's the non-OSI
| licensed version:
|
| > TimescaleDB Community is made available under the
| Timescale License ("TSL")
| darthShadow wrote:
| Does the disk usage go down later once the numerous parts are
| merged or not?
|
| I would assume it does but reading the article implies that it
| does not.
| ryanbooz wrote:
| Great question. Yes, eventually it does, but (at least for
| now) it wasn't something we could reliably force as part of
| the query cycle and know everything was in its "best" state
| with ClickHouse. To be honest, we didn't provide the final
| compressed size of either database because of the need to
| wait.
|
| The code that's currently used by TSBS was submitted by
| Altinity, a heavy supporter of ClickHouse in the U.S., but
| TSBS is open source and anyone is welcome to contribute and
| make the process/test better!
| darthShadow wrote:
| Thanks for the answer.
|
| May be worth pointing that out in the article, since the
| increased disk usage has been mentioned multiple times
| without any indication that it's only temporary until
| ClickHouse merges the parts.
| zepearl wrote:
| Concerning ClickHouse, yes it does - it's exactly the same thing
| as when you have 2 compressed files each containing 100
| sorted rows: when you merge those 200 rows into a single
| file, sort them, and compress them, the result will be smaller
| than the sum of the 2 separate files.
|
| How much you save is again exactly the same as when dealing
| directly with files: it depends on the data and on the
| compression algo.
| ryanbooz wrote:
| (Post author)
|
| Thanks for the compliment! It's becoming a habit with us and
| benchmarks. We just really want to dig in and understand what's
| going on and why things work the way they do. ;-)
|
| There really are so many nuances and as we tried to say a
| number of times, ClickHouse is really great at what it does
| well. But it's still OLAP at heart (which precludes OLTP
| features many apps take for granted) and after enabling
| TimescaleDB compression, the query story isn't as cut and dried.
| We don't claim that TimescaleDB is the fastest in all
| circumstances, or that it absolutely has to be for every
| workload. Features and versatility play a major part in the
| decision.
| chalcolithic wrote:
| Could putting RedPanda/Kafka in front of ClickHouse make it the
| insert benchmark winner? Of course it means operational expenses,
| but I wonder if this route is worth exploring?
| grzff wrote:
| Fauci funded COVID-19: https://www.zerohedge.com/covid-19/nih-
| admits-funding-gain-f...
|
| Everyone involved should face the firing squad.
| cyber1 wrote:
| Very interesting to see a comparison of TimescaleDB vs
| VictoriaMetrics. Car vs car :)
| eatonphil wrote:
| I was surprised to see that ClickHouse and ElasticSearch have the
| same number of contributors. That's pretty astounding given how
| much older and more prominent ElasticSearch has been.
|
| https://github.com/ClickHouse/ClickHouse/graphs/contributors
|
| https://github.com/elastic/elasticsearch/graphs/contributors
|
| Edit: I was very off. The Github contributor graph does not show
| all actual contributors. ElasticSearch has somewhere around 2-3
| times as many contributors as ClickHouse.
| hodgesrm wrote:
| ClickHouse now has more unique contributors with merged PRs on
| an annual basis. The lines crossed early this year, or even
| late last year.
| eatonphil wrote:
| Thanks! Could you point me at something concrete. :)
| hodgesrm wrote:
| Go to one of the several public ClickHouse endpoints and
| run this query:
|
|     -- Elastic vs CH in a single table.
|     SELECT toYear(created_at) Year,
|            uniqIf(creator_user_login, repo_name in
|                ('elastic/elasticsearch')) "Elastic",
|            uniqIf(creator_user_login, repo_name in
|                ('yandex/ClickHouse', 'ClickHouse/ClickHouse'))
|                "ClickHouse"
|     FROM github_events
|     WHERE event_type = 'PullRequestEvent'
|       AND merged = 1
|       AND repo_name in ('yandex/ClickHouse',
|           'ClickHouse/ClickHouse', 'elastic/elasticsearch')
|     GROUP BY Year
|     ORDER BY Year
|
| You can access the ClickHouse web UI for this dataset here:
| https://github.demo.trial.altinity.cloud:8443/play?user=demo
| The password is "demo" (type it in the left side.) This
| is the Altinity.Cloud copy of Alexey Milovidov's excellent
| github_events dataset.
|
| When I run this query I get the following numbers.
|
|     Year|Elastic|ClickHouse|
|     ----|-------|----------|
|     2015|    191|         0|
|     2016|    299|        40|
|     2017|    296|        85|
|     2018|    284|       142|
|     2019|    341|       232|
|     2020|    339|       300|
|     2021|    243|       294|
|
| Just speculation on my part, but the drop in Elastic
| contributors may be a side effect of the licensing change.
| rohitnair wrote:
| As per the landing pages of the projects, ES has 1.6k
| contributors whereas ClickHouse has 803. The contributors page
| likely only lists the top contributors to keep the page load
| time manageable.
| eatonphil wrote:
| That makes much more sense. Thanks for pointing that out.
| AdamProut wrote:
| ClickHouse is by far the leading open source columnar SQL data
| warehouse at this point. We have had strong open source
| operational SQL DBs for many years (MySQL, Postgres), but no
| open source systems that mirrored closed source MPP column
| stores until ClickHouse. It's interesting that it took "this
| long" for a strong open source SQL DW to emerge.
| LoriP wrote:
| ClickHouse has gained a huge following and honestly that's been
| pretty well earned. For the kinds of apps that they target it's
| a great choice, it's great technology with a very able team
| behind it.
| rkwasny wrote:
| There is some creative engineering going on here :) have a look:
|
| https://github.com/timescale/tsbs/blob/master/scripts/load/l...
|
| vs
|
| https://github.com/timescale/tsbs/blob/master/scripts/load/l...
| csdvrx wrote:
| > There is some creative engineering going here
|
| Agreed. At a previous job, ClickHouse outperformed Timescale
| by several orders of magnitude, under about every condition.
|
| The timescale team seems to recognize that (look for the
| comment about clickhouse being a bulldozer) but they seem to
| say timescale can be better suited.
|
| In my experience, in about 1% of the cases, yes, timescale will
| be a better choice (ex: if you do very small batches of
| insertions, if you need to remove some datapoints) but in 99%
| of the usecases for a time series database, clickhouse is the
| right answer.
|
| There seems to have been several improvements to timescale
| since 2018, with columnar storage, compression, etc. and that's
| good because more competition is always better.
|
| But in 2021, clickhouse vs timescale for a timeseries is like
| postgres vs mongo for a regular database: unless you have
| special constraints [*], the "cool" solution (timescale or
| mongo) is the wrong one.
|
| [*]: you may think you have a unique problem and you need
| unique features, but odds are, YAGNI
|
| https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it
| rkwasny wrote:
| Also these queries are different? ORDER BY "time" vs ORDER BY
| "created_at"
|
| https://github.com/timescale/tsbs/blob/a045665d9c94426bbc405...
|
| https://github.com/timescale/tsbs/blob/a045665d9c94426bbc405...
| ryanbooz wrote:
| We were using tags, so that "else" block isn't the one being
| used for ClickHouse. Regardless, the table that is created
| (by the community and verified by former CH engineers) orders
| by created_at, not time, and so that query should be the
| fastest distinct possible.
| ryanbooz wrote:
| (Post author)
|
| I'm not sure why you think that's creative engineering. What
| you're pointing to is the depth of available configuration that
| the contributors to TSBS have exposed for each database. It's
| totally open source and anyone is welcome to add more
| configuration and options! I believe (although not totally
| sure) that Altinity and ClickHouse folks added their code a few
| years ago - at least it wasn't anyone on the Timescale team.
|
| That said, we didn't actually use those scripts to run our
| tests. Please join us next Wednesday (10AM ET/4PM CET) to see
| how we set the databases up and ran the benchmarks. We'd be
| delighted to have you try it on your own too!
| rkwasny wrote:
| Ah so the tests you have used are not the ones in
| https://github.com/timescale/tsbs ?
| ryanbooz wrote:
| All the same tests. You simply pointed to a shell script
| that's configurable to run tests for each database. We
| provided details in the blog post of exactly what settings
| we used for each database (cardinality, batch size, time
| range, TimescaleDB chunk size, etc.) so you can use those
| script to configure and run the tests too.
| darksaints wrote:
| It would be awesome to combine the following things:
|
| * PostGIS
|
| * Timescale
|
| * Citus
|
| * Zedstore
|
| This truly would be the relational DB to end all relational DBs.
| Unfortunately, we run into a couple problems:
|
| * Managing multiple extensions is a burdensome task, which should
| be in the wheelhouse of cloud providers, but...
|
| * Timescale and Citus are open core, holding back features
| for customers. Their primary revenue channels are their cloud
| offerings. Unfortunately you can't get Citus and Timescale in the
| same cloud offering, because you're dealing with two separate
| companies.
|
| * PostGIS has multiple cloud providers, but none of them have
| Timescale or Citus available.
|
| * Citus only has cloud offerings on Azure, excluding the other
| two major players that often have exclusive relationships with
| companies.
|
| * Zedstore is really cool and together with Citus could be a
| massive gamechanger by having columnstore and rowstores in the
| same distributed database. However, development has stalled, and
| nobody seems to be able to explain what happened.
|
| Sigh...maybe 5 years from now.
| mfreed wrote:
| Timescale Cloud indeed comes with PostGIS installed by default.
|
| Regarding distributed (Citus) and columnar (Zedstore):
|
| - TimescaleDB's compression actually takes a columnar approach
| (including that it only reads the individual compressed columns
| that you SELECT), and so combines both row- and column-oriented
| data. [0]
|
| - TimescaleDB 2.0 also supports distributed deployment, and
| Timescale Cloud will (very soon) offer one-click deployment of
| fully-managed multi-node TimescaleDB. [1]
|
| [0] https://blog.timescale.com/blog/building-columnar-
| compressio...
|
| [1] https://blog.timescale.com/blog/building-a-distributed-
| time-...
| avthar wrote:
| > Timescale and Citus are are open core, holding back features
| for customers.
|
| One clarification. While TimescaleDB is open-core, our
| community version is source-available and 100% free to use. We
| do not "hold back features for customers". You do not need to
| pay to use any of TimescaleDB's best features, it's all free
| via the Timescale Community license.
|
| You only pay if you'd like to use our hosted offerings (and
| save the hassle of self-managing your DB): Timescale Cloud or
| Managed Service for TimescaleDB.
|
| For more see: https://www.timescale.com/products
|
| (Disclaimer: I work at Timescale)
| contrahax wrote:
| If you use Aiven for a cloud PG instance you can have both
| Timescale + PostGIS installed.
|
| I also really wish ClickHouse would prioritize PostGIS support
| - IIRC it has been on their roadmap for a while but keeps
| getting kicked around every year or so. Same thing with
| CockroachDB - PostGIS support kicked down the road every year.
| otan_ wrote:
| Hi there, CockroachDB dev here. We've supported spatial
| features from PostGIS since 20.2 - you can get started here
| https://www.cockroachlabs.com/docs/stable/spatial-data.html!
| Of course, there's bits and pieces we've missed but if
| there's something you're missing in particular you can let me
| know here or through a GitHub issue.
| sam0x17 wrote:
| Can someone give me a real-world example of a scenario where they
| actually need a time series database, like an example query with
| the business use case / justification? Just super curious.
| joshxyz wrote:
| - https://clickhouse.com/docs/en/faq/use-cases/time-series/
|
| - https://clickhouse.com/docs/en/faq/general/olap/
| jedberg wrote:
| At Netflix all of our monitoring was in a time series database
| so we could get real time insights into pretty much anything we
| were monitoring (which was most everything).
| ryanbooz wrote:
| (Post author)
|
| This is a great post to give you some talking points:
|
| https://blog.timescale.com/blog/what-the-heck-is-time-series...
|
| I also love this recent one we did with some non-standard time-
| series data that the NFL provided! Really fun working on that
| data set.
|
| https://blog.timescale.com/blog/hacking-nfl-data-with-postgr...
| LoriP wrote:
| Here's some posts from users on the Timescale blog
| https://blog.timescale.com/tag/dev-q-a/
| GordonS wrote:
| We're using TimescaleDB to store log events - so each row has a
| timestamp, some properties, and a log message. And a lot of
| that is actually in a JSONB column.
|
| Not the archetypal time series use case, but TimescaleDB is
| still really useful.
|
| TSDB's compression means we can store a huge volume of data in
| a fraction of the space of a standard Postgres table. You can
| achieve even better compression ratios and performance if you
| spend time designing your schema carefully, but honestly we
| didn't see the need, as just throwing data in gets us something
| like 10:1 compression and great performance.
|
| TSDB's chunked storage engine means that queries along chunking
| dimensions (e.g. timestamp) are super-fast, as it knows exactly
| which files to read.
|
| Chunking also means that data retention policies execute nearly
| instantaneously, as it's literally just deleting files from
| disk, rather than deleting rows one-by-one - millions of rows
| are gone in an instant!
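|
| As a rough idea of what that looks like (hypothetical "logs"
| hypertable, not our real schema):
|
|     -- drop whole chunks once they are older than 30 days
|     SELECT add_retention_policy('logs', INTERVAL '30 days');
|
|     -- or do it manually / from a scheduled job
|     SELECT drop_chunks('logs', older_than => INTERVAL '30 days');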
|
| And best of all, this all works in Postgres, and we can query
| TSDB data just the same as regular data.
|
| All that combined easily justified the decision to use TSDB -
| and if you're familiar with Postgres, it's actually really
| simple to get started with. Really, we'd have needed a business
| justification _not_ to use it!
|
| Much love for the TimescaleDB team!
| sam0x17 wrote:
| So services like Sentry / Honeybadger / etc probably use this
| architecture?
| GordonS wrote:
| I think most online logging SaaS services actually use
| ElasticSearch.
| tnolet wrote:
| We just migrated to ClickHouse. We collect monitoring data, so
| response times from 20+ different locations. We are not super
| duper big, but we have at least 100M+ individual metrics per
| month. We want to give our users a snappy, interactive dashboard
| that lets them explore aggregates of that data over time:
| averages, p99, etc.
|
| That is where a time series DB is very handy.
| bradstewart wrote:
| We capture and store energy readings at ~5-second intervals,
| then display total energy at various time granularities by
| aggregating the values over minutes, hours, days, months, etc.
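|
| To make that concrete, here's roughly what such a rollup looks
| like in a time-series DB like TimescaleDB (a sketch, with a
| made-up "readings" table):
|
|     SELECT time_bucket('1 hour', time) AS bucket,
|            device_id,
|            sum(energy_wh) AS total_energy_wh
|     FROM readings
|     WHERE time >= now() - INTERVAL '7 days'
|     GROUP BY bucket, device_id
|     ORDER BY bucket;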
| ur-whale wrote:
| > Can someone give me a real-world example of a scenario where
| they actually need a time series database
|
| Large scale infrastructure monitoring?
|
| If you run a data center with 10K machines in it, whitebox
| monitoring of these machines and what runs on them generates
| tons of timestamped data.
|
| These time series can be used to inform an automated alerting
| system (eg using trends to forecast bad things before they
| happen).
|
| They can also be analyzed in batch mode to figure out how to
| optimize many things (power / cooling / workload assignment /
| etc ...)
| claytonjy wrote:
| I introduced Timescale at an IIOT company, we had thousands of
| sensors regularly sending data up and wanted to efficiently
| display such metrics to users on a per-sensor basis, and one
| "tick" of data had a lot of metrics. Timescale let us go from
| Postgres for metadata and OpenTSDB (awful, stay far away) for
| time series to just one Timescale instance for everything. Huge
| win for us. We had enough data that doing the same with vanilla
| Postgres would have performed much worse (billions of rows).
|
| We wrote more about this for an earlier Timescale blog post:
| https://blog.timescale.com/blog/how-everactive-powers-a-dens...
| nojvek wrote:
| I'm surprised Timescale hasn't given a comparison with
| SingleStoreDB.
|
| I've found SingleStore column scans at parity with ClickHouse in
| speed. At the same time, SingleStore uses a hybrid skip-list /
| columnstore data structure in their universal storage (which is
| the default table format).
|
| So you have high throughput transactions, as well as insanely
| fast aggregate scans.
|
| Usually column stores are great at appends, not so much at
| updates and deletes.
| avinassh wrote:
| Related to TimescaleDB, there was a blog post which explained
| their internals and also compared with another similar time
| series DB. I can't seem to find the link, anyone remembers?
| avthar wrote:
| Timescaler here. I think you're referring to this comparison of
| InfluxDB vs TimescaleDB [0]?
|
| There's also comparisons of TimescaleDB vs MongoDB[1] and AWS
| Timestream [2].
|
| [0]: https://blog.timescale.com/blog/timescaledb-vs-influxdb-
| for-...
|
| [1]: https://blog.timescale.com/blog/how-to-store-time-series-
| dat...
|
| [2]:https://blog.timescale.com/blog/timescaledb-vs-amazon-
| timest...
| cercatrova wrote:
| Has anyone else been seeing an influx of timescale.com articles?
| I count around 10 in the last month.
| carlotasoto wrote:
| (Timescale Team member here)
|
| We've been working really hard on our launches / releases this
| month! We called it "Always Be Launching" - we've been aiming
| for releasing multiple things per week during October :)
| yumraj wrote:
| That sounds great.
|
| However, as a DB where users may store critical data, should
| you really be "Always be launching"? That sounds a little
| like FB's "move fast and break things". There's a reason why
| some of the mission critical open source technologies move
| slowly.
| mfreed wrote:
| We actually are only having one database software release
| this month (TimescaleDB v2.5), which is aligned with our
| normal database release cadence.
|
| Timescale (the company) also provides a managed cloud
| offering, as well as Promscale (an observability product
| built on top of TimescaleDB).
|
| So #AlwaysBeLaunching is a company-wide effort across
| different product & engineering teams, as well as folks in
| Developer Advocacy and others (e.g., who worked on these
| comparison benchmarks).
|
| What might be also interesting is our introduction of
| Experimental Schema features in TimescaleDB - explicitly so
| that we can "Move fast, but don't break things" (which is
| also key to getting good community feedback):
|
| https://blog.timescale.com/blog/move-fast-but-dont-break-
| thi...
|
| (Timescale co-founder)
| avthar wrote:
| Timescale team member here. We take our responsibility to
| build a rock-solid platform very seriously. We have
| multiple "levels" of product within Timescale. At our core,
| we have the open-source database, TimescaleDB. This product
| releases on a more deliberate and careful cadence, always
| making sure that we are optimizing for reliability,
| security, and performance. This has been our approach since
| our initial launch [0], where we embraced the mantra
| "boring is awesome", recognizing that for our users
| stability and reliability is of paramount importance.
|
| Within the core database, we offer features that are
| carefully marked as "experimental", which we discuss at
| length in this blog post [1].
|
| Beyond TimescaleDB, we also offer other products that are
| more SaaS-y in nature. While they're all based on the rock-
| solid foundation of TimescaleDB, we are also able to ship
| new features more quickly because they are UI components
| that make using the database even easier.
|
| Finally, some of our "launches" are more textual in nature,
| such as this benchmark, which we have spent months
| researching and compiling.
|
| [0]: https://blog.timescale.com/blog/when-boring-is-
| awesome-build...
|
| [1]: https://blog.timescale.com/blog/move-fast-but-dont-
| break-thi...
| pradeepchhetri wrote:
| One suggestion, if you really want to benchmark systems:
|
| * Create a setup which is production grade, i.e. run a multi-node
| HA setup of those systems.
|
| * Understand the best practices of those systems, otherwise the
| results get biased.
|
| * Validate the results with experts of those systems before
| publishing.
| didip wrote:
| Question, which helm chart is the best to install ClickHouse
| these days?
| hodgesrm wrote:
| Don't use helm. The ClickHouse Kubernetes Operator is the way
| to go. Here's the project:
| https://github.com/Altinity/clickhouse-operator
|
| This is generally true for most databases these days. Use an
| operator if it's available. Helm can't handle the dynamic
| management required to run databases properly.
| didip wrote:
| Thank you!
| [deleted]
| Upitor wrote:
| Would either of these database systems be proper for a case where
| you have a mix of large measurement data and small
| reference/master data that you need to join, filter, etc.?
| Example:
|
|     SELECT r.country, m.time, SUM(m.measurement)
|     FROM measurement_table AS m
|     INNER JOIN reference_table AS r ON m.device_id = r.device_id
|     GROUP BY r.country, m.time
| ryanbooz wrote:
| In its current form/state, ClickHouse is not optimized for
| typical JOIN-type queries, a point we make in the post. You
| would have to re-write your statement to get better
| performance. The other main point is that all data is
| "immutable", so if your reference data needs to be updated, it
| would still need to go through some kind of asynchronous
| transform process to ensure you're getting the correct values
| at query time.
|
| TimescaleDB is PostgreSQL, so it can easily handle this kind of
| join aggregate like you would expect. If "m.measurement" was
| compressed, historical queries with a time predicate would
| likely be faster than in the uncompressed state.
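|
| If you did want to keep the reference data in ClickHouse, the
| common pattern for a small reference/master table is a
| dictionary lookup instead of a JOIN. A rough sketch (dictionary
| name, connection settings, and column types are assumptions,
| and it assumes device_id is a UInt64):
|
|     CREATE DICTIONARY device_dict (
|         device_id UInt64,
|         country   String
|     )
|     PRIMARY KEY device_id
|     SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default'
|                       DB 'default' TABLE 'reference_table'))
|     LAYOUT(HASHED())
|     LIFETIME(300);
|
|     SELECT dictGet('device_dict', 'country', device_id) AS country,
|            time,
|            sum(measurement)
|     FROM measurement_table
|     GROUP BY country, time;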
| PeterZaitsev wrote:
| Altinity folks have suggested a number of ClickHouse
| optimizations for time series benchmarks. Did you enable any of
| those?
| https://altinity.com/blog/clickhouse-continues-to-crush-time...
| ryanbooz wrote:
| Hello @PeterZaitsev!
|
| Actually Altinity is the one that contributed the bits to TSBS
| for benchmarking ClickHouse[1], so we are using the work that
| they contributed (and anyone is welcome to make a PR for
| updates or changes). We also had a former ClickHouse engineer
| look at the setup to verify it matched best practices with how
| CH is currently designed, given the TSBS dataset.
|
| As for the optimizations in the article you pointed to from
| 2019 (specifically how to query "last point" data more
| efficiently in ClickHouse), it uses a different table engine
| (AggregatingMergeTree) and a materialized view to get better
| query response times for this query type.
|
| We (or someone in the community) could certainly add that
| optimization to the benchmark, but it wouldn't be using raw
| data - which we didn't think was appropriate for the benchmark
| analysis. But if one wanted to use that optimization, then one
| should also use Continuous Aggregates for TimescaleDB - i.e.,
| for an apples-to-apples comparison - which I think would also
| lead to similar results to what we show today.
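|
| For anyone unfamiliar, a continuous aggregate in TimescaleDB
| looks roughly like this (a sketch against a TSBS-style "cpu"
| table; the column names here are assumptions, not the exact
| benchmark schema):
|
|     CREATE MATERIALIZED VIEW cpu_hourly
|     WITH (timescaledb.continuous) AS
|     SELECT time_bucket('1 hour', time) AS bucket,
|            hostname,
|            avg(usage_user) AS avg_usage_user
|     FROM cpu
|     GROUP BY bucket, hostname;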
|
| It's actually something we've talked about adding to TSBS for
| TimescaleDB (as an option to turn on/off) and maybe other DBs
| could do the same.
|
| [1]: https://github.com/timescale/tsbs/pull/26
| PeterZaitsev wrote:
| Thank you for your prompt response!
|
| I think the most important thing is that ClickHouse is NOT
| designed for small-batch insertion; if you need to do 1000s
| of inserts/sec you put a queue in front of ClickHouse. And
| query speed can be impacted by batch size a lot. So have you
| looked at query performance with the optimal batch size?
| mfreed wrote:
| Yep! The blog post includes data and graphs from both large
| (5000-15,000 rows / batch) and small (100-500 rows / batch)
| sizes. Please see the section "Insert Performance". Thanks!
|
| https://blog.timescale.com/blog/what-is-clickhouse-how-
| does-...
| PeterZaitsev wrote:
| 1) This is also a small batch size. If you're inserting
| 500,000 rows/sec, 5000 rows is not a particularly large
| batch size.
|
| 2) I see different graphs for ingest but not for queries.
| The data layout will depend on the batch size, unless of
| course you did OPTIMIZE before running the queries.
| ryanbooz wrote:
| 1) you're absolutely right. 5k rows isn't "large". We
| also mentioned that we did hundreds of tests often going
| between 5k and 15k rows/batch. The overall ingest/query
| cycle didn't change dramatically in any of these. That
| is, 5k rows was within a few percentage points of 10k rows.
| Interestingly, the benchmarks that Altinity has published
| only used 10k rows/batch (which we also did, it just didn't
| have any major impact in the grand scheme of things).
|
| 2) We did not specifically call OPTIMIZE before running
| queries. Again, learning from the leaders at Altinity and
| their published benchmarks, I don't see any references
| that they did either, and neither does the TSBS code
| appear to call it after ingest.
|
| Happy to try both of these during our live stream next
| week to demonstrate and learn!
|
| Altinity benchmark (10k rows/batch mention):
| https://altinity.com/blog/clickhouse-for-time-series
| benwilson-512 wrote:
| We've got a few billion rows in TSDB, pretty happy with it so
| far. Our workload fits the OLTP workflow more than OLAP though,
| we're processing / analyzing individual data points from IoT
| devices as they come in, and then providing various
| visualizations. This tends to mean that we're doing lots of
| fetches to relatively small subsets of the data at a time, vs
| trying to compute summaries of large subsets.
|
| Compression is seriously impressive, we see ~90% compression rate
| on our real world datasets. Having that data right next to our
| regular postgres tables and being able to operate on it all
| transactionally definitely simplifies our application logic.
|
| Where I see a lot of folks run into issues with TimescaleDB is
| that it does require that your related data models hold on to
| relevant timestamps. If you want to query a hypertable
| efficiently, you always want to be able to specify the relevant
| time range so that it can ignore irrelevant chunks. This may mean
| that you need to put data_starts_at, data_ends_at columns on
| various other tables in your database to make sure you always
| know where you find your data. This is actually just fine though,
| because it also means you have an easy record of those min / max
| values on hand and don't need to hit the hypertable at all just
| to go "When did I last get data for this device".
| qorrect wrote:
| > Compression is seriously impressive
|
| Does this affect your query performance?
| benwilson-512 wrote:
| In practice we've seen it actually improve performance,
| because when fetching a data range for a device fewer actual
| rows have to be fetched from the disk. You pick certain
| columns (like device ID) that remain uncompressed and indexed
| for rapid querying, and then the actual value columns are
| compressed for a range of time.
| qorrect wrote:
| Very cool thanks for sharing
|
| > This may mean that you need to put data_starts_at,
| data_ends_at columns on various other tables in your
| database to make sure you always know where you find your
| data.
|
| Do you have a link to docs for this ? Does this mean
| literally put a first column named (xstartx) and an end
| column (xendx) as the last column ? How do you then utilize
| it ?
|
| Thanks so much!
| nojito wrote:
| None of these clickhouse queries are optimized.
|
| It is very very hard to beat clickhouse in terms of performance
| if set up properly.
| fnord77 wrote:
| Apache Druid and Apache Pinot are two others to consider for time
| series. We're using druid at scale and it works pretty well.
| Pinot appears to be faster for retrieval but it is less mature.
| zekrioca wrote:
| I know it is not related, but "ClickHouse" ("_Click_stream" and
| "Data ware_House_") doesn't sound like a database name.
| LoriP wrote:
| I think the name originally came from company/project origin
| and how the whole thing started... Recently set up as their own
| entity, ClickHouse came from Yandex. There are probably others
| better able to give that history, but that's the gist of it.
| arunmu wrote:
| What is the difference w.r.t. the comparison of ClickHouse with
| Timescale done by Altinity? ClickHouse performed better there for
| the same test. What gives?
| ryanbooz wrote:
| (Post author)
|
| The two big things, which we discuss at length in the post,
| are:
|
| - Altinity (and others) did not enable compression in
| TimescaleDB (which converts data into columnar storage and
| provides improvement in querying historical data because it can
| retrieve individual columns in compressed format, similar to CH)
|
| - They didn't explore different batch sizes to help understand
| how each database is impacted at various batch sizes.
| PeterZaitsev wrote:
| Have you from your side followed all Clickhouse best
| practices?
|
| The ClickHouse design in particular suggests doing ingest
| requests approximately once per second; if you do much more
| than that, you are using it outside of its intended usage, and
| if you need that you usually put some sort of queue between
| whatever produces the data and ClickHouse.
|
| Note that ingesting in small batches can also significantly
| affect query performance.
| ryanbooz wrote:
| Yep - it's all detailed in the post! The question is how it
| compares to TimescaleDB, which is an OLTP time-series
| database that has a lot of other possible use cases (and
| extensibility). I think it's very fair to explore how
| smaller batches work since others haven't ever actually
| shown that (as far as we can see) so that users that would
| normally be coming from a database like PostgreSQL can
| understand the impact something like small batches would
| have.
|
| As for ingest queueing, TSBS does not queue results. We
| agree, and tell most users that they should queue and batch
| insert in larger numbers. Not every app is designed that
| way and so we wanted to understand what that would look
| like.
|
| But CH did amazingly well regardless of that with batches
| above 1k-2k and lived up to its name as a really fast
| database for ingest!
| akulkarni wrote:
| If you are referring to this post:
| https://altinity.com/blog/clickhouse-for-time-series
|
| That post was written in November 2018 - 3 years ago - when
| TimescaleDB was barely 1.0.
|
| A lot has changed since then:
|
| 1. TimescaleDB launched native columnar compression in 2019,
| which completely changed its story around storage footprint and
| query performance [0]
|
| 2. TimescaleDB has gotten much better
|
| 3. PostgreSQL has also gotten better (which in turn makes
| TimescaleDB better)
|
| In fact, IIRC Altinity used and contributed ClickHouse support
| to the TSBS [1], which is also what this newer benchmark uses
| as well
|
| (Disclaimer: TimescaleDB co-founder)
|
| [0] https://blog.timescale.com/blog/building-columnar-
| compressio...
|
| [1] https://github.com/timescale/tsbs
| arunmu wrote:
| Thank you. My only nit is the way the ratio (CH/TS) is shown.
| What is the purpose of that? It will show a bigger
| percentage for cases in which TS is better, but a lower
| percentage for cases where CH is giving better results. From
| the data representation perspective, I do not think that is
| fair.
| jurajmasar wrote:
| Disclaimer: I'm a co-founder of https://logtail.com, a
| ClickHouse-based hosted log management platform.
|
| PostgreSQL, TimescaleDB, and ClickHouse are all impressive pieces
| of software. We use both PostgreSQL and ClickHouse at Logtail.
|
| ClickHouse shines for true OLAP use-cases and is _very_ hard to
| beat performance-wise when configured properly.
|
| Example:
|
| > Poor inserts and much higher disk usage (e.g., 2.7x higher disk
| usage than TimescaleDB) at small batch sizes (e.g., 100-300
| rows/batch).
|
| If your consistency requirements allow, you could use the Buffer
| Table Engine to get blazing fast inserts:
| https://clickhouse.com/docs/en/engines/table-engines/special...
|
| Horizontal scalability and compression are also unbeatable from
| what I've seen, to name a couple of other strengths.
|
| There's a hefty price tag, however: ClickHouse is quite ops heavy
| and its observability has a seriously steep learning curve. Only
| go for ClickHouse in production if you really know what you're
| doing :)
| MichaelRazum wrote:
| Could you explain what you mean by ops heavy? Just curious. I
| actually have a production system where Timescale as well as
| ClickHouse are running in parallel. So far ClickHouse hasn't
| given any trouble, but it is rarely used right now.
| eternalban wrote:
| >> Disclaimer: I'm a co-founder of https://logtail.com,
| ClickHouse-based hosted log management platform.
| PeterZaitsev wrote:
| Benchmarks which were done a while back did not use compression
| for TimescaleDB, but they also did not use the newer compression
| settings for ClickHouse either.
|
| https://altinity.com/blog/2019/7/new-encodings-to-improve-cl...
|
| In particular, LowCardinality for strings and time-series
| specific codecs may be very valuable.
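|
| A rough sketch of what those look like in a table definition
| (table and column names made up):
|
|     CREATE TABLE metrics (
|         created_at DateTime CODEC(DoubleDelta, LZ4),
|         hostname   LowCardinality(String),
|         usage_user Float64  CODEC(Gorilla, LZ4)
|     )
|     ENGINE = MergeTree
|     ORDER BY (hostname, created_at);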
| mt42or wrote:
| Nobody talking about VictoriaMetrics?
___________________________________________________________________
(page generated 2021-10-21 23:00 UTC)