[HN Gopher] TimescaleDB 2.7 vs. PostgreSQL 14
___________________________________________________________________
TimescaleDB 2.7 vs. PostgreSQL 14
Author : carlotasoto
Score : 99 points
Date : 2022-09-22 15:54 UTC (7 hours ago)
(HTM) web link (www.timescale.com)
(TXT) w3m dump (www.timescale.com)
| boomskats wrote:
| Whenever I see these posts from TimescaleDB, I always want to ask
| them how it compares in performance to alternative extensions
| that implement the same features, rather than just comparing
| TimescaleDB to vanilla PostgreSQL.
|
| For example, they mention their automated data retention and how
| it's achieved with one SQL command, and how DELETEing records is
| a very costly operation, and how "even if you were using Postgres
| declarative partitioning you'd still need to automate the process
| yourself, wasting precious developer time, adding additional
| requirements, and implementing bespoke code that needs to be
| supported moving forward".
|
| There's zero mention anywhere of pg_partman, which does all of
| these things for you equally as simply, and is a fully OSS free
| alternative [0].
|
| I get that it's a PG extension that competes with their product.
| I know that TimescaleDB does a few other things that pg_partman
| does not. But I can't help but find its (seemingly) purposeful
| omission in these, otherwise very thorough blog posts,
| misleading.
|
| [0]
| https://github.com/pgpartman/pg_partman/blob/master/doc/pg_p...
| bbirk wrote:
| Sometimes when benchmarks are posted on HN that compares your
| own product (A) with a competitor's product (B), someone
| working for the competitor refutes the claims made in the blog
| and states that B were not given the same requirements as A.
| This particular thread comes to mind [1]. Maybe this is due to
| the blog writers misunderstanding B or maybe the competitor
| misunderstands A, or maybe there is a better way to do what was
| benchmarked using B. I think a better way would be to use some
| standard benchmark, and publish the code used for benchmarking
| them.
|
| [1] https://news.ycombinator.com/item?id=31767858
| djk447 wrote:
| Disclosure, I work at Timescale.
|
| Though I didn't write this post, I'd imagine at least part of
| it is that it's already nearly 4000 words and a 15 minute read
| and we just didn't want to add another set of things to it, to
| be perfectly honest.
|
| `pg_partman` is cool! I haven't used it in a while, and because
| it uses declarative partitioning, it has some locking issues
| that we address with our partitioning scheme, but implying that
| it is OSS and we're not in terms of things like data retention
| features is a bit misleading as well. The `drop_chunks` command
| used for data retention is in the Apache 2 licensed portion of
| Timescale.
| ensignavenger wrote:
| But almost all of your posts and benchmarks are based on the
| closed source version of Timescale. Everywhere I have seen it
| is always recommend to use the closed source version to get
| decent performance out of it.
| akulkarni wrote:
| (Timescale co-founder)
|
| Just to clarify: Nothing on Timescale is closed-source. It
| is all source available, all on Github. Some of it is
| Apache2 licensed, some of it is Timescale Licensed. And it
| is all free.
| ensignavenger wrote:
| Since the Timescale License is not an open source
| license, it is a closed source license. You are right, it
| is source available, but source available is also closed
| source. It is closed because it is not open. And it might
| be free as in beer, but it is not free as in freedom.
| doctor_eval wrote:
| My recollection is that the TS license simply has
| protection against using the TS code to compete with TS,
| ala Amazon RDS.
|
| While some people on HN feel that this is an impurity
| they can't live with, I personally think it's a small
| price to pay to enable development of TS to continue. In
| my opinion, claiming that it's closed source is somewhat
| dogmatic. Many open source licenses have some kind of
| restrictions on use; the GPL comes to mind.
| skeletal88 wrote:
| The source code is available, when someone says something
| is closed source then it usually means that the source
| code is not publicly available.
|
| Do you want amazing things? Everything can't be "free as
| in beer" wtf does that even mean, i don't get free beer
| from anywhere.
| ensignavenger wrote:
| It is a phrase that Richard Stallman created, and well
| known in free software communities-
| https://www.wired.com/2006/09/free-as-in-beer/,
| https://www.gnu.org/philosophy/free-sw.html
| teraflop wrote:
| "Open source" and "closed source" are not the only
| options. There are plenty of products out there where
| you're technically allowed to look at the source code,
| but very restricted in how you can legally use it. The
| "open" in "open source" is generally understood to mean
| that users have _permission_ to use, modify and
| redistribute the software. (Without that permission,
| calling it "freeware", "shared source" or "source
| available" would be more accurate.)
|
| That's what "free as in beer" means -- it's a well-
| established phrase meaning "zero monetary cost":
| https://en.wikipedia.org/wiki/Gratis_versus_libre
|
| In the case of the non-Apache-licensed version of
| TimescaleDB, you're allowed to use the software without
| payment, and you can distribute unmodified copies. But
| you're essentially forbidden from letting users define
| their own schemas, or from modifying it or reusing
| components unless your modified version imposes that same
| restriction. (The exception is if you agree to transfer
| ownership of your changes back to Timescale.)
|
| Nobody's saying that Timescale can't build a non-open-
| source database, only that they should be clear about
| which parts are actually open. In my opinion, describing
| it on the homepage as an "open-source relational
| database" and then promoting it by benchmarking the
| proprietary version is at least a little bit misleading.
| fzliu wrote:
| Seconded. I understand that Timescale extends Postgres, but I'd
| still like to see performance comparisons relative to other
| time-series databases such as InfluxDB or even TiDB.
|
| Shameless self-plug: we're also building database technology,
| but for embedding vectors (https://milvus.io) rather than time-
| series data - when doing query performance comparisons, we get
| several orders of magnitude of performance improvement over
| traditional databases. It's an unfair comparison, so we
| generally avoid it.
| ebiester wrote:
| I think it's fair to mention it, but with the specific caveat
| of "If your workload has these characteristics, we are a
| particularly good fit."
| akulkarni wrote:
| [Timescale co-founder]
|
| We have published many, many benchmarks versus other database
| systems. IIRC all of them also made the front page of
| HackerNews.
|
| Here are some of them, for your reading pleasure :-)
|
| TimescaleDB vs. InfluxDB:
| https://www.timescale.com/blog/timescaledb-vs-influxdb-
| for-t...
|
| TimescaleDB vs. ClickHouse:
| https://www.timescale.com/blog/what-is-clickhouse-how-
| does-i...
|
| TimescaleDB vs. Timestream:
| https://www.timescale.com/blog/timescaledb-vs-amazon-
| timestr...
| djbusby wrote:
| Cool. Now do VS PG+extensions like asked about up-thread.
| akulkarni wrote:
| (Timescale co-founder)
|
| That's a fair question.
|
| We find that most developers storing time-series data on
| Postgres are doing so without pg_partman. So we first wanted to
| provide a benchmark that would be useful to most developers.
|
| This benchmark was also the result of months of dedicated work.
| So the team did spend a lot of time on this. Unfortunately,
| they ran out of time to cover pg_partman. But that comparison
| is on our list of things to do soon.
|
| Thanks!
| avar wrote:
| Most people storing time-series are also doing that outside
| of TimescaleDB.
|
| So the relevant question isn't what a typical PostgreSQL user
| is doing, but whether someone wanting to optimize their
| storage should look at a PostgreSQL extension, or an upstart
| database like TimescaleDB.
| djk447 wrote:
| NB: I work at Timescale.
|
| TimescaleDB is a PostgreSQL extension, just to be clear.
| brightball wrote:
| Same. I've been experimenting with the recently fully open
| sourced Citus 11 and it's pretty incredible. To the point that
| I have a hard time imagining NOT using it going forward.
|
| Maybe I need to do my own comparison at some point.
| didip wrote:
| What about TimescaleDB vs a lot of other OLAP databases like
| ClickHouse, Druid, Pinot, Databend, etc.?
|
| Or hybrid databases like StarRocks or TiDB?
| lgl wrote:
| Not sure about the other ones, but they do have a comparison
| against ClickHouse from October 2021 that's available at:
|
| https://www.timescale.com/blog/what-is-clickhouse-how-does-i...
| doliveira wrote:
| As a sidenote, Databend's elevator pitch sounds quite good:
| from what I've seen there's quite a lot of complexity in
| managing co-located storage and compute like ClickHouse, Druid
| and Pinot. Leveraging object storage instead seems like a good
| alternative.
|
| Do you have some experience with it?
| didip wrote:
| Unfortunately not yet, but I am actively investigating.
| AdamProut wrote:
| This is a reasonable benchmark from the Clickhouse folks for
| single table anlaytical query performance over smallish data
| sets (10s of GB of data). Most of the DW vendors are on there.
|
| https://benchmark.clickhouse.com/
|
| Timescale apparently lags pretty far behind modern columnstore
| engines.
| akulkarni wrote:
| (Timescale co-founder)
|
| As with anything, it depends on what you want to do.
|
| If you have an OLAP heavy workload with long scans, etc
| (which is the type of queries prominent on the ClickHouse
| page - e.g., Q0 is "SELECT COUNT(*) FROM hits;"), then I
| would highly recommend systems other than Timescale.
| (Although we are also working on this ;-) )
|
| But if you have time-series workload, or even, if you love
| Postgres and are building a time-series and/or analytical
| application, then I would recommend TimescaleDB.
|
| ClickHouse is great. I just believe in using the right tool
| for the right job. :-) There are many areas where column
| store engines beat TimescaleDB. But nothing comes for free -
| everything has a tradeoff.
| didip wrote:
| Genuinely curious, aren't a lot of time-series workloads
| OLAP oriented?
| srcreigh wrote:
| > For example, one recent query planner improvement excludes data
| more efficiently for relative now()-based queries (e.g., WHERE
| time >= now()-'1 week'::interval).
|
| This could be done by just calculating the start date in code
| too.
|
| > When hypertables are compressed the amount of data that queries
| need to read is reduced, leading to dramatic increases in
| performance of 1000x or more.
|
| At my workplace we recently experimented with storing time series
| data in an array in a postgres row. This gets compressed as a
| TOAST array, can store thousands of ints in just a few DB pages
| (aka loading it is about the same as an index scan). We also use
| Timescale for a different service mind you. I'm sure this format
| is more efficient than the Timescale format too. In Timescale you
| would need rows containing (for example) (date, user_id,
| time_on_site), one row per day. The postgres array format
| (start_date, user_id, time_on_site_by_date) indexing where
| 0=start_date, 1=start_date+1 is like 1/3rd the size uncompressed.
| And yea, even if something is compressed, you still gotta put the
| uncompressed version in memory somewhere.
| djk447 wrote:
| Disclosure, I work at Timescale.
|
| We are kindred spirits I think! I did this too [0] a while back
| at a previous company and it actually served as part of the
| inspiration for our compression work! It's fun, but a bit
| difficult to query at times. Our compressed columns do also get
| TOASTed and stored out of line.
|
| I'm not sure that it's going to be much more efficient than the
| Timescale format once it's compressed, we have some pretty good
| compression algos, but I might be missing something about your
| case, we generally can achieve close to 10x compression, but
| right now you can't write directly compressed data, so you
| would save on the write side I suppose.
|
| It is true that you need to put the uncompressed version into
| memory at some point, but we do try to limit that and in many
| cases you end up IO limited moreso than memory limited. We're
| also thinking about doing some work to push processing down
| towards the compressed data, but that's still in the "glint in
| our eye" stage, but I think it has a lot of promise.
|
| (As a side note, TOAST is still the best acronym around ;) ).
|
| [0]: https://www.youtube.com/watch?v=sPoz1OPuRUU
| miohtama wrote:
| If I am running already compressed filesystem like ZFS with
| zstd, can I disable TOAST and compressed columns altogether
| somehow?
|
| https://dba.stackexchange.com/questions/315063/disable-
| toast...
| chrisco255 wrote:
| Should be noted that article is titled Timescale + Postgres vs.
| Postgres alone. Timescale is built on Postgres so it's not really
| a competition per se.
| akulkarni wrote:
| (Timescale co-founder)
|
| Yes, 100%. We deliberately choose the "+" symbol instead of
| "vs." for this blog title. We love PostgreSQL. :-)
| g8oz wrote:
| Right, the "vs" gives a much different and frankly misleading
| impression of the article than the "+".
| Helmut10001 wrote:
| Yes, I expected something like InfluxDB 2.0 vs. Postgres.
| ithrow wrote:
| Is Timescale suited for keeping the history of the data and then
| query against any point in time?
| jxi wrote:
| krn wrote:
| If integration with PostgreSQL is not a requirement, I would
| suggest looking at QuestDB[1] for the best performing open-source
| timeseries database currently on the market.
|
| [1] https://questdb.io/blog/2021/07/05/comparing-questdb-
| timesca...
| tda wrote:
| Are you sure it is the best performing? The article only
| compares to influx and timescale, which are very far from the
| fastest timeseries databases out there. How does it compare to
| clickhouse? I found this
| https://questdb.io/blog/2022/05/26/query-benchmark-
| questdb-v..., but the numbers are so far off (and clickhouse is
| much slower than timescale in their examples) that I kind of
| struggle to believe them.
| krn wrote:
| Clickhouse is a data warehouse, not a timeseries database.
| Timeseries databases are designed with different bottlenecks
| in the mind:
|
| https://questdb.io/blog/2020/11/26/why-timeseries-data/
| alecco wrote:
| How is this on HN frontpage?
| PeterZaitsev wrote:
| I wonder in those benchmarks are we speaking about TimescaleDB
| Apache Edition or non OSS TimescaleDB Community edition ?
|
| Looks like these are quite different in features
| https://docs.timescale.com/timescaledb/latest/timescaledb-ed...
| nextaccountic wrote:
| Hey, about the column-wide data compression thing. If I have a
| column with strings that are highly similar to each other (maybe
| they are urls, maybe they are HTML documents that were scraped
| from the same site, but they are expected to contain
| redundancies)
|
| What's the best approach to get excellent compression? Can I
| exploit this redundancy within the column somehow? I don't even
| know the right search terms for this.
|
| TimescaleDB compression will work only for integer columns,
| right? And in any case, this is not a time series.
| jxi wrote:
| znpy wrote:
| No mention of ha in the documentation.
|
| I can't understand whether HA would rely on the standard
| postgresql tooling or if you have to pay for some kind of
| enterprise license to get it.
| zokier wrote:
| https://docs.timescale.com/timescaledb/latest/how-to-guides/...
| maierru wrote:
| tldr "readonly" database :(
|
| Update/delete of current chunk data blocked during compression of
| old data https://github.com/timescale/timescaledb/issues/4432
___________________________________________________________________
(page generated 2022-09-22 23:02 UTC)