[HN Gopher] TimescaleDB 2.7 vs. PostgreSQL 14
       ___________________________________________________________________
        
       TimescaleDB 2.7 vs. PostgreSQL 14
        
       Author : carlotasoto
       Score  : 99 points
       Date   : 2022-09-22 15:54 UTC (7 hours ago)
        
 (HTM) web link (www.timescale.com)
 (TXT) w3m dump (www.timescale.com)
        
       | boomskats wrote:
       | Whenever I see these posts from TimescaleDB, I always want to ask
       | them how it compares in performance to alternative extensions
       | that implement the same features, rather than just comparing
       | TimescaleDB to vanilla PostgreSQL.
       | 
       | For example, they mention their automated data retention and how
       | it's achieved with one SQL command, and how DELETEing records is
       | a very costly operation, and how "even if you were using Postgres
       | declarative partitioning you'd still need to automate the process
       | yourself, wasting precious developer time, adding additional
       | requirements, and implementing bespoke code that needs to be
       | supported moving forward".
       | 
       | There's zero mention anywhere of pg_partman, which does all of
       | these things for you equally as simply, and is a fully OSS free
       | alternative [0].
       | 
       | I get that it's a PG extension that competes with their product.
       | I know that TimescaleDB does a few other things that pg_partman
       | does not. But I can't help but find its (seemingly) purposeful
       | omission in these, otherwise very thorough blog posts,
       | misleading.
       | 
       | [0]
       | https://github.com/pgpartman/pg_partman/blob/master/doc/pg_p...
        
         | bbirk wrote:
         | Sometimes when benchmarks are posted on HN that compares your
         | own product (A) with a competitor's product (B), someone
         | working for the competitor refutes the claims made in the blog
         | and states that B were not given the same requirements as A.
         | This particular thread comes to mind [1]. Maybe this is due to
         | the blog writers misunderstanding B or maybe the competitor
         | misunderstands A, or maybe there is a better way to do what was
         | benchmarked using B. I think a better way would be to use some
         | standard benchmark, and publish the code used for benchmarking
         | them.
         | 
         | [1] https://news.ycombinator.com/item?id=31767858
        
         | djk447 wrote:
         | Disclosure, I work at Timescale.
         | 
         | Though I didn't write this post, I'd imagine at least part of
         | it is that it's already nearly 4000 words and a 15 minute read
         | and we just didn't want to add another set of things to it, to
         | be perfectly honest.
         | 
         | `pg_partman` is cool! I haven't used it in a while, and because
         | it uses declarative partitioning, it has some locking issues
         | that we address with our partitioning scheme, but implying that
         | it is OSS and we're not in terms of things like data retention
         | features is a bit misleading as well. The `drop_chunks` command
         | used for data retention is in the Apache 2 licensed portion of
         | Timescale.
        
           | ensignavenger wrote:
           | But almost all of your posts and benchmarks are based on the
           | closed source version of Timescale. Everywhere I have seen it
           | is always recommend to use the closed source version to get
           | decent performance out of it.
        
             | akulkarni wrote:
             | (Timescale co-founder)
             | 
             | Just to clarify: Nothing on Timescale is closed-source. It
             | is all source available, all on Github. Some of it is
             | Apache2 licensed, some of it is Timescale Licensed. And it
             | is all free.
        
               | ensignavenger wrote:
               | Since the Timescale License is not an open source
               | license, it is a closed source license. You are right, it
               | is source available, but source available is also closed
               | source. It is closed because it is not open. And it might
               | be free as in beer, but it is not free as in freedom.
        
               | doctor_eval wrote:
               | My recollection is that the TS license simply has
               | protection against using the TS code to compete with TS,
               | ala Amazon RDS.
               | 
               | While some people on HN feel that this is an impurity
               | they can't live with, I personally think it's a small
               | price to pay to enable development of TS to continue. In
               | my opinion, claiming that it's closed source is somewhat
               | dogmatic. Many open source licenses have some kind of
               | restrictions on use; the GPL comes to mind.
        
               | skeletal88 wrote:
               | The source code is available, when someone says something
               | is closed source then it usually means that the source
               | code is not publicly available.
               | 
               | Do you want amazing things? Everything can't be "free as
               | in beer" wtf does that even mean, i don't get free beer
               | from anywhere.
        
               | ensignavenger wrote:
               | It is a phrase that Richard Stallman created, and well
               | known in free software communities-
               | https://www.wired.com/2006/09/free-as-in-beer/,
               | https://www.gnu.org/philosophy/free-sw.html
        
               | teraflop wrote:
               | "Open source" and "closed source" are not the only
               | options. There are plenty of products out there where
               | you're technically allowed to look at the source code,
               | but very restricted in how you can legally use it. The
               | "open" in "open source" is generally understood to mean
               | that users have _permission_ to use, modify and
               | redistribute the software. (Without that permission,
               | calling it  "freeware", "shared source" or "source
               | available" would be more accurate.)
               | 
               | That's what "free as in beer" means -- it's a well-
               | established phrase meaning "zero monetary cost":
               | https://en.wikipedia.org/wiki/Gratis_versus_libre
               | 
               | In the case of the non-Apache-licensed version of
               | TimescaleDB, you're allowed to use the software without
               | payment, and you can distribute unmodified copies. But
               | you're essentially forbidden from letting users define
               | their own schemas, or from modifying it or reusing
               | components unless your modified version imposes that same
               | restriction. (The exception is if you agree to transfer
               | ownership of your changes back to Timescale.)
               | 
               | Nobody's saying that Timescale can't build a non-open-
               | source database, only that they should be clear about
               | which parts are actually open. In my opinion, describing
               | it on the homepage as an "open-source relational
               | database" and then promoting it by benchmarking the
               | proprietary version is at least a little bit misleading.
        
         | fzliu wrote:
         | Seconded. I understand that Timescale extends Postgres, but I'd
         | still like to see performance comparisons relative to other
         | time-series databases such as InfluxDB or even TiDB.
         | 
         | Shameless self-plug: we're also building database technology,
         | but for embedding vectors (https://milvus.io) rather than time-
         | series data - when doing query performance comparisons, we get
         | several orders of magnitude of performance improvement over
         | traditional databases. It's an unfair comparison, so we
         | generally avoid it.
        
           | ebiester wrote:
           | I think it's fair to mention it, but with the specific caveat
           | of "If your workload has these characteristics, we are a
           | particularly good fit."
        
           | akulkarni wrote:
           | [Timescale co-founder]
           | 
           | We have published many, many benchmarks versus other database
           | systems. IIRC all of them also made the front page of
           | HackerNews.
           | 
           | Here are some of them, for your reading pleasure :-)
           | 
           | TimescaleDB vs. InfluxDB:
           | https://www.timescale.com/blog/timescaledb-vs-influxdb-
           | for-t...
           | 
           | TimescaleDB vs. ClickHouse:
           | https://www.timescale.com/blog/what-is-clickhouse-how-
           | does-i...
           | 
           | TimescaleDB vs. Timestream:
           | https://www.timescale.com/blog/timescaledb-vs-amazon-
           | timestr...
        
             | djbusby wrote:
             | Cool. Now do VS PG+extensions like asked about up-thread.
        
         | akulkarni wrote:
         | (Timescale co-founder)
         | 
         | That's a fair question.
         | 
         | We find that most developers storing time-series data on
         | Postgres are doing so without pg_partman. So we first wanted to
         | provide a benchmark that would be useful to most developers.
         | 
         | This benchmark was also the result of months of dedicated work.
         | So the team did spend a lot of time on this. Unfortunately,
         | they ran out of time to cover pg_partman. But that comparison
         | is on our list of things to do soon.
         | 
         | Thanks!
        
           | avar wrote:
           | Most people storing time-series are also doing that outside
           | of TimescaleDB.
           | 
           | So the relevant question isn't what a typical PostgreSQL user
           | is doing, but whether someone wanting to optimize their
           | storage should look at a PostgreSQL extension, or an upstart
           | database like TimescaleDB.
        
             | djk447 wrote:
             | NB: I work at Timescale.
             | 
             | TimescaleDB is a PostgreSQL extension, just to be clear.
        
         | brightball wrote:
         | Same. I've been experimenting with the recently fully open
         | sourced Citus 11 and it's pretty incredible. To the point that
         | I have a hard time imagining NOT using it going forward.
         | 
         | Maybe I need to do my own comparison at some point.
        
       | didip wrote:
       | What about TimescaleDB vs a lot of other OLAP databases like
       | ClickHouse, Druid, Pinot, Databend, etc.?
       | 
       | Or hybrid databases like StarRocks or TiDB?
        
         | lgl wrote:
         | Not sure about the other ones, but they do have a comparison
         | against ClickHouse from October 2021 that's available at:
         | 
         | https://www.timescale.com/blog/what-is-clickhouse-how-does-i...
        
         | doliveira wrote:
         | As a sidenote, Databend's elevator pitch sounds quite good:
         | from what I've seen there's quite a lot of complexity in
         | managing co-located storage and compute like ClickHouse, Druid
         | and Pinot. Leveraging object storage instead seems like a good
         | alternative.
         | 
         | Do you have some experience with it?
        
           | didip wrote:
           | Unfortunately not yet, but I am actively investigating.
        
         | AdamProut wrote:
         | This is a reasonable benchmark from the Clickhouse folks for
         | single table anlaytical query performance over smallish data
         | sets (10s of GB of data). Most of the DW vendors are on there.
         | 
         | https://benchmark.clickhouse.com/
         | 
         | Timescale apparently lags pretty far behind modern columnstore
         | engines.
        
           | akulkarni wrote:
           | (Timescale co-founder)
           | 
           | As with anything, it depends on what you want to do.
           | 
           | If you have an OLAP heavy workload with long scans, etc
           | (which is the type of queries prominent on the ClickHouse
           | page - e.g., Q0 is "SELECT COUNT(*) FROM hits;"), then I
           | would highly recommend systems other than Timescale.
           | (Although we are also working on this ;-) )
           | 
           | But if you have time-series workload, or even, if you love
           | Postgres and are building a time-series and/or analytical
           | application, then I would recommend TimescaleDB.
           | 
           | ClickHouse is great. I just believe in using the right tool
           | for the right job. :-) There are many areas where column
           | store engines beat TimescaleDB. But nothing comes for free -
           | everything has a tradeoff.
        
             | didip wrote:
             | Genuinely curious, aren't a lot of time-series workloads
             | OLAP oriented?
        
       | srcreigh wrote:
       | > For example, one recent query planner improvement excludes data
       | more efficiently for relative now()-based queries (e.g., WHERE
       | time >= now()-'1 week'::interval).
       | 
       | This could be done by just calculating the start date in code
       | too.
       | 
       | > When hypertables are compressed the amount of data that queries
       | need to read is reduced, leading to dramatic increases in
       | performance of 1000x or more.
       | 
       | At my workplace we recently experimented with storing time series
       | data in an array in a postgres row. This gets compressed as a
       | TOAST array, can store thousands of ints in just a few DB pages
       | (aka loading it is about the same as an index scan). We also use
       | Timescale for a different service mind you. I'm sure this format
       | is more efficient than the Timescale format too. In Timescale you
       | would need rows containing (for example) (date, user_id,
       | time_on_site), one row per day. The postgres array format
       | (start_date, user_id, time_on_site_by_date) indexing where
       | 0=start_date, 1=start_date+1 is like 1/3rd the size uncompressed.
       | And yea, even if something is compressed, you still gotta put the
       | uncompressed version in memory somewhere.
        
         | djk447 wrote:
         | Disclosure, I work at Timescale.
         | 
         | We are kindred spirits I think! I did this too [0] a while back
         | at a previous company and it actually served as part of the
         | inspiration for our compression work! It's fun, but a bit
         | difficult to query at times. Our compressed columns do also get
         | TOASTed and stored out of line.
         | 
         | I'm not sure that it's going to be much more efficient than the
         | Timescale format once it's compressed, we have some pretty good
         | compression algos, but I might be missing something about your
         | case, we generally can achieve close to 10x compression, but
         | right now you can't write directly compressed data, so you
         | would save on the write side I suppose.
         | 
         | It is true that you need to put the uncompressed version into
         | memory at some point, but we do try to limit that and in many
         | cases you end up IO limited moreso than memory limited. We're
         | also thinking about doing some work to push processing down
         | towards the compressed data, but that's still in the "glint in
         | our eye" stage, but I think it has a lot of promise.
         | 
         | (As a side note, TOAST is still the best acronym around ;) ).
         | 
         | [0]: https://www.youtube.com/watch?v=sPoz1OPuRUU
        
           | miohtama wrote:
           | If I am running already compressed filesystem like ZFS with
           | zstd, can I disable TOAST and compressed columns altogether
           | somehow?
           | 
           | https://dba.stackexchange.com/questions/315063/disable-
           | toast...
        
       | chrisco255 wrote:
       | Should be noted that article is titled Timescale + Postgres vs.
       | Postgres alone. Timescale is built on Postgres so it's not really
       | a competition per se.
        
         | akulkarni wrote:
         | (Timescale co-founder)
         | 
         | Yes, 100%. We deliberately choose the "+" symbol instead of
         | "vs." for this blog title. We love PostgreSQL. :-)
        
         | g8oz wrote:
         | Right, the "vs" gives a much different and frankly misleading
         | impression of the article than the "+".
        
         | Helmut10001 wrote:
         | Yes, I expected something like InfluxDB 2.0 vs. Postgres.
        
       | ithrow wrote:
       | Is Timescale suited for keeping the history of the data and then
       | query against any point in time?
        
         | jxi wrote:
        
       | krn wrote:
       | If integration with PostgreSQL is not a requirement, I would
       | suggest looking at QuestDB[1] for the best performing open-source
       | timeseries database currently on the market.
       | 
       | [1] https://questdb.io/blog/2021/07/05/comparing-questdb-
       | timesca...
        
         | tda wrote:
         | Are you sure it is the best performing? The article only
         | compares to influx and timescale, which are very far from the
         | fastest timeseries databases out there. How does it compare to
         | clickhouse? I found this
         | https://questdb.io/blog/2022/05/26/query-benchmark-
         | questdb-v..., but the numbers are so far off (and clickhouse is
         | much slower than timescale in their examples) that I kind of
         | struggle to believe them.
        
           | krn wrote:
           | Clickhouse is a data warehouse, not a timeseries database.
           | Timeseries databases are designed with different bottlenecks
           | in the mind:
           | 
           | https://questdb.io/blog/2020/11/26/why-timeseries-data/
        
       | alecco wrote:
       | How is this on HN frontpage?
        
       | PeterZaitsev wrote:
       | I wonder in those benchmarks are we speaking about TimescaleDB
       | Apache Edition or non OSS TimescaleDB Community edition ?
       | 
       | Looks like these are quite different in features
       | https://docs.timescale.com/timescaledb/latest/timescaledb-ed...
        
       | nextaccountic wrote:
       | Hey, about the column-wide data compression thing. If I have a
       | column with strings that are highly similar to each other (maybe
       | they are urls, maybe they are HTML documents that were scraped
       | from the same site, but they are expected to contain
       | redundancies)
       | 
       | What's the best approach to get excellent compression? Can I
       | exploit this redundancy within the column somehow? I don't even
       | know the right search terms for this.
       | 
       | TimescaleDB compression will work only for integer columns,
       | right? And in any case, this is not a time series.
        
       | jxi wrote:
        
       | znpy wrote:
       | No mention of ha in the documentation.
       | 
       | I can't understand whether HA would rely on the standard
       | postgresql tooling or if you have to pay for some kind of
       | enterprise license to get it.
        
         | zokier wrote:
         | https://docs.timescale.com/timescaledb/latest/how-to-guides/...
        
       | maierru wrote:
       | tldr "readonly" database :(
       | 
       | Update/delete of current chunk data blocked during compression of
       | old data https://github.com/timescale/timescaledb/issues/4432
        
       ___________________________________________________________________
       (page generated 2022-09-22 23:02 UTC)