[HN Gopher] pg_timeseries: Open-source time-series extension for...
       ___________________________________________________________________
        
       pg_timeseries: Open-source time-series extension for PostgreSQL
        
       Author : samaysharma
       Score  : 270 points
       Date   : 2024-05-20 16:44 UTC (19 hours ago)
        
 (HTM) web link (tembo.io)
 (TXT) w3m dump (tembo.io)
        
       | vantiro wrote:
       | PostgreSQL licensed, good move!
        
       | gxyt6gfy5t wrote:
       | How's it different than timescaledb?
        
         | logrot wrote:
         | > You may already be asking: "why not just power the stack
         | using TimescaleDB?" The Timescale License would restrict our
         | use of features such as compression, incremental materialized
         | views, and bottomless storage. With these missing, we felt that
         | what remained would not provide an adequate basis for our
         | customers' time-series needs. Therefore, we decided to build
         | our own PostgreSQL-licensed extension.
        
           | samaysharma wrote:
           | It's much newer for one, so it's behind on features. But,
           | we're working on adding new ones based on customer demand.
           | 
           | We want to build it with a PostgreSQL license using existing
           | community extensions as much as possible and build custom
           | stuff (still permissively licensed) only when necessary.
        
         | vantiro wrote:
         | Timescaledb's license is more like Redis' new license?
        
         | hosh wrote:
         | The use of postgresql licensing might mean we can see this
         | available for AWS RDS and other managed PostgreSQL providers.
        
       | riedel wrote:
       | >You may already be asking: "why not just power the stack using
       | TimescaleDB?" The Timescale License would restrict our use of
       | features such as compression, incremental materialized views, and
       | bottomless storage. With these missing, we felt that what
       | remained would not provide an adequate basis for our customers'
       | time-series needs. Therefore, we decided to build our own
       | PostgreSQL-licensed extension.
       | 
       | Have been using the free version timescaledb before to shard a
       | 500 Million observation time series database. Worked drop-in
       | without much hassle. Would have expected some benchmarks and
       | comparisons in the post. I will for sure watch this...
        
         | osigurdson wrote:
         | 500 million is very little however. A regular table with a
         | covering index would probably be fine for many use cases with
         | this number of points.
        
           | vegabook wrote:
           | indeed. Financial timeseries I was working with over 100
           | million new points, _per day_. For anything serious
           | TimescaleDB is essentially not open source. Well done
           | tembo.io crew -- will definitely give this a whirl.
        
             | eyegor wrote:
             | What do you mean by "for anything serious it isn't open
             | source"? I didn't see any red flags in the apache variant
             | of timescale, just constant pleading to try their hosted
             | option.
             | 
             | https://github.com/timescale/timescaledb/blob/main/LICENSE-
             | A...
        
               | pmeira wrote:
               | Compression and other features use the non-Apache
               | license:
               | 
               | https://github.com/timescale/timescaledb/tree/main/tsl
        
               | NineStarPoint wrote:
               | And as I understand that license, you are allowed to use
               | Timescale for anything that doesn't involve offering
               | Timescale itself as a service. If you were using
               | Timescale to process lots of time series transactions in
               | your backend, it doesn't seem to me like that would break
               | the license.
               | 
               | (Which is to say that if, like Tembo, you're offering
               | Postgres as a service you do indeed have a problem. But
               | for other use, should be fine)
        
               | dig1 wrote:
               | The tricky thing with these licenses (BSL, SSPL, etc.) is
               | that you can use them freely for internal stuff, but
               | suddenly, if you make your product public (assuming it
               | uses, e.g., TimescaleDB), things can get muddy. Everyone
               | wants the flexibility to either open-source or
               | commercialize a successful internal product in the
               | future.
               | 
               | The problem is that, even if your app is not a mere
               | frontend for TimescaleDB/Mongo/Redis, you can get sued,
               | and you'll have to spend unnecessary time and money
               | proving things in court. No one wants this, especially a
               | startup owner whose money and time are tight. Also, even
               | if your startup/company uses some of these techs,
               | potential company buyers will be very wary of the
               | purchase if they know they'll have to deal with this
               | later.
        
               | miohtama wrote:
               | I would assume TimescaleDb only sues if you money. In
               | this case you can also afford a commercial license. If
               | you hit big just contact them and tell there was a
               | problem having a correct license earlier and you want to
               | fix the situation.
               | 
               | There is 0% chance Timescale would sue mom'n'pop
               | operation for breaking their license.
        
               | wasmitnetzen wrote:
               | The license doesn't allow you to "give access to,
               | directly or indirectly (e.g., via a wrapper) to [SQL]".
               | 
               | Legally, what's a wrapper? Is a REST API a wrapper?
        
               | belk wrote:
               | I imagine legally would need a lawsuit to set a
               | precedence, and if a license owner sets an over-reaching
               | precedence of what a wrapper is, they risk losing
               | customer trust and companies avoiding them like the
               | plague.
               | 
               | e.g. timescaledb going after a tsdb as a service company
               | offering tsdb behind a graphql wrapper vs timescaledb
               | going after a financial company offering timeseries data
               | collection and viewing.
               | 
               | I think a good border test would be, would timescaledb
               | allow you to offer a metrics and logging service?
               | technically you're offering timeseries database
               | functionality, but it's in a constrained domain, and very
               | clearly a different product, but still effectively
               | CRUDing timeseries data.
        
             | miohtama wrote:
             | If you have 100 million points per day it's likely you
             | afford to pay any commercial license.
        
           | vjerancrnjak wrote:
           | I think you're not talking about the same thing. There's an
           | expression related to time series data --- "high churn" and
           | another "active time series".
           | 
           | 500 million active time series is extremely huge.
           | 
           | It does not have anything to do with number of data points.
           | 
           | Good time series databases can scale to 1M-10M writes per
           | second without a hiccup.
        
             | osigurdson wrote:
             | I suppose it means by what is meant by an "observation". Is
             | that an entire time series for a single property or a
             | single point? Nevertheless, the number of points absolutely
             | matters.
             | 
             | A regular Postgres database can give you 50-100K inserts
             | per second and can scale to at least 1B rows with 100K+
             | individual series without much difficultly. If you know you
             | will need less (or much less) than this, my suggestion is
             | to use a regular table with a covering index. If you need
             | more, use ClickHouse.
        
         | rapsey wrote:
         | Databases are a tough business. You're just waiting for open
         | source to eat your lunch.
        
       | mathfailure wrote:
       | Thank you for posting it: I followed the links and found out
       | about trunk and https://pgt.dev/
        
       | ramoneguru wrote:
       | How does this stack up against something like what QuestDB
       | offers?
        
       | plainOldText wrote:
       | Your site is very well designed and easy to read btw, and the app
       | UI looks great from the demo photos. I might try it!
        
         | samaysharma wrote:
         | Thank you!
        
       | PeterZaitsev wrote:
       | Great to see this kind of innovation. PostgreSQL is interesting
       | while "core" was always Open Source and using very permissive
       | Open Source library, there have been many proprietary and source
       | available extensions, ranging from replication to time series
       | support.
       | 
       | Now we see those Proprietary extensions being disrupted by proper
       | Open Source!
        
       | nitinreddy88 wrote:
       | Most of the time-series queries (almost all of them) are
       | aggregated queries. Why not leverage or build top-notch
       | Columnarstore for the same.
       | 
       | Everything seems to be there and why there's not first class
       | product like ClickHouse on PG.
        
         | paulryanrogers wrote:
         | Citus, Persona, TimescaleDB?
        
           | nitinreddy88 wrote:
           | Looking at the comparison with Click Benchmark, they are
           | almost pathetic in terms of performance. They cant even
           | handle sub-second aggregation queries for 10M records.
           | Compared that too even duckdb reading from parquet files.
        
             | nikita wrote:
             | Postgres is missing a proper columnstore implementation.
             | It's a big gap and it's not easy to build.
             | 
             | One solution could be integrating duckdb in a similar way
             | as pgvector. You need to map duckdb storage to Postgres
             | storage and reuse duckdb query processor. I believe it's
             | the fastest way to get Postgres to have competitive
             | columnstores.
        
               | tarasglek wrote:
               | This sounds interesting. I don't see duck db as a
               | supported extension or mentioned anywhere in your code
               | yet ;)
               | 
               | Is this foreshadowing?
        
               | anentropic wrote:
               | Hydra?
        
           | bloopernova wrote:
           | That was very "Klaatu, Barada, Nikto".
        
           | applied_heat wrote:
           | Victoria metrics as well, they say based on similar
           | structures used in clickhouse
        
         | netik wrote:
         | The gold standard for this Druid at very large scale, or
         | ClickhouseDB. Clickhouse has a lot of problems as far as
         | modifying/scaling shards after the fact, while Druid handles
         | this with ease (and the penalty of not being able to update
         | after the fact.)
        
           | anentropic wrote:
           | Doris?
        
       | MuffinFlavored wrote:
       | Dumb question: why can't I just insert a bunch of rows with a
       | timestamp column and indices? Where does that fall short? At a
       | certain # of rows or something?
       | 
       | What does this let me do that can't be achieved with "regular
       | PostgreSQL without the extension"?
        
         | gonzo41 wrote:
         | Time based partitioning.
        
           | MuffinFlavored wrote:
           | CREATE TABLE logs (             id SERIAL PRIMARY KEY,
           | log_time TIMESTAMP NOT NULL,             message TEXT
           | ) PARTITION BY RANGE (log_time);
           | 
           | Why won't this work on stock PostgreSQL?
        
             | gonzo41 wrote:
             | read the docs, it's not saying that won't work. This
             | extension along with timescale just makes some things more
             | ergonomic.
        
               | darkstar_16 wrote:
               | And provides helper functions to deal with regular time
               | series tasks
        
             | jmaker wrote:
             | That won't work already because your timestamp isn't part
             | of your primary key.
        
         | skibbityboop wrote:
         | I'm with you, I need to read up more on where timeseries could
         | benefit, at work we have a PostgreSQL instance with around 27
         | billion rows in a single partitioned table, partitioned by
         | week. Goes back to January of 2017 and just contains tons of
         | data coming in from sensors. It's not "fast", but also not
         | ridiculously slow to say e.g. "Give me everything for sensor
         | 29380 in March of 2019".
         | 
         | I guess depends on your needs but I do think I need to
         | investigate timeseries more to see if it'd help us.
        
         | citizen_friend wrote:
         | It will work just fine.
        
       | vivzkestrel wrote:
       | Benchmarks with respect to QuestDB, TimescaleDB?
        
       | matthewmueller wrote:
       | Would love to use this with RDS!
        
         | anentropic wrote:
         | Same here, but not holding my breath since all these neat
         | Postgres extensions compete with other AWS DBs like Redshift,
         | Timestream etc
        
       | suyash wrote:
       | Interesting, how does it compare to proper (open source) time
       | series database like InfluxDB other than being 'Postgres' like ?
        
         | tucnak wrote:
         | i's not Postgres-like, it _is_ Postgres
        
         | RedShift1 wrote:
         | InfluxDB is a "proper" time series database?
        
       | wdb wrote:
       | Would this be a good extension when you want to load balancer log
       | entries (status, response body, headers etc)?
       | 
       | I think a columnar database store would be more efficient than
       | normal row-based databases? load balancer log entries could be
       | considered something similar to analytics events.
        
       | nhourcard wrote:
       | Interesting release, it feels that the time-series database
       | landscape is evolving toward:
       | 
       | a) columnar store & built from scratch, with convergence toward
       | open formats such as parquet & arrow: influxdb 3.0, questdb
       | 
       | b) Adding time-series capabilities on top of Postgres: timescale,
       | pg_timeseries
       | 
       | c) platforms focused on observability around the Prometheus
       | ecosystem: grafana, victoria metrics, chronosphere
        
       | mnahkies wrote:
       | Looking at their roadmap, the killer feature for me would be
       | incremental materialised views
       | 
       | > Incremental view maintenance -- define views which stay up-to-
       | date with incoming data without the performance hit of a REFRESH
       | 
       | I wonder if they plan to incorporate something like
       | https://github.com/sraoss/pg_ivm or write their own
       | implementation.
       | 
       | (Although I'm hopeful that one day we see ivm land in postgres
       | core)
        
       | valenterry wrote:
       | It's about time that postgres (and other databases) add native
       | append-only tables. That doesn't make it timeseries, but it
       | probably helps with the standardiziation and all the logic/access
       | around it.
        
       ___________________________________________________________________
       (page generated 2024-05-21 12:00 UTC)