[HN Gopher] pg_timeseries: Open-source time-series extension for...
___________________________________________________________________
pg_timeseries: Open-source time-series extension for PostgreSQL
Author : samaysharma
Score : 270 points
Date : 2024-05-20 16:44 UTC (19 hours ago)
(HTM) web link (tembo.io)
(TXT) w3m dump (tembo.io)
| vantiro wrote:
| PostgreSQL licensed, good move!
| gxyt6gfy5t wrote:
| How's it different than timescaledb?
| logrot wrote:
| > You may already be asking: "why not just power the stack
| using TimescaleDB?" The Timescale License would restrict our
| use of features such as compression, incremental materialized
| views, and bottomless storage. With these missing, we felt that
| what remained would not provide an adequate basis for our
| customers' time-series needs. Therefore, we decided to build
| our own PostgreSQL-licensed extension.
| samaysharma wrote:
| It's much newer for one, so it's behind on features. But,
| we're working on adding new ones based on customer demand.
|
| We want to build it with a PostgreSQL license using existing
| community extensions as much as possible and build custom
| stuff (still permissively licensed) only when necessary.
| vantiro wrote:
| Timescaledb's license is more like Redis' new license?
| hosh wrote:
| The use of PostgreSQL licensing might mean we could see this
| available on AWS RDS and other managed PostgreSQL providers.
| riedel wrote:
| >You may already be asking: "why not just power the stack using
| TimescaleDB?" The Timescale License would restrict our use of
| features such as compression, incremental materialized views, and
| bottomless storage. With these missing, we felt that what
| remained would not provide an adequate basis for our customers'
| time-series needs. Therefore, we decided to build our own
| PostgreSQL-licensed extension.
|
| I have been using the free version of TimescaleDB to shard a
| 500-million-observation time-series database. It worked drop-in
| without much hassle. I would have expected some benchmarks and
| comparisons in the post. I will for sure watch this...
| osigurdson wrote:
| 500 million is very little, however. A regular table with a
| covering index would probably be fine for many use cases with
| this number of points.
| vegabook wrote:
| Indeed. The financial time series I was working with had over
| 100 million new points _per day_. For anything serious,
| TimescaleDB is essentially not open source. Well done,
| tembo.io crew -- will definitely give this a whirl.
| eyegor wrote:
| What do you mean by "for anything serious it isn't open
| source"? I didn't see any red flags in the apache variant
| of timescale, just constant pleading to try their hosted
| option.
|
| https://github.com/timescale/timescaledb/blob/main/LICENSE-
| A...
| pmeira wrote:
| Compression and other features use the non-Apache
| license:
|
| https://github.com/timescale/timescaledb/tree/main/tsl
| NineStarPoint wrote:
| And as I understand that license, you are allowed to use
| Timescale for anything that doesn't involve offering
| Timescale itself as a service. If you were using
| Timescale to process lots of time series transactions in
| your backend, it doesn't seem to me like that would break
| the license.
|
| (Which is to say that if, like Tembo, you're offering
| Postgres as a service you do indeed have a problem. But
| for other use, should be fine)
| dig1 wrote:
| The tricky thing with these licenses (BSL, SSPL, etc.) is
| that you can use them freely for internal stuff, but
| suddenly, if you make your product public (assuming it
| uses, e.g., TimescaleDB), things can get muddy. Everyone
| wants the flexibility to either open-source or
| commercialize a successful internal product in the
| future.
|
| The problem is that, even if your app is not a mere
| frontend for TimescaleDB/Mongo/Redis, you can get sued,
| and you'll have to spend unnecessary time and money
| proving things in court. No one wants this, especially a
| startup owner whose money and time are tight. Also, even
| if your startup/company uses some of these techs,
| potential company buyers will be very wary of the
| purchase if they know they'll have to deal with this
| later.
| miohtama wrote:
| I would assume TimescaleDB only sues if you have money. In
| that case you can also afford a commercial license. If you hit
| it big, just contact them, tell them there was a problem with
| having the correct license earlier, and say you want to fix
| the situation.
|
| There is a 0% chance Timescale would sue a mom'n'pop operation
| for breaking their license.
| wasmitnetzen wrote:
| The license doesn't allow you to "give access to,
| directly or indirectly (e.g., via a wrapper) to [SQL]".
|
| Legally, what's a wrapper? Is a REST API a wrapper?
| belk wrote:
| I imagine it would legally need a lawsuit to set a precedent,
| and if a license owner sets an over-reaching precedent of what
| a wrapper is, they risk losing customer trust and having
| companies avoid them like the plague.
|
| e.g. timescaledb going after a tsdb as a service company
| offering tsdb behind a graphql wrapper vs timescaledb
| going after a financial company offering timeseries data
| collection and viewing.
|
| I think a good border test would be: would TimescaleDB allow
| you to offer a metrics and logging service? Technically you're
| offering time-series database functionality, but it's in a
| constrained domain, and very clearly a different product, yet
| still effectively CRUDing time-series data.
| miohtama wrote:
| If you have 100 million points per day, it's likely you can
| afford to pay for any commercial license.
| vjerancrnjak wrote:
| I think you're not talking about the same thing. There are two
| expressions related to time-series data: "high churn" and
| "active time series".
|
| 500 million active time series is extremely huge.
|
| It does not have anything to do with number of data points.
|
| Good time series databases can scale to 1M-10M writes per
| second without a hiccup.
| osigurdson wrote:
| I suppose it depends on what is meant by an "observation". Is
| that an entire time series for a single property, or a single
| point? Nevertheless, the number of points absolutely matters.
|
| A regular Postgres database can give you 50-100K inserts per
| second and can scale to at least 1B rows with 100K+ individual
| series without much difficulty. If you know you will need less
| (or much less) than this, my suggestion is to use a regular
| table with a covering index. If you need more, use ClickHouse.
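|
| For illustration, a plain-Postgres sketch of that approach
| (the table, column, and index names here are made up):
|
|     -- Hypothetical raw readings table.
|     CREATE TABLE readings (
|         series_id bigint           NOT NULL,
|         ts        timestamptz      NOT NULL,
|         value     double precision NOT NULL
|     );
|
|     -- Covering index: range scans on (series_id, ts) can be
|     -- answered index-only because value is INCLUDEd.
|     CREATE INDEX readings_series_ts_idx
|         ON readings (series_id, ts) INCLUDE (value);
|
|     -- Typical per-series range query served by an index-only scan.
|     SELECT ts, value
|     FROM readings
|     WHERE series_id = 42
|       AND ts >= '2019-03-01' AND ts < '2019-04-01';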
| rapsey wrote:
| Databases are a tough business. You're just waiting for open
| source to eat your lunch.
| mathfailure wrote:
| Thank you for posting it: I followed the links and found out
| about trunk and https://pgt.dev/
| ramoneguru wrote:
| How does this stack up against something like what QuestDB
| offers?
| plainOldText wrote:
| Your site is very well designed and easy to read btw, and the app
| UI looks great from the demo photos. I might try it!
| samaysharma wrote:
| Thank you!
| PeterZaitsev wrote:
| Great to see this kind of innovation. PostgreSQL is
| interesting: while the "core" has always been Open Source
| under a very permissive license, there have been many
| proprietary and source-available extensions, ranging from
| replication to time-series support.
|
| Now we see those proprietary extensions being disrupted by
| proper Open Source!
| nitinreddy88 wrote:
| Most time-series queries (almost all of them) are aggregate
| queries. Why not leverage or build a top-notch columnar store
| for this?
|
| Everything seems to be there, so why isn't there a first-class
| product like ClickHouse on PG?
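|
| A typical aggregate of that shape, in plain SQL (the table and
| column names here are illustrative):
|
|     -- Hourly average per series over one day.
|     SELECT date_trunc('hour', ts) AS bucket,
|            series_id,
|            avg(value) AS avg_value
|     FROM readings
|     WHERE ts >= '2024-05-01' AND ts < '2024-05-02'
|     GROUP BY 1, 2
|     ORDER BY 1, 2;
|
| A row store has to read every column of every matching row for
| this; a columnar engine touches only ts, series_id, and value.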
| paulryanrogers wrote:
| Citus, Persona, TimescaleDB?
| nitinreddy88 wrote:
| Looking at the ClickBench comparison, they are almost pathetic
| in terms of performance. They can't even handle sub-second
| aggregation queries for 10M records. Compare that to even
| DuckDB reading from Parquet files.
| nikita wrote:
| Postgres is missing a proper columnstore implementation.
| It's a big gap and it's not easy to build.
|
| One solution could be integrating DuckDB in a similar way to
| pgvector. You need to map DuckDB storage to Postgres storage
| and reuse the DuckDB query processor. I believe it's the
| fastest way to get Postgres to have competitive columnstores.
| tarasglek wrote:
| This sounds interesting. I don't see DuckDB as a supported
| extension or mentioned anywhere in your code yet ;)
|
| Is this foreshadowing?
| anentropic wrote:
| Hydra?
| bloopernova wrote:
| That was very "Klaatu, Barada, Nikto".
| applied_heat wrote:
| VictoriaMetrics as well; they say it's based on structures
| similar to those used in ClickHouse.
| netik wrote:
| The gold standard for this is Druid at very large scale, or
| ClickHouse. ClickHouse has a lot of problems as far as
| modifying/scaling shards after the fact, while Druid handles
| this with ease (with the penalty of not being able to update
| after the fact).
| anentropic wrote:
| Doris?
| MuffinFlavored wrote:
| Dumb question: why can't I just insert a bunch of rows with a
| timestamp column and indices? Where does that fall short? At a
| certain # of rows or something?
|
| What does this let me do that can't be achieved with "regular
| PostgreSQL without the extension"?
| gonzo41 wrote:
| Time-based partitioning.
| MuffinFlavored wrote:
| CREATE TABLE logs (
|     id SERIAL PRIMARY KEY,
|     log_time TIMESTAMP NOT NULL,
|     message TEXT
| ) PARTITION BY RANGE (log_time);
|
| Why won't this work on stock PostgreSQL?
| gonzo41 wrote:
| Read the docs; it's not saying that won't work. This
| extension, along with Timescale, just makes some things more
| ergonomic.
| darkstar_16 wrote:
| And it provides helper functions for common time-series
| tasks.
| jmaker wrote:
| That won't work as written: on a partitioned table, the
| primary key must include the partition key, and here log_time
| isn't part of it.
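|
| A version that stock PostgreSQL accepts (a sketch; the
| partition bounds are just an example):
|
|     CREATE TABLE logs (
|         id       SERIAL,
|         log_time TIMESTAMP NOT NULL,
|         message  TEXT,
|         PRIMARY KEY (id, log_time)  -- partition key must be in the PK
|     ) PARTITION BY RANGE (log_time);
|
|     -- Partitions still have to be created (and rotated) by hand or
|     -- with a helper such as pg_partman; that ergonomics gap is what
|     -- extensions like this one aim to fill.
|     CREATE TABLE logs_2024_05 PARTITION OF logs
|         FOR VALUES FROM ('2024-05-01') TO ('2024-06-01');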
| skibbityboop wrote:
| I'm with you; I need to read up more on where time-series
| features could benefit us. At work we have a PostgreSQL
| instance with around 27 billion rows in a single partitioned
| table, partitioned by week. It goes back to January of 2017
| and just contains tons of data coming in from sensors. It's
| not "fast", but also not ridiculously slow to say, e.g., "Give
| me everything for sensor 29380 in March of 2019".
|
| I guess it depends on your needs, but I do think I need to
| investigate time-series extensions more to see if they'd help
| us.
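|
| That kind of query is where weekly range partitions plus a
| per-sensor index pay off. A sketch on a hypothetical schema
| (names are made up):
|
|     -- Assume sensor_data(sensor_id bigint, recorded_at timestamptz, ...)
|     -- PARTITION BY RANGE (recorded_at), weekly partitions, and an
|     -- index on (sensor_id, recorded_at).
|     EXPLAIN (ANALYZE)
|     SELECT *
|     FROM sensor_data
|     WHERE sensor_id = 29380
|       AND recorded_at >= '2019-03-01'
|       AND recorded_at <  '2019-04-01';
|     -- The planner prunes every partition outside March 2019 and uses
|     -- the per-partition index for the sensor_id lookup.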
| citizen_friend wrote:
| It will work just fine.
| vivzkestrel wrote:
| Benchmarks with respect to QuestDB, TimescaleDB?
| matthewmueller wrote:
| Would love to use this with RDS!
| anentropic wrote:
| Same here, but not holding my breath since all these neat
| Postgres extensions compete with other AWS DBs like Redshift,
| Timestream etc
| suyash wrote:
| Interesting. How does it compare to a proper (open-source)
| time-series database like InfluxDB, other than being
| "Postgres-like"?
| tucnak wrote:
| It's not Postgres-like, it _is_ Postgres.
| RedShift1 wrote:
| InfluxDB is a "proper" time series database?
| wdb wrote:
| Would this be a good extension when you want to store load
| balancer log entries (status, response body, headers, etc.)?
|
| I think a columnar store would be more efficient than a normal
| row-based database here? Load balancer log entries could be
| considered something similar to analytics events.
| nhourcard wrote:
| Interesting release. It feels like the time-series database
| landscape is evolving toward:
|
| a) columnar store & built from scratch, with convergence toward
| open formats such as parquet & arrow: influxdb 3.0, questdb
|
| b) Adding time-series capabilities on top of Postgres: timescale,
| pg_timeseries
|
| c) platforms focused on observability around the Prometheus
| ecosystem: grafana, victoria metrics, chronosphere
| mnahkies wrote:
| Looking at their roadmap, the killer feature for me would be
| incremental materialised views
|
| > Incremental view maintenance -- define views which stay up-to-
| date with incoming data without the performance hit of a REFRESH
|
| I wonder if they plan to incorporate something like
| https://github.com/sraoss/pg_ivm or write their own
| implementation.
|
| (Although I'm hopeful that one day we'll see IVM land in
| Postgres core.)
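|
| For reference, pg_ivm's approach looks roughly like this (per
| its README; exact function names and supported aggregates may
| differ by version, and the table here is hypothetical):
|
|     CREATE EXTENSION pg_ivm;
|
|     -- An incrementally maintained materialized view over a raw
|     -- readings table; it is kept up to date as rows arrive, with
|     -- no explicit REFRESH.
|     SELECT create_immv(
|         'hourly_avg',
|         $$ SELECT date_trunc('hour', ts) AS bucket, avg(value) AS avg_value
|            FROM readings
|            GROUP BY 1 $$
|     );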
| valenterry wrote:
| It's about time that Postgres (and other databases) added
| native append-only tables. That doesn't make it a time-series
| database, but it would probably help with standardization and
| all the logic/access around it.
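|
| There's no native support, but append-only behaviour can be
| approximated in stock Postgres today; a minimal sketch (table
| and trigger names are made up):
|
|     CREATE TABLE events (
|         id          bigserial PRIMARY KEY,
|         occurred_at timestamptz NOT NULL DEFAULT now(),
|         payload     jsonb NOT NULL
|     );
|
|     -- Reject any UPDATE or DELETE at the database level.
|     CREATE FUNCTION forbid_mutation() RETURNS trigger AS $$
|     BEGIN
|         RAISE EXCEPTION 'events is append-only';
|     END;
|     $$ LANGUAGE plpgsql;
|
|     CREATE TRIGGER events_append_only
|         BEFORE UPDATE OR DELETE ON events
|         FOR EACH ROW EXECUTE FUNCTION forbid_mutation();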
___________________________________________________________________
(page generated 2024-05-21 12:00 UTC)