[HN Gopher] The Rise of Open Source Time Series Databases
___________________________________________________________________
The Rise of Open Source Time Series Databases
Author : valyala
Score : 56 points
Date : 2024-09-14 12:08 UTC (10 hours ago)
(HTM) web link (victoriametrics.com)
(TXT) w3m dump (victoriametrics.com)
| baggiponte wrote:
| Interesting! Too bad it's just about two of them.
| gmuslera wrote:
| A bit opinionated and partial version of history (graphite was
| big before Influxdb and Prometheus came out, and the landscape is
| bigger than what they mentioned) but it might be a good enough
| starting point to learn about this.
| axytol wrote:
| I'm using VictoriaMetrics (VM) to store basic weather data, like
| temperature and humidity. My initial setup was based on
| Prometheus; however, it seemed very hard to set a high data
| retention value. The default was something like 15 days, if I
| recall correctly.
|
| Since I would actually like to store all recorded values
| permanently, I could partially achieve this with VM which let me
| set a higher threshold, like 100 years. Still not 'forever' as I
| would have liked, but I guess me and my flimsy weather data setup
| will have other things than the retention threshold to worry
| about in 100 years.
|
| Would be nice to learn the reason why an infinite threshold is
| not allowed.
| starkparker wrote:
| Usually performance and storage concerns. You can set
| effectively infinite retention on Prometheus, but after a long
| enough period you're going to like querying it even less.
|
| Most TSDBs aren't built for use cases like "query hourly/daily
| data over many years". Many use cases aren't looking further
| than 30 days because they're focused on things like app or
| device uptime and performance metrics, and if they are running
| on longer time frames they're recording (or keeping, or
| consolidating data to) far fewer data points to keep
| performance usable.
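A minimal sketch of the "consolidating data to far fewer data points" idea mentioned above, assuming samples arrive as (unix_timestamp, value) pairs. The function name `downsample` and the one-hour bucket width are illustrative assumptions, not taken from Prometheus, VictoriaMetrics, or any other TSDB:

```python
from collections import defaultdict


def downsample(points, bucket_seconds=3600):
    """Average (unix_ts, value) samples into fixed-width time buckets.

    Hypothetical helper for illustration only; real TSDBs use their own
    rollup or recording-rule mechanisms.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for ts, value in points:
        bucket = ts - (ts % bucket_seconds)  # align to the start of the bucket
        acc = sums[bucket]
        acc[0] += value
        acc[1] += 1
    # One averaged point per bucket, ordered by time.
    return [(bucket, total / count)
            for bucket, (total, count) in sorted(sums.items())]


# Five raw samples taken every 30 minutes collapse into three hourly points.
raw = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0), (7200, 5.0)]
hourly = downsample(raw)
```

Storing only the averaged points trades resolution for fewer samples to scan on long-range queries, which is the trade-off the comment describes.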
| atombender wrote:
| Mimir [1] is what we use where I work. We are very happy with
| it, and we have very long retention. Previously, our Prometheus
| setup was extremely slow if you went past today, but Mimir
| partitions the data to make it extremely fast to query even
| long time periods. We also used Thanos for a while, but Mimir
| apparently worked better.
|
| [1] https://grafana.com/oss/mimir/
| PeterZaitsev wrote:
| Yeah. Would be interested to see how VictoriaMetrics compares
| to Mimir, not just Prometheus.
|
| To be fair, many projects in the Prometheus "long term store"
| space have come and gone: Thanos, Cortex, M3.
| starkparker wrote:
| This is unsurprisingly a VictoriaMetrics post that frames the
| history in a narrow way to talk favorably about VictoriaMetrics.
| to11mtm wrote:
| Could be worse.
|
| I remember when a bunch of WSO2 contribs broke off, made their
| own (AGPL) ESB, and created an 'ESB Performance' website that
| cherry-picked specific scenarios to make WSO2 look worse than
| their product.
|
| This was made funnier because the 'consultant' (really a buddy
| of management that used us as an experiment for all sorts of
| things, at least the code was readable, learned lots of
| languages fixing things lol) wound up having us go with the
| AGPL product, and it wasn't until I said to the team 'hey, let's
| try using the WSO2 docs as a cross reference' that we got
| productive.
|
| I do miss the 'dramatic readings' of UltraESB docs, learned a
| colleague understood Pashto [0].
|
| [0] - After a specific 'verse', the colleague pointed out that
| one of the odd phrases in the docs that prompted this practice
| was a colloquialism in that language.
| thebeardisred wrote:
| The author calls out InfluxDB as the first time series database.
|
| Why would that not have been rrdtool or something earlier?
| gillh wrote:
| This topic has been done to death. Too many OSS options in the
| last ~10 years with very little differentiation.
|
| Let's talk LLMs instead.
| jnordwick wrote:
| If you are writing an article about time series databases, and
| you don't mention KDB - straight to jail. It is the grandfather
| of time series and predates influx by about a decade. It is still
| the fastest out there too. It is used by about every major
| financial and trading institution in the US and Europe.
|
| Everybody thinks TSDBs are something new-ish, but they've been
| around since the days of APL. All you youngins disappoint me
| every time you write about time series, vector languages, or
| data-oriented programming and entirely neglect all the work that
| comes under the APL/vector umbrella. SOA and DOD have been around
| for 50+ years, and they didn't start with C++ or Pandas.
|
| Now the creator, Arthur Whitney, has a new one out called Shakti
| that is even faster (but has also ditched the "niceties" of
| Q).
|
| https://shakti.com/
| kingforaday wrote:
| kdb is mentioned. Possible it was added after your comment in
| the last 30 mins, but the article is about open source not
| proprietary.
| jnordwick wrote:
| I control-F'ed and didn't find it before I wrote that
| comment. I searched for KDB and APL, but found nothing.
|
| EDIT: yeah, that sentence is new. I think some of the
| sentences around Influx being the "first" have been adjusted
| too, now saying it is the first "mainstream" one and calling
| other DBs niche. It's also a little off: KDB+ is more popular
| in large banks and trading firms than at HFT places.
| d0mine wrote:
| kdb is not open-source, is it?
| https://news.ycombinator.com/item?id=19973847
| jmakov wrote:
| Why would you use this instead of CH if your use case is metrics?
| dan-robertson wrote:
| More ergonomic queries, though less flexible than SQL
| applied_heat wrote:
| VM is really straightforward to get set up; I've heard
| ClickHouse admin can be more involved
| Mongoose wrote:
| Had me until claiming that InfluxDB was the first mainstream TSDB
| _in 2013_. OpenTSDB (2010)? Graphite (2008)? RRDtool (1999)?
|
| Maybe Influx took off in a way these prior projects didn't, but
| people have been storing time series data for decades.
| PeterZaitsev wrote:
| I think while OpenTSDB was reasonably general purpose, Graphite
| and RRDtool were built for very specific monitoring use cases.
| suyash wrote:
| InfluxDB is open source (https://github.com/influxdata) and seems
| to be the leader by far as per the DB-Engines ranking:
| https://db-engines.com/en/ranking/time+series+dbms
| PeterZaitsev wrote:
| Have things changed with InfluxDB 3.0?
|
| Previously InfluxDB was Open Core, with a very crippled Community
| version (i.e. High Availability was Enterprise only)
| 00xnull wrote:
| AWS interestingly launched a managed InfluxDB option under the
| "Timestream" product this year.
|
| I've personally found TimescaleDB to be a much easier platform
| to work with, even if there is more overhead in defining a
| schema and enabling compression, etc.
| dan-robertson wrote:
| I wonder if the in-house metrics systems at big tech firms like
| Google and Facebook should be counted as 'proprietary' for these
| purposes. I suppose not because one can't really pay to get them
| internally.
| mmooss wrote:
| I might expect that for a specialized task, specialized database
| technology would outperform general relational databases. But
| what specific technologies have significant impacts on time
| series db efficiency?
| andreineculau wrote:
| How do these stack up to https://www.timescale.com/ ?
| to11mtm wrote:
| I haven't actually used Timescale but it was on my list to try
| after various gripes with how InfluxDB worked and limitations
| on tags.
|
| Broadly speaking I can say I'd trust TimescaleDB over Influx for
| a lot of cases since it's using Postgres under the hood. While I
| don't know precisely what issues we had adopting Influx, I know
| we had some as we 'scaled up', and that was when stuff like
| tag limits was brought up a lot. (Interestingly, our app was
| 'barely passing' because we metric'd a lot, but we also kept
| stuff just short enough for everyone to be OK with it, mostly
| because the metrics we got were very useful for things.)
| camel_gopher wrote:
| Wow, this article misses many TSDBs that predated those listed.
| Victoria seems to be basically a rewrite of Prom.
___________________________________________________________________
(page generated 2024-09-14 23:00 UTC)