[HN Gopher] The Rise of Open Source Time Series Databases
       ___________________________________________________________________
        
       The Rise of Open Source Time Series Databases
        
       Author : valyala
       Score  : 56 points
       Date   : 2024-09-14 12:08 UTC (10 hours ago)
        
 (HTM) web link (victoriametrics.com)
 (TXT) w3m dump (victoriametrics.com)
        
       | baggiponte wrote:
       | Interesting! Too bad it's just about two of them.
        
       | gmuslera wrote:
       | A somewhat opinionated and partial version of history (Graphite
       | was big before InfluxDB and Prometheus came out, and the
       | landscape is bigger than what they mentioned), but it might be a
       | good enough starting point to learn about this.
        
       | axytol wrote:
       | I'm using VictoriaMetrics (VM) to store basic weather data, like
       | temperature and humidity. My initial setup was based on
       | Prometheus; however, it seemed very hard to set a high data
       | retention value. The default was something like 15 days, if I
       | recall correctly.
       | 
       | Since I would actually like to store all recorded values
       | permanently, I could partially achieve this with VM which let me
       | set a higher threshold, like 100 years. Still not 'forever' as I
       | would have liked, but I guess me and my flimsy weather data setup
       | will have other things than the retention threshold to worry
       | about in 100 years.
       | 
       | Would be nice to learn the reason why an infinite threshold is
       | not allowed.
        
         | starkparker wrote:
         | Usually performance and storage concerns. You can set
         | effectively infinite retention on Prometheus, but after a long
         | enough period you're going to like querying it even less.
         | 
         | Most TSDBs aren't built for use cases like "query hourly/daily
         | data over many years". Many use cases aren't looking further
         | than 30 days because they're focused on things like app or
         | device uptime and performance metrics, and if they are running
         | on longer time frames they're recording (or keeping, or
         | consolidating data to) far fewer data points to keep
         | performance usable.
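          | 
          | As a rough sketch of what "consolidating data to far fewer data
          | points" could look like (plain Python, not any particular
          | TSDB's API; the function name and the hourly bucket size are
          | made up for illustration):

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds=3600):
    """Average (unix_timestamp, value) samples into fixed-width buckets.

    Hypothetical helper for illustration: timestamps are assumed to be
    integer seconds, and each bucket is labeled by its start time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Round the timestamp down to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    # One averaged point per bucket, in time order.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Four raw readings, 30 minutes apart, collapse into two hourly points.
raw = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0)]
hourly = downsample(raw)
```

          | Long-range queries then read the consolidated points rather
          | than every raw sample, at the cost of losing detail within
          | each bucket.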
        
         | atombender wrote:
         | Mimir [1] is what we use where I work. We are very happy with
         | it, and we have very long retention. Previously, our Prometheus
         | setup was extremely slow if you went past today, but Mimir
         | partitions the data to make it extremely fast to query even
         | long time periods. We also used Thanos for a while, but Mimir
         | apparently worked better.
         | 
         | [1] https://grafana.com/oss/mimir/
        
           | PeterZaitsev wrote:
           | Yeah. Would be interested to see how VictoriaMetrics compares
           | to Mimir, not just Prometheus.
           | 
            | To be fair, many projects in the Prometheus "long term store"
            | space have come and gone - Thanos, Cortex, M3
        
       | starkparker wrote:
       | This is unsurprisingly a VictoriaMetrics post that frames the
       | history in a narrow way to talk favorably about VictoriaMetrics.
        
         | to11mtm wrote:
         | Could be worse.
         | 
         | I remember when a bunch of WSO2 contribs broke off, made their
         | own (AGPL) ESB, and created an 'ESB Performance' website that
         | cherry-picked specific scenarios to make WSO2 look worse than
         | their product.
         | 
         | This was made funnier because the 'consultant' (really a buddy
         | of management that used us as an experiment for all sorts of
         | things, at least the code was readable, learned lots of
         | languages fixing things lol) wound up having us go with the
         | AGPL product, and it wasn't until I said to the team 'hey, let's
         | try using the WSO2 docs as a cross reference' that we got
         | productive.
         | 
         | I do miss the 'dramatic readings' of UltraESB docs, learned a
         | colleague understood Pashto [0].
         | 
         | [0] - After a specific 'verse', the colleague pointed out that
         | one of the odd phrases in the docs that prompted this practice
         | was a colloquialism in that language.
        
       | thebeardisred wrote:
       | The author calls out InfluxDB as the first time series database.
       | 
       | Why would that not have been rrdtool or something earlier?
        
       | gillh wrote:
       | This topic has been done to death. Too many OSS options in the
       | last ~10 years with very little differentiation.
       | 
       | Let's talk LLMs instead.
        
       | jnordwick wrote:
       | If you are writing an article about time series databases, and
       | you don't mention KDB - straight to jail. It is the grandfather
       | of time series and predates influx by about a decade. It is still
       | the fastest out there too. It is used by about every major
       | financial and trading institution in the US and Europe.
       | 
       | Everybody thinks TSDBs are something new-ish, but they've been
       | around since the days of APL. All you youngins disappoint me
       | every time you write about time series, vector languages, or
       | data-oriented programming and entirely neglect all the work that
       | comes under the APL/vector umbrella. SOA and DOD have been around
       | for 50+ years, and they didn't start with C++ or Pandas.
       | 
       | Now the creator, Arthur Whitney, has a new one out called Shakti
       | that is even faster (but has also ditched the "niceties" of Q).
       | 
       | https://shakti.com/
        
         | kingforaday wrote:
         | kdb is mentioned. Possible it was added after your comment in
         | the last 30 mins, but the article is about open source not
         | proprietary.
        
           | jnordwick wrote:
           | I control-F'ed and didn't find it before I wrote that
           | comment. I searched for KDB and APL, but found nothing.
           | 
           | EDIT: yeah, that sentence is new. I think some of the
           | sentences around Influx being the "first" have been adjusted
           | too, now saying it is the first "mainstream" one and calling
           | other DBs niche. It's also a little off. KDB+ is more popular
           | in large banks and trading firms than HFT places.
        
         | d0mine wrote:
         | kdb is not open-source, is it?
         | https://news.ycombinator.com/item?id=19973847
        
       | jmakov wrote:
       | Why would you use this instead of ClickHouse if your use case is
       | metrics?
        
         | dan-robertson wrote:
         | More ergonomic queries, though less flexible than sql
        
         | applied_heat wrote:
         | VM is really straightforward to get set up; I've heard
         | ClickHouse admin can be more involved.
        
       | Mongoose wrote:
       | Had me until claiming that InfluxDB was the first mainstream TSDB
       | _in 2013_. OpenTSDB (2010)? Graphite (2008)? RRDtool (1999)?
       | 
       | Maybe Influx took off in a way these prior projects didn't, but
       | people have been storing time series data for decades.
        
         | PeterZaitsev wrote:
         | I think that while OpenTSDB was reasonably general purpose,
         | Graphite and RRDtool were built for very specific monitoring
         | use cases.
        
       | suyash wrote:
       | InfluxDB is open source (https://github.com/influxdata) and seems
       | to be the leader by far as per the DB-Engines ranking:
       | https://db-engines.com/en/ranking/time+series+dbms
        
         | PeterZaitsev wrote:
         | Have things changed with InfluxDB 3.0?
         | 
         | Previously InfluxDB was Open Core, with a very crippled Community
         | version (i.e. High Availability was Enterprise only).
        
         | 00xnull wrote:
         | AWS interestingly launched a managed InfluxDB option under the
         | "Timestream" product this year.
         | 
         | I've personally found TimescaleDB to be a much easier platform
         | to work with, even if there is more overhead in defining a
         | schema and enabling compression, etc.
        
       | dan-robertson wrote:
       | I wonder if the in-house metrics systems at big tech firms like
       | Google and Facebook should be counted as 'proprietary' for these
       | purposes. I suppose not because one can't really pay to get them
       | internally.
        
       | mmooss wrote:
       | I might expect that for a specialized task, specialized database
       | technology would outperform general relational databases. But
       | what specific technologies have significant impacts on time
       | series db efficiency?
        
       | andreineculau wrote:
       | How do these stack up to https://www.timescale.com/ ?
        
         | to11mtm wrote:
         | I haven't actually used Timescale but it was on my list to try
         | after various gripes with how InfluxDB worked and limitations
         | on tags.
         | 
         | Broadly speaking, I can say I'd trust TimescaleDB over Influx for
         | a lot of cases since it's using Postgres under the hood. I don't
         | know precisely what issues we had adopting Influx, but I know we
         | had some as we 'scaled up', and that was when stuff like tag
         | limits was brought up a lot. (Interestingly, our app was 'barely
         | passing' because we metric'd a lot, but we also kept stuff just
         | short enough for everyone to be OK with it, mostly because the
         | metrics we got were very useful for things.)
        
       | camel_gopher wrote:
       | Wow this article misses many TSDBs that predated those listed.
       | Victoria seems to be basically a rewrite of Prom.
        
       ___________________________________________________________________
       (page generated 2024-09-14 23:00 UTC)