[HN Gopher] Grafana Mimir and VictoriaMetrics: performance tests
       ___________________________________________________________________
        
       Author : nikolay_sivko
       Score  : 48 points
       Date   : 2022-09-09 14:27 UTC (8 hours ago)
        
 (HTM) web link (victoriametrics.com)
 (TXT) w3m dump (victoriametrics.com)
        
       | valyala wrote:
       | VictoriaMetrics core developer here. Benchmarks are non-trivial
       | to run properly, especially when benchmarking against a
       | competitor. We tried hard to create a benchmark based on
       | production-like data. It collects data from the most frequently
       | monitored scrape target in the Prometheus ecosystem:
       | node_exporter. At the same time, the benchmark runs queries
       | based on real-world alerting rules for the metrics collected
       | from node_exporter. Resource usage (CPU, RAM, disk IO, disk
       | space usage) is recorded during the benchmark, so these metrics
       | can later be analyzed and compared. The benchmark runs for 24
       | hours, so it captures all the transitional states that occur
       | during this period, such as periodic data compactions. We hope
       | the benchmark framework will be reused by others to compare the
       | performance and resource usage of Prometheus-like monitoring
       | systems such as Cortex, M3DB, Thanos, Promscale, etc.
        
       | lmickh wrote:
       | Interesting test, but I find some of these benchmarks kind of
       | miss the point. Even Grafana's.
       | 
       | The appeal of Thanos/Cortex/Mimir is the long term object
       | storage. The value isn't that it is simpler or cheaper to run.
       | The value is that I can compare data to months if not years ago.
       | It can cost much more than the price of the instances to store
       | good metrics over time even when the data is rolled up.
       | 
       | Scaling the read and write paths separately has a lot of
       | benefits as well, but I would guess that doesn't come up often
       | for most folks.
       | 
       | How much telemetry you can get in/out of your system over a day
       | is important, but how much you can get in/out of it over years is
       | overlooked.
        
         | valyala wrote:
         | Both VictoriaMetrics and Grafana Mimir are well suited for
         | long-term storage of Prometheus data. The difference is in the
         | type of storage they use: VictoriaMetrics stores data on
         | persistent disks (aka block storage), while Grafana Mimir stores
         | data in S3-like object storage. Both storage types, block
         | storage and object storage, can be used for long-term storage.
         | They differ in the following ways at the major cloud providers
         | (AWS, GCP, Azure):
         | 
         | - Object storage space usually costs 2x-8x less than block
         | storage space.
         | 
         | - Object storage has up to 100x higher data access latency than
         | block storage (hundreds of milliseconds for object storage vs
         | milliseconds for block storage).
         | 
         | - Block storage usually has a much lower network-related error
         | rate compared to object storage. For example, it is common
         | practice to retry reads from object storage on network errors,
         | while block storage-based filesystems at the major cloud
         | providers are much more reliable in this respect.
         | 
         | - Cloud providers tend to charge for every read operation on
         | object storage, while reading from block storage is free. This
         | point is usually overlooked when estimating the costs of block
         | storage vs object storage.
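The retry practice mentioned in the list above can be sketched as a small wrapper with capped exponential backoff and full jitter. This is a minimal sketch, not code from VictoriaMetrics or Mimir: `TransientNetworkError`, `read_with_retries` and `make_flaky_reader` are hypothetical names, and a real object-storage client (or its SDK's built-in retry policy) would decide which errors are actually retryable.

```python
import random
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a retryable network error from an object-store client."""


def read_with_retries(read_fn, max_attempts=5, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call read_fn(), retrying on TransientNetworkError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return read_fn()
        except TransientNetworkError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))


def make_flaky_reader(failures, payload):
    """Demo helper: a reader that fails `failures` times before returning payload."""
    state = {"calls": 0}

    def read():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientNetworkError("connection reset by peer")
        return payload

    return read, state
```

The `sleep` parameter is injectable so the backoff can be exercised without actually waiting; jitter keeps many readers that failed at the same moment from retrying in lockstep.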
         | 
         | Given these differences, block storage usually provides better
         | performance than object storage. Block storage can also cost
         | less than object storage when the stored data is read
         | frequently.
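The read-charge argument can be turned into a break-even estimate. A minimal sketch, assuming (as the comment does) that block storage reads are free and that object storage bills capacity plus a fixed price per 1,000 read requests. `Prices` and the helper functions are hypothetical names, and any prices plugged in are placeholders, not quotes from a provider.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    # Hypothetical placeholders: substitute your provider's actual price list.
    block_per_gb_month: float
    object_per_gb_month: float
    object_per_1k_reads: float


def monthly_cost_block(stored_gb, prices):
    # Assumption from the comment above: block storage reads are free,
    # so only capacity is billed.
    return stored_gb * prices.block_per_gb_month


def monthly_cost_object(stored_gb, reads_per_month, prices):
    # Object storage bills capacity plus every read request.
    return stored_gb * prices.object_per_gb_month + reads_per_month / 1000 * prices.object_per_1k_reads


def break_even_reads(stored_gb, prices):
    """Monthly read requests above which block storage becomes the cheaper option."""
    capacity_gap = stored_gb * (prices.block_per_gb_month - prices.object_per_gb_month)
    return capacity_gap / prices.object_per_1k_reads * 1000
```

With hypothetical prices of 0.08 per GB-month for block storage, 0.02 per GB-month for object storage and 0.0004 per 1,000 reads, 1,000 GB breaks even at about 150 million reads per month. The model ignores egress, write requests and provisioned IOPS, any of which can move the break-even point.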
         | 
         | VictoriaMetrics is optimized for HDD-based block storage, so
         | there is no need to use more expensive SSD-based block storage
         | in most cases. Additionally, VictoriaMetrics compresses
         | production metrics 2x-10x better than the Prometheus-like
         | solutions that store data in object storage (Thanos, Cortex,
         | Grafana Mimir). This also reduces long-term storage costs.
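The comment does not say how the better compression is achieved. One generic ingredient used by many time series databases is delta-of-delta encoding of timestamps: scrapes arrive at a near-constant interval, so the second differences are mostly zero and compress well with a general-purpose compressor. The sketch below illustrates only that idea; it is not VictoriaMetrics' actual storage format, and the function names are hypothetical.

```python
def delta_of_delta_encode(timestamps):
    """Encode sorted integer timestamps as [first, first_delta, dod_1, dod_2, ...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    first_delta = timestamps[1] - timestamps[0]
    out = [timestamps[0], first_delta]
    prev_delta = first_delta
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # zero whenever the scrape interval is unchanged
        prev_delta = delta
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps
```

For a regular 15-second scrape, every entry after the first two is zero, which is what makes the subsequent general-purpose compression effective.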
         | 
         | On top of this, the enterprise version of VictoriaMetrics can be
         | configured to downsample historical data, so it takes less
         | disk space [1].
         | 
         | [1] https://docs.victoriametrics.com/#downsampling
        
           | PeterZaitsev wrote:
           | To be fair, the benefit of an "object store" is that it is
           | scalable and bottomless, while with EBS you have to deal with
           | volume expansion, etc. Some folks find managing a fleet of EBS
           | volumes not a big deal; others find it problematic.
           | 
           | I think keeping "long-term storage" in an S3-compatible
           | location is the way to go, but you need the ability to use
           | local storage as a cache, so that queries on recent data, or
           | just on the date range you're working with, can be fast.
        
             | valyala wrote:
             | Agreed. That's why we at VictoriaMetrics are investigating a
             | hybrid storage scheme: store recently added data on block
             | storage, while gradually moving older data from block storage
             | to object storage in the background. On the query side, the
             | requested data should be transparently queried from both
             | object storage and block storage.
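The hybrid scheme described above can be sketched with a watermark: samples older than the watermark live in object storage, newer ones on block storage, and a query only touches the slow object tier when its time range reaches below the watermark. This is a toy, single-process sketch with hypothetical names (`HybridStorage`, `migrate_older_than`), using in-memory lists in place of real disks and buckets; it is not how VictoriaMetrics implements or plans to implement the feature.

```python
class HybridStorage:
    """Toy two-tier store: recent samples on 'block', older samples on 'object'."""

    def __init__(self):
        self.block = []        # (timestamp, value) pairs on fast block storage
        self.object = []       # (timestamp, value) pairs on slow, cheap object storage
        self.watermark = None  # samples older than this have been moved to object storage
        self.object_reads = 0  # counts how often the slow tier is queried

    def ingest(self, ts, value):
        # New data always lands on block storage first.
        self.block.append((ts, value))

    def migrate_older_than(self, cutoff):
        """Background step: move samples older than cutoff to object storage."""
        moved = [s for s in self.block if s[0] < cutoff]
        self.object.extend(moved)
        self.block = [s for s in self.block if s[0] >= cutoff]
        self.watermark = cutoff if self.watermark is None else max(self.watermark, cutoff)

    def query(self, start, end):
        """Return samples in [start, end), transparently merged from both tiers."""
        hits = [s for s in self.block if start <= s[0] < end]
        # Only pay object-storage latency when the range reaches below the watermark.
        if self.watermark is not None and start < self.watermark:
            self.object_reads += 1
            hits += [s for s in self.object if start <= s[0] < end]
        return sorted(hits)
```

Querying block storage for the whole range (rather than only above the watermark) keeps late-arriving old samples visible, since every sample lives in exactly one tier.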
        
         | hagen1778 wrote:
         | > The value is that I can compare data to months if not years
         | ago.
         | 
         | I'd say that is quite a specific case. The most important data
         | is recent data. A monitoring system should help you identify
         | current issues, and it should be reliable and performant, so you
         | don't spend minutes waiting for a response while your production
         | is on fire.
         | 
         | Second in importance is data for the last N days: the period
         | when you analyze recent changes (updates, releases) or
         | incidents. You want this data to be easy to get and pivot,
         | changing queries ad hoc and getting results immediately, so
         | root cause analysis doesn't take days of work.
         | 
         | Data older than a month is rarely accessed. It is usually used
         | for capacity planning and retrospective analysis, things you do
         | once every 3 months, or even once a year. Here, you can afford
         | long, slow queries.
         | 
         | Both VictoriaMetrics and Mimir do a lot to provide fast access
         | to recent data: to get it stored and to get it ready for
         | queries.
        
         | BurritoAlPastor wrote:
         | In my experience, the business use of retaining years-old
         | metrics is _overvalued_, because changes in the system tend to
         | prevent any kind of long-term apples-to-apples comparison.
         | You'll have changed your metrics engine, or your collector, or
         | your tagging strategy, or your hosting strategy, or your
         | containerization, or your deployments, or etc. etc. Even if you
         | can _find_ the like metrics from last year, you can't trust
         | that they meant the same thing then that they do now.
        
           | liketochill wrote:
           | In industrial settings, where the process does not change but
           | equipment is overhauled, it can be quite valuable to compare
           | year over year.
        
       | pachico wrote:
       | I've been using VictoriaMetrics for years already and I must say
       | it's simply perfect.
       | 
       | It's not only about its performance, but also the entire
       | ecosystem and the absolutely zero-friction upgrade experience.
       | 
       | It's so, so good.
       | 
       | Happy to share notes if interested.
        
         | pachico wrote:
         | I just wanted to add my favourite feature, which is how easy it
         | is to get high availability.
         | 
         | I run all my Prometheus instances in pairs, each sending all
         | metrics to two on-prem (Hetzner) VM servers. Here's the magic:
         | each VM server deduplicates the metrics (remember, they all come
         | from two Prometheus instances). This way you get simple and
         | reliable HA long-retention storage.
         | 
         | I only deal with 500k active series, but I'd bet I could handle
         | 10-20x more with the same setup.
        
           | dedupnovice wrote:
           | Can you explain more about the dedup feature? Does it require
           | some kind of unique id from the source? Is it done on ingest
           | or on query? Is that something included with VictoriaMetrics
           | or a separate addon?
        
             | pachico wrote:
             | It's a built-in VM feature applied at ingestion time. If two
             | independent Prometheus instances send the same metric within
             | the same time interval, only the first one will be stored.
             | 
             | https://docs.victoriametrics.com/Single-server-
             | VictoriaMetri...
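A simplified model of the deduplication described above: samples from two Prometheus replicas are grouped by series (the full label set) and by a discrete time bucket, and only the first sample to arrive in each bucket is kept. This is a sketch under those assumptions, with a hypothetical `dedup` function; the linked VictoriaMetrics docs describe the real feature, including the `-dedup.minScrapeInterval` flag that sets the interval and the exact rule for which sample survives. In this model the replicas must send identical label sets, since a replica-specific label would turn them into distinct series.

```python
def dedup(samples, interval_ms):
    """Keep the first-arriving sample per (series, discrete time bucket).

    samples: iterable of (labels, timestamp_ms, value), where labels is a dict.
    Series identity is the full label set (an assumption of this sketch).
    """
    seen = set()
    kept = []
    for labels, ts, value in samples:
        # Bucket boundaries are multiples of interval_ms, not relative to each sample.
        key = (frozenset(labels.items()), ts // interval_ms)
        if key in seen:
            continue  # the other replica already delivered a sample for this bucket
        seen.add(key)
        kept.append((labels, ts, value))
    return kept
```

Because buckets are fixed, two replica samples that straddle a bucket boundary are both kept; that is an accepted trade-off of this simple scheme.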
        
       | jmakov wrote:
       | How does it compare to Clickhouse?
        
         | valyala wrote:
         | VictoriaMetrics is based on ClickHouse ideas, but is
         | specifically optimized for storing and querying floating-point
         | time series with arbitrary sets of (key=value) tags. Such time
         | series are also known as metrics or measurements. See [1], [2]
         | and [3] for more details.
         | 
         | [1] https://medium.com/@valyala/how-victoriametrics-makes-
         | instan...
         | 
         | [2] https://www.youtube.com/watch?v=p9qjb_yoBro
         | 
         | [3] https://faun.pub/victoriametrics-creating-the-best-remote-
         | st...
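To illustrate what "time series with arbitrary sets of (key=value) tags" means as a data model, the sketch below canonicalizes a metric name and its tags into a single series key, so that tag order does not create duplicate series, and stores float samples per key. It is a hypothetical toy (`series_key`, `TinySeriesStore`) for illustration only; VictoriaMetrics' internal series IDs and index are described in the linked articles and differ from this.

```python
from collections import defaultdict


def series_key(name, tags):
    """Canonical series identifier: metric name plus tags sorted by key.

    Label values are not escaped, for brevity.
    """
    parts = [f'{k}="{v}"' for k, v in sorted(tags.items())]
    return name + "{" + ",".join(parts) + "}"


class TinySeriesStore:
    """Hypothetical toy store: float samples grouped by canonical series key."""

    def __init__(self):
        self.samples = defaultdict(list)  # series key -> [(timestamp, value), ...]
        self.labels = {}                  # series key -> (name, tags)

    def add(self, name, tags, ts, value):
        key = series_key(name, tags)
        self.labels[key] = (name, dict(tags))
        self.samples[key].append((ts, float(value)))

    def select(self, name, **matchers):
        """Return {series key: samples} for series with this name whose tags match exactly."""
        out = {}
        for key, (n, tags) in self.labels.items():
            # Linear scan for clarity; a real engine would use an inverted index over tags.
            if n == name and all(tags.get(k) == v for k, v in matchers.items()):
                out[key] = self.samples[key]
        return out
```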
        
       ___________________________________________________________________
       (page generated 2022-09-09 23:01 UTC)