[HN Gopher] Grafana Mimir and VictoriaMetrics: performance tests
___________________________________________________________________
Grafana Mimir and VictoriaMetrics: performance tests
Author : nikolay_sivko
Score : 48 points
Date : 2022-09-09 14:27 UTC (8 hours ago)
(HTM) web link (victoriametrics.com)
(TXT) w3m dump (victoriametrics.com)
| valyala wrote:
| VictoriaMetrics core developer here. Benchmarks are non-trivial
| to run properly, especially when benchmarking against one of the
| competitors. We tried hard to create a benchmark based
| on production-like data. It collects data from the most
| frequently monitored scrape target in Prometheus ecosystem -
| node_exporter. At the same time the benchmark runs queries based
| on real-world alerting rules for the metrics collected from
| node_exporter. Resource usage (cpu, ram, disk io, disk space
| usage) is recorded during the benchmark, so later these metrics
| can be analyzed and compared. The benchmark runs for 24 hours, so
| it catches all the transitional states during this duration such
| as periodic data compactions. We hope the benchmark framework
| will be re-used by others for comparing the performance and
| resource usage for Prometheus-like monitoring systems such as
| Cortex, M3DB, Thanos, Promscale, etc.
| lmickh wrote:
| Interesting test, but I find some of these benchmarks kind of
| miss the point. Even Grafana's.
|
| The appeal of Thanos/Cortex/Mimir is the long term object
| storage. The value isn't that it is simpler or cheaper to run.
| The value is that I can compare data to months if not years ago.
| It can cost much more than the price of the instances to store
| good metrics over time even when the data is rolled up.
|
| Scaling the read/write path separately has a lot of benefits as
| well too, but I would guess that doesn't come up often for most
| folks.
|
| How much telemetry you can get in/out of your system over a day
| is important, but how much you can get in/out of it over years is
| overlooked.
| valyala wrote:
| Both VictoriaMetrics and Grafana Mimir are a perfect fit for
| long-term storage of Prometheus data. The difference is in the
| type of storage used: VictoriaMetrics stores data to persistent
| disks (aka block storage), while Grafana Mimir stores data to
| S3-like object storage. Both storage types - block storage and
| object storage - can be used for long-term storage. They have
| the following differences in the context of major cloud
| providers (AWS, GCP, Azure):
|
| - Object storage space usually costs 2x-8x less than block
| storage space.
|
| - Object storage has up to 100x higher latency for data access
| than block storage (hundreds of milliseconds for object storage
| vs milliseconds for block storage).
|
| - Block storage usually has a much lower network-related error
| rate than object storage. For example, it is quite
| common practice to retry reading data from object storage on
| network errors, while block storage-based filesystems are much
| more reliable for this aspect in major cloud providers.
|
| - Cloud providers tend to charge for every read operation on
| object storage, while reading from block storage is free. This
| point is usually overlooked when estimating costs for block
| storage vs object storage.
|
| Given these differences, block storage usually provides better
| performance than object storage. Block storage also can cost
| less than object storage when the stored data is read
| frequently.
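
The trade-off between cheaper space and billed reads can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing calculator: the function names and all prices in the usage note are hypothetical placeholders, and it ignores the latency, retry, and compression differences mentioned in the comment.

```python
def monthly_cost_block(stored_gb, price_per_gb):
    """Block storage: pay for space only; reads are not billed per request."""
    return stored_gb * price_per_gb


def monthly_cost_object(stored_gb, price_per_gb, reads_per_month, price_per_1k_reads):
    """Object storage: cheaper space, but every read request is billed."""
    return stored_gb * price_per_gb + reads_per_month / 1000 * price_per_1k_reads


def break_even_reads(stored_gb, block_price_per_gb, object_price_per_gb, price_per_1k_reads):
    """Monthly read requests above which object storage costs more than block storage."""
    space_saving = stored_gb * (block_price_per_gb - object_price_per_gb)
    return space_saving / price_per_1k_reads * 1000


if __name__ == "__main__":
    # Hypothetical prices (not taken from any provider's price list):
    # block storage 0.08 per GB-month, object storage 0.02 per GB-month,
    # and 0.0004 per 1,000 object read requests.
    reads = break_even_reads(1000, 0.08, 0.02, 0.0004)
    print(f"break-even at roughly {reads:,.0f} reads per month")
```

With these made-up numbers, object storage stays cheaper only while the monthly read count is below the break-even value; plugging in real prices and a measured read rate is what decides the comparison in practice.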
|
| VictoriaMetrics is optimized for HDD-based block storage, so
| there is no need to use more expensive SSD-based block storage
| in most cases. Additionally, VictoriaMetrics compresses
| production metrics 2x-10x better than Prometheus-like
| solutions, which store data to object storage (Thanos, Cortex,
| Grafana Mimir). This also reduces long-term storage costs.
|
| On top of this, enterprise version of VictoriaMetrics can be
| configured to downsample historical data, so it will take less
| disk space [1].
|
| [1] https://docs.victoriametrics.com/#downsampling
| PeterZaitsev wrote:
| To be fair, the benefit of an "object store" is that it is
| scalable and bottomless, while with block storage you have to
| deal with EBS volume expansion etc. Some folks find managing a
| fleet of EBS volumes not a big deal; others find it problematic.
|
| I think having "long term storage" in an S3-compatible location
| is the way to go, but you need the ability to use local storage
| as a cache so that queries on recent data, or just the date
| range you're working with, can be fast.
| valyala wrote:
| Agreed with this. That's why we at VictoriaMetrics are
| investigating a hybrid storage scheme - to store recently
| added data on block storage, while gradually moving older
| data from block storage to object storage in the background. On
| the query side, the requested data should be transparently
| queried from both object storage and block storage.
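
One way to picture the transparent query path described above is to split each requested time range at the age cutoff and read each part from its tier. This is a hypothetical sketch only; `split_query_range`, `query`, and the reader callbacks are invented for illustration and say nothing about how VictoriaMetrics implements or will implement this.

```python
def split_query_range(start, end, cutoff):
    """Split the half-open range [start, end) at `cutoff`.

    Data older than `cutoff` is assumed to live in object storage and newer
    data in block storage. Returns (object_part, block_part); either may be
    None when the range does not touch that tier.
    """
    object_part = (start, min(end, cutoff)) if start < cutoff else None
    block_part = (max(start, cutoff), end) if end > cutoff else None
    return object_part, block_part


def query(start, end, cutoff, read_object, read_block):
    """Read both tiers and return samples in time order.

    `read_object` and `read_block` are caller-supplied functions taking
    (start, end) and returning a list of (timestamp, value) pairs.
    """
    object_part, block_part = split_query_range(start, end, cutoff)
    samples = []
    if object_part:
        samples.extend(read_object(*object_part))
    if block_part:
        samples.extend(read_block(*block_part))
    return samples
```

The caller sees a single result list regardless of where the data lives; only the slow, older part of the range pays the object storage latency.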
| hagen1778 wrote:
| > The value is that I can compare data to months if not years
| ago.
|
| I'd say it is quite a specific case. The most important data is
| recent data. A monitoring system should help you identify
| current issues and be reliable and performant, so you don't spend
| minutes waiting for a response while your production is on fire.
|
| Second in importance is data for the last N days: the period
| when you analyze recent changes (updates, releases) or incidents.
| You want this data to be easy to get and pivot, changing
| queries ad hoc and getting results immediately, so root cause
| analysis won't take days of work.
|
| Data older than a month is rarely accessed. It is usually used
| for capacity planning, retrospective analysis - things which
| you do once in 3 months, or even once a year. Here, you can
| afford long, slow queries.
|
| Both VictoriaMetrics and Mimir do a lot to provide fast
| access to the recent data: to get it stored and to get it ready
| for queries.
| BurritoAlPastor wrote:
| In my experience, the business use of retaining years-old
| metrics is _overvalued_, because changes in the system tend to
| prevent any kind of long-term apples-to-apples comparison.
| You'll have changed your metrics engine, or your collector, or
| your tagging strategy, or your hosting strategy, or your
| containerization, or your deployments, or etc. etc. Even if you
| can _find_ the like metrics from last year, you can't trust
| that they meant the same thing then that they do now.
| liketochill wrote:
| In industrial settings where the process does not change but
| equipment is overhauled, it can be quite valuable to compare
| year over year.
| pachico wrote:
| I've been using VictoriaMetrics for years already and I must say
| it's simply perfect.
|
| It's not only about its performance but the entire ecosystem and
| the absolutely zero friction upgrade experience.
|
| It's so, so good.
|
| Happy to share notes if interested.
| pachico wrote:
| I just wanted to add my favourite feature which is how easy it
| is to have high availability.
|
| I run all my Prometheus in pairs, each sending all metrics to
| two VM on-prem (Hetzner) servers. Here's the magic: each VM
| server will dedup metrics (remember, they are all coming from
| two Prom instances). This way you have a simple and reliable HA
| long retention storage.
|
| I only deal with 500k active series but I'd bet I can deal with
| 10-20x more with the same setup.
| dedupnovice wrote:
| Can you explain more about the dedup feature? Does it require
| some kind of unique id from the source? Is it done on ingest
| or on query? Is that something included with VictoriaMetrics
| or a separate addon?
| pachico wrote:
| It's a VM built-in feature at ingestion time. In case two
| independent Prometheus instances send the same metric
| within the same time interval, only the first one will be
| stored.
|
| https://docs.victoriametrics.com/Single-server-
| VictoriaMetri...
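
The behavior described in this reply can be sketched in a few lines. This is an illustration of the idea as stated in the comment (keep the first sample per series per time interval), not VictoriaMetrics' actual implementation, which may choose a different sample within each interval. The `deduplicate` function and the assumption that a series is identified by its metric name and labels, with no extra unique id, are hypothetical.

```python
def deduplicate(samples, interval_ms):
    """Keep only the first sample seen per (series, time interval).

    `samples` is an iterable of (series_key, timestamp_ms, value) tuples in
    arrival order. `series_key` stands for the metric name plus its labels,
    so two Prometheus replicas scraping the same target produce the same key.
    """
    seen = set()
    kept = []
    for series_key, timestamp_ms, value in samples:
        bucket = (series_key, timestamp_ms // interval_ms)
        if bucket in seen:
            continue  # another replica already delivered this interval
        seen.add(bucket)
        kept.append((series_key, timestamp_ms, value))
    return kept
```

With a 15-second interval, two replicas that scrape the same target a few hundred milliseconds apart land in the same bucket, so only one of their samples survives.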
| jmakov wrote:
| How does it compare to Clickhouse?
| valyala wrote:
| VictoriaMetrics is based on ClickHouse ideas, but is
| specifically optimized for storing and querying floating-point
| time series with arbitrary sets of (key=value) tags. Such time
| series are also known as metrics or measurements. See [1], [2]
| and [3] for more details.
|
| [1] https://medium.com/@valyala/how-victoriametrics-makes-
| instan...
|
| [2] https://www.youtube.com/watch?v=p9qjb_yoBro
|
| [3] https://faun.pub/victoriametrics-creating-the-best-remote-
| st...
___________________________________________________________________
(page generated 2022-09-09 23:01 UTC)