[HN Gopher] Grafana Mimir - Horizontally scalable long-term stor...
___________________________________________________________________
Grafana Mimir - Horizontally scalable long-term storage for
Prometheus
Author : devsecopsify
Score : 226 points
Date : 2022-03-30 13:13 UTC (9 hours ago)
(HTM) web link (grafana.com)
(TXT) w3m dump (grafana.com)
| eatonphil wrote:
| There isn't a link to the project on the page (that I could find)
| so it almost looked like it's not open source. But here it is:
| https://github.com/grafana/mimir.
| notamy wrote:
| You have to find the "Download" button and click it, it's very
| non-obvious :< The entire page seems to be designed to funnel
| you into signing up for their paid service, which makes sense,
| but still doesn't feel great...
| dewey wrote:
| The first CTA button on the page "Tutorial" links to a
| tutorial where the first step is to run the project with
| Docker. Doesn't really feel like an overly forced funnel to
| their paid service.
| candiddevmike wrote:
| Recently switched from their cloud service back to on-
| premise. The cloud version wasn't being updated and the
| entire setup experience left a lot to be desired with how you
| connect their on-premise grafana agent, especially if you
| aren't using their easy button deployment stuff. Also,
| billing for metrics is insane, as on any given day my metric
| load may vary between 5-7k or more. This caused some
| operational overhead as I was constantly tweaking scrapers to
| reduce useless metrics.
|
| For $50/mo, you can self host everything easier, cheaper and
| with more control IMO.
| maccard wrote:
| > For $50/mo, you can self host everything easier, cheaper
| and with more control IMO.
|
| Can you give an example as to how you could self host a
| grafana stack for $50/month? On AWS that buys you 4 cores,
| 8GB memory and 0 storage, and it's certainly not easier
| than clicking one button on the grafana website.
| jrwr wrote:
| Two low end Hetzner/OVH Boxes for redundancy should do
| the trick
| m1keil wrote:
| We are running Grafana and Prometheus on a single
| t3.xlarge instance with 150GB gp3 EBS.
|
| Excluding traffic, it costs ~ $100 USD per month.
|
| We are doing 10 second scrapes and currently have roughly
| 141k active time series. In Grafana Cloud it would
| cost...
|
| 15000 metrics for free. 126000/1000 * $8 = $1008
|
| Now here's the real kicker.. the prices Grafana puts on
| their website assume a 60 second scrape interval (1
| data point/minute or DPM). If you are doing 6 DPM, that's
| $8 * 6 per 1000 time series!
|
| So final bill.. _drum rolls_
|
| 126000/1000 * $8 * 6 = $6048
|
| Yes. That's a 60x.
|
| Now, sure, we don't get the scale, the backups, the SLA..
| but we can live without it. And when Prometheus starts
| acting slowly, we will just bump it to t3.2xl, or
| spend some time and filter out some of the noisy metrics
| we might have around.
|
| Btw, if you try to find any information about what is a
| "time series" or a "metric" on the Grafana's pricing
| page, good luck.
|
| https://grafana.com/docs/grafana-cloud/metrics-control-usage...
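The billing arithmetic in the comment above can be written out as a small sketch. The free allowance (15,000 series), the $8 per 1000 series rate and the DPM multiplier are the figures quoted in the comment, not an official price list; the function names are made up for illustration.

```python
def dpm_from_scrape_interval(seconds):
    """Data points per minute (DPM) for a given scrape interval in seconds."""
    return 60 / seconds


def estimated_monthly_cost(active_series, dpm=1,
                           free_series=15_000, usd_per_1000_series=8.0):
    """Estimate a monthly metrics bill using the figures quoted in the thread.

    All defaults are assumptions taken from the comment, not from Grafana's
    pricing page.
    """
    billable = max(active_series - free_series, 0)
    return billable / 1000 * usd_per_1000_series * dpm


# 141k active series scraped every 10 seconds, as in the comment.
cost = estimated_monthly_cost(141_000, dpm=dpm_from_scrape_interval(10))
print(f"${cost:,.0f} per month")
```

With a 10 second scrape interval the multiplier is 6, which reproduces the $6048 figure.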
| maccard wrote:
| > Excluding traffic, it costs ~ $100 USD per month
|
| I don't doubt that that's affordable, or cost competitive
| to AWS, but that's about as cheap as you can do it, _and_
| that's not including traffic. It's pretty much impossible
| to halve that bill.
| m1keil wrote:
| I excluded the traffic because the price is basically 0.
| This is internal traffic and a bunch of HTTP requests. It
| doesn't cost us $3000 a month.
| alexjplant wrote:
| There are Helm charts available for all Grafana products
| so if you already run a Kubernetes cluster and have spare
| capacity you can just throw it up there. Loki supports
| shipping logs to GCS/S3 natively and Prometheus can use
| Cortex (also available as a Helm chart) to do the same.
| Once you throw Grafana behind SSO and implement a backup
| cronjob you're done until you reach scale and have to
| start deploying/scaling individual components separately.
|
| I implemented most of the above using Terraform on a
| managed DigitalOcean cluster on a Saturday a few months
| back; it wasn't super-hard. Alternatively you could rent
| a few VPSes someplace and use k3s or similar to get an
| unmanaged cluster.
| westurner wrote:
| Suggestions for organizing a Helm + Terraform [+
| k3s/k3d/MicroShift] provisioning _and monitoring_ git
| repo with CI for job accounting? (without Ansible & AWX,
| which I'd create a role with for this too)
|
| - [ ] ENH,BLD: A cookiecutter for this would be cool
| krnlpnc wrote:
| > $50/month? On AWS that buys you 4 cores, 8GB memory and
| 0 storage
|
| Self-hosting on AWS is kind of counterproductive. Look
| into "cloud" metal servers and the money will go much
| further.
| mdaniel wrote:
| Still AGPL, which I guess makes sense given the rest of their
| stack is too:
| https://github.com/grafana/mimir/blob/mimir-2.0.0/LICENSE
| cfors wrote:
| More engineering effort going into reinventing things that
| already exist to upsell people on Grafana cloud.
|
| What about focusing on the core value that Grafana provides,
| dashboards?
|
| Grafana 8 alerting is still in my opinion at a beta level.
| Dashboards as code has made no meaningful progress outside of
| community attempts in the past 3 years. The documentation for
| Grafana 8 alerts is still subpar.
|
| All of these things as a paid offering are more interesting than
| migrating my logging system or metrics system. Developers don't
| want to migrate their observability.
| INTPenis wrote:
| What issues have you seen with Grafana alerting?
|
| I'm curious because in my view it works so well that we
| abandoned alertmanager for Grafana alerts only well before v8.
| darkwater wrote:
| How did you define alarms as code in a practical way before
| v8? and after?
| ArmandGrillet wrote:
| Hi, I work on Grafana Alerting. Provisioning of alert rules
| (and other objects used for alerting) will be possible
| using a new API in Grafana 8.5 and we will update the
| Grafana Terraform provider right after to take advantage of
| this new API.
| darkwater wrote:
| Great to hear! We are looking into jsonnet based approach
| but having an explicit and granular API and a Terraform
| provider would be miles and miles better. Thanks!
| INTPenis wrote:
| Did not tbh. We have an ops department that do not complain
| about menial tasks.
|
| But of course IaC is the way we must follow.
| cfors wrote:
| Building a dashboard by clickety/clacking around is not a
| menial task; consistency across dashboards is a core
| unit of observability to ensure x-functional teams can
| discuss issues across a common language/viewpoint, which
| is only enforceable through a declarative dashboard
| syntax.
| INTPenis wrote:
| The question was regarding alerts, not dashboards. We
| obviously deploy dashboards from json.
|
| But I'm not aware of any way to deploy notification
| channels, probably can do that now via API. But either
| way we need to deploy notification channels with webhooks
| and tokens so that part is done manually. And then the
| alerts are also done manually.
| cfors wrote:
| Grafana alerts (before version 8) worked great. We use them,
| but the Grafana 8 alerting features are half-baked at best.
|
| * Grafana 8 alerts removed the Image Preview, which was
| extremely useful during issues. [0]
|
| * Grafana 8 alerts don't have any way of being stored as
| code. In fact the API that they provide in their docs [1][2]
| doesn't work, or isn't up to date.
|
| * The expression languages have zero documentation about
| them, so aren't exactly useful for things that might get a
| developer out of bed in the middle of the night.
|
| [0] https://github.com/grafana/grafana/discussions/38030#discuss...
|
| [1] https://editor.swagger.io/?url=https://raw.githubusercontent...
|
| [2] https://community.grafana.com/t/posting-an-alert-using-grafa...
| Bad_CRC wrote:
| No alerts possible with dashboards and variables.
| jchw wrote:
| Understandable critique, but I absolutely love a lot of
| Grafana's redundant offerings. For example, operationally
| speaking it is _drastically_ simpler to set up a scalable
| Grafana Tempo instance than Jaeger, in my opinion. Grafana
| offering competent object storage backends for their software
| has made them dramatically easier to operate and maintain.
|
| That's also another thing: a decent amount of Grafana software
| (Mimir, Loki, Tempo...) is OSS, so while they definitely are
| using that software in their paid offering, they absolutely
| still benefit OSS users. I'm messing with Tempo for telemetry
| in my (admittedly embarrassingly weak) home lab endeavors and
| it's pretty cool.
| CitizenKane wrote:
| Hey there! I work at Grafana on many of the dashboard
| components. Beyond dashboards as code and alerts where are you
| feeling the pain?
|
| I can say that a lot of effort is going into improving
| dashboards in a number of different dimensions and there are
| definitely some exciting things on the horizon.
| detaro wrote:
| Is there any competitor in the "primarily dashboards" space?
| Plenty things I know just use Grafana for small amounts of data
| where all this "5 new datastores!" isn't really useful, but
| dashboard improvements would be welcome.
| berkes wrote:
| Seconded. While I like the idea of Grafana, and use it for some
| projects, it lacks features in the graphing and dashboarding
| part. I too presumed this is because they are spending more on
| backends, pipelines and collection..
|
| I don't need more backends, pipelines or collections. I need a
| frontend to display the data that I have (in backends) already.
|
| I need to:
|
| * Be able to pipe KPIs into a storage. Doesn't need big-data,
| high-volume, or extreme granularity. OR
|
| * Have grafana grab data from an API/HTTP endpoint. It does
| this with prometheus just fine.
|
| * Have a way to insert some of my own figures. Currently I wire
| up some google-sheet to grafana and fill that. I always have
| some data that I cannot or will not (yet) grab automatically.
| Like "amount of hours spent working on project" or "MRR" or
| such.
|
| It's possible with Grafana. But the experience is subpar, the
| tweaking and wiggling is big and the outcome is an OK-ish, but
| not too convincing dashboard. I'm convinced an alternative that
| tackles this better (for niches) will eat into grafana.
| cett wrote:
| Presumably AGPLv3 is why Grafana would rather develop this than
| Cortex?
| pracucci wrote:
| Hi. I'm Marco, I work at Grafana Labs and I'm a Grafana Mimir
| maintainer. We just published a couple of blog posts about the
| project, including more details on your question:
| https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...
| and
| https://grafana.com/blog/2022/03/30/qa-with-our-ceo-about-gr...
| cett wrote:
| Thank you for your answer. That seems like a reasonable
| strategy.
| MindTooth wrote:
| How does this compare to https://www.timescale.com/promscale
|
| I'm looking into choosing a backend for my metrics and always
| open for suggestions.
| vineeth0297 wrote:
| Hey!
|
| Promscale PM here :)
|
| Promscale is the open source observability backend for metrics
| and traces powered by SQL.
|
| Whereas Mimir/Cortex is designed only for metrics.
|
| Key differences:
|
| 1. Promscale is light in architecture as all you need is
| Promscale connector + TimescaleDB to store and analyse metrics and
| traces, whereas Cortex comes with highly scalable micro-services
| architecture this requires deploying 10's of services like
| ingestor, distributor, querier, etc.
|
| 2. Promscale offers storage for metrics, traces and logs (in
| future). One system for all observability data, whereas
| Mimir/Cortex is purpose-built for metrics.
|
| 3. Promscale supports querying the metrics using PromQL, SQL
| and traces using Jaeger query and SQL, whereas in Cortex/Mimir
| all you can use is PromQL for metrics querying.
|
| 4. The Observability data in Cortex/Mimir is stored in object
| store like S3, GCS whereas in Promscale the data is stored in
| relational database i.e. TimescaleDB. This means that Promscale
| can support more complex analytics via SQL but Cortex is better
| for horizontal scalability at really large scales.
|
| 5. Promscale offers per metric retention, whereas Cortex/Mimir
| offers a global retention policy across the metrics.
|
| I hope this answers your question!
| pracucci wrote:
| Hi. I'm a Mimir maintainer. I don't have hands-on/production
| experience with Promscale, so I can't speak about it. I'm
| chiming in just to add a note about the Mimir deployment
| modes.
|
| > Cortex comes with highly scalable micro-services
| architecture this requires deploying 10's of services like
| ingestor, distributor, querier, etc.
|
| Mimir also supports the monolithic deployment mode. It's
| about deploying the whole Mimir as a single unit (eg. a
| Kubernetes StatefulSet) which you then scale out adding more
| replicas.
|
| More details here:
| https://grafana.com/docs/mimir/latest/operators-guide/archit...
| tarun_anand wrote:
| Thanks... how do we do reporting/dashboards/alerts with
| Promscale?
|
| Also, any performance benchmarks?
| vineeth0297 wrote:
| Promscale supports reporting/ingestion of data using
| Prometheus remote-write for metrics and OTLP (OpenTelemetry
| Protocol) for traces.
|
| For dashboards, you can use Promscale as a Prometheus
| datasource for PromQL-based querying and visualising, as a
| Jaeger datasource for querying and visualising traces, and as
| a PostgreSQL datasource to query both metrics and traces
| using SQL. If you are interested in visualising data using
| SQL, we recently published a blog on visualising traces
| using SQL:
| https://www.timescale.com/blog/learn-opentelemetry-tracing-w...
|
| Alerts need to be configured on the Prometheus end;
| Promscale doesn't support alerting at the moment. But
| expect native alerting from Promscale in upcoming
| releases.
|
| We have internally tested Promscale at 1Mil samples/sec;
| here is the resource recommendation guide for Promscale:
| https://docs.timescale.com/promscale/latest/installation/rec...
|
| If you are interested in evaluating or setting up Promscale,
| reach out to us in the Timescale community Slack
| (http://slack.timescale.com/) in the #promscale channel.
| Thaxll wrote:
| So many solutions to the same problem, how does it compare to
| Victoria Metrics?
| hagen1778 wrote:
| VictoriaMetrics co-founder here.
|
| There are many similar features between Mimir and
| VictoriaMetrics: multi-tenancy, horizontal and vertical
| scalability, high availability. Features like Graphite and
| Influx protocols ingestion, Graphite query engine are already
| supported by VictoriaMetrics. I didn't find references to
| downsampling in Mimir's docs, but I believe it supports it too.
|
| There are architectural differences. For example, Mimir stores
| last 2h of data in local filesystem (and mmaps it, I assume)
| and once in 2h uploads it to the object storage (long-term
| storage). VictoriaMetrics doesn't support object storage and
| prefers to use local filesystem for the sake of query speed
| performance. Both VictoriaMetrics and Mimir can be used as a
| single binary (Monolithic mode in Mimir's docs) and in cluster
| mode (Microservices mode in Mimir's docs). The set of cluster
| components (microservices) is different, though.
|
| It is hard to say something about ingestion and query
| performance or resource usage so far. While benchmarks from the
| project owners can't be 100% objective, I hope the community will
| perform unbiased tests soon.
| outsb wrote:
| Given Victoria Metrics is the only solution I've seen to make
| data comparing it to other systems easily accessible as part of
| official documentation, it's the only one I pay attention to.
|
| I knew from reading the docs what VM excelled at and areas it
| was weak in, long before I ever ran it (and expectations from
| running it matched the documentation). I hate aspirational
| marketing-saturated campaigns for deep tech projects where
| standards should obviously be higher, it speaks more about
| intended audience than it does the solution, and that's why in
| this respect VM is automatically a cut above the rest.
| cip01 wrote:
| Cortex, Thanos and Mimir all support "remote-read" protocol
| (documented in Prometheus:
| https://prometheus.io/docs/prometheus/latest/storage/#remote...),
| so external systems (eg Prometheus) can read data from them easily.
| valyala wrote:
| It would be great if you could provide a few practical
| examples for the "Prometheus remote-read" protocol given its
| restrictions [1].
|
| [1] https://github.com/prometheus/prometheus/issues/4456
| cip01 wrote:
| Which restrictions do you have in mind?
|
| Quick look at the issue looks like it wanted to avoid
| using local storage by Prometheus, but that's Prometheus
| specific problem, not remote-read problem.
|
| Remote-read is a generic protocol
| (https://github.com/prometheus/prometheus/blob/a1121efc18ba15...):
| you pass a query (start/end time and matchers), and get back data.
| halfmatthalfcat wrote:
| How does this stack up with https://github.com/thanos-io/thanos,
| which I've used to pretty good success.
|
| The only criticism I have of Thanos though was the amount of
| moving pieces to maintain.
| netingle wrote:
| (Tom here; I started the Cortex project on which Mimir is based
| and lead the team behind Mimir)
|
| Thanos is an awesome piece of software, and the Thanos team
| have done a great job building a vibrant community. I'm a big
| fan - so much so we used Thanos' storage in Cortex.
|
| Mimir builds on this and makes it even more scalable and
| performant (with a sharded compactor and query engine). Mimir
| is multitenant from day 1, whereas this is a relatively new
| thing in Thanos I believe. Mimir has a slightly different
| deployment model to Thanos, but honestly even this is
| converging.
|
| Generally: choosing Thanos is always going to be a good choice,
| but IMO choosing Mimir is an even better one :-p
| AndyNemmity wrote:
| Okay, but why? I am using Thanos today. It works, it's
| complex, when it breaks, it's a bit of a challenge to fix,
| but it happens. It doesn't break often.
|
| It does the job. Mimir, which is based on Cortex, using
| either Mimir, or Cortex, what benefit am I getting?
|
| I get asked every few months about moving off of Thanos to
| Cortex, and today now Mimir, and I don't have any substantial
| reason to do so. It feels like moving for the sake of moving.
|
| I need to see some real reasoning as to why I am going to add
| value to move everything to Mimir.
| netingle wrote:
| Sounds like Thanos is working well for you, so in your
| position I wouldn't change anything.
|
| There are a bunch of other reasons why people might choose
| Mimir; perhaps they have outgrown some of the scalability
| limits, or perhaps they want faster high cardinality
| queries, or a different take on multi-tenancy.
|
| Do remember Cortex (on which Mimir is based) predates
| Thanos as a project; Thanos was started to pursue a
| different architecture and storage concept. Thanos storage
| was clearly the way forward, so we adopted it. The
| architectures are still different: Thanos is "edge"-style
| IMO, Mimir is more centralised. Some people have a
| preference for one over the other.
| AndyNemmity wrote:
| That's fair, thanks for the input. The only reason we
| implemented Thanos in the first place was a particular
| feature that we needed at the time of implementation. Now
| using it in an extremely large environment, I haven't
| seen any scalability limits. Speed of queries isn't a
| driver of anything.
|
| Multi Tenancy certainly is, but we have our own custom
| multi tenancy solution over top of it we built ourselves.
| I'd like to get rid of that ultimately, but we're not
| utilizing whatever multi tenant features exist at the
| moment. Perhaps that will be a driver.
|
| Appreciate your thoughts.
| notacoward wrote:
| Multi-tenancy is something that shouldn't be underestimated.
| A lot of people think it's just a checklist item until (a)
| they need it or (b) they try to implement it in an existing
| system. Kudos for making it a day-one feature.
| vladvasiliu wrote:
| While I agree with your point in the general case, would
| you mind elaborating on the specific case of Prometheus?
|
| My understanding is that the recommended best-practice for
| Prometheus is to deploy as many of them as necessary, as
| close to the monitored infrastructure as possible.
|
| What use case would require deploying a single Mimir, so
| supposedly Prometheus (cluster) in the case of serving
| multiple tenants? Why not just deploy a dedicated
| Prometheus / Mimir stack per client?
| pracucci wrote:
| Mimir has a microservices architecture. However, Mimir supports
| two deployment modes: monolithic and microservices.
|
| In monolithic mode you deploy Mimir as a single process and all
| microservices (Mimir components) run inside the same process.
| Then you scale it out running more replicas. Deployment modes
| are documented here:
| https://grafana.com/docs/mimir/latest/operators-guide/archit...
| witcher wrote:
| (Bartek here: I co-started Thanos and maintain it with other
| companies)
|
| Thanks for this - it's good feedback. It's funny you
| mentioned that, because we actively try to reduce the number of
| running pieces e.g while we design our query sharding
| (parallelization) and pushdown features.
|
| As Cortex/Mimir shows it's hard - if you want to scale out
| every tiny functionality of your system you end up with twenty
| different microservices. But it's an interesting challenge to
| have - eventually it comes to trade-offs we try to make in
| Thanos between simplicity, reliability and cost vs ultra max
| performance (Mimir/Cortex).
| mgarciaisaia wrote:
| The thing I need most right now is a confirmation that it's named
| after this tweet:
| https://twitter.com/mmoriqomm/status/1272552214658117638
| nosequel wrote:
| Grafana Labs needs to make a convincing comparison chart of some
| kind between Mimir, Thanos, and Cortex. Thanos and Cortex are
| both mature projects and are both CNCF Incubating projects. Why
| would anyone switch to a new prometheus long-term storage
| solution from those?
|
| _*EDIT*_ : I see from another reply there is a basic comparison
| to Cortex here:
| https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...
| To the Mimir folks, I'd love to see something similar for
| Mimir v. Thanos.
| mekster wrote:
| You're forgetting VictoriaMetrics that's presumably the best
| choice for Prometheus long term storage.
|
| Such a solid solution exists and yet another competitor? Not
| sure why they didn't just buy VictoriaMetrics and possibly
| rebrand it.
| fishpen0 wrote:
| > Cortex is used by some of the world's largest cloud providers
| and ISVs, who are able to offer Cortex at a lower cost because
| they do not invest the same amount in developing the project.
|
| > ...
|
| > All CNCF projects must be Apache 2.0-licensed. This
| restriction also prevents us from contributing our improvements
| back to Cortex.
|
| I read this as "Amazon has destroyed the CNCF by not playing
| nice"
| CameronNemo wrote:
| Holy crap I did not know CNCF discriminated against copyleft
| software.
|
| This really discredits the Linux Foundation as an
| institution.
| netingle wrote:
| I agree! Which is why I put one in the blog post ;-)
| https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...
| krnlpnc wrote:
| I'm not seeing a comparison to Thanos
| alrlroipsp wrote:
| Why would you? Parent says it's a comparison of Mimir and
| Cortex.
| krnlpnc wrote:
| Re-read the full thread...
|
| >>Grafana Labs needs to make a convincing comparison
| chart of some kind between Mimir, Thanos, and Cortex.
|
| >I agree! Which is why I put one in the blog post ;-)
| sciurus wrote:
| It looks like this is a fork of Cortex driven by the
| maintainers employed by Grafana Labs, done so they can change
| the license to one that will prevent cloud providers like
| Amazon from offering it without contributing changes back.
|
| This is interesting, since Amazon offers both hosted Grafana
| and Cortex today. I was under the impression Amazon and Grafana
| Labs were successfully collaborating (unlike e.g. AWS and
| Elastic), but seems like that's not the case.
| WraithM wrote:
| Does AWS provide managed Cortex? Is that just a part of the
| AWS managed prometheus thing?
| sciurus wrote:
| Yes, Amazon's managed Prometheus is based on Cortex. See
| the first question at
| https://aws.amazon.com/prometheus/faqs/
| eatonphil wrote:
| It's hard to tell exactly how this works but judging from the
| tutorial's docker-compose.yml [0] it looks like this runs as a
| separate API next to Prometheus and you tell Prometheus to write
| [1] to Mimir. I'm unclear how reads work from it or maybe there
| is no read?
|
| Maybe I'm completely misunderstanding.
|
| [0]
| https://github.com/grafana/mimir/blob/main/docs/sources/tuto...
|
| [1]
| https://github.com/grafana/mimir/blob/main/docs/sources/tuto...
| pracucci wrote:
| Mimir exposes both remote write API and Prometheus compatible
| API. The typical setup is that you configure Prometheus (or
| Grafana Agent) to remote write to Mimir and then you configure
| Grafana (or your preferred query tool) to query metrics from
| Mimir.
|
| You may also be interested in watching a 5-minute
| introduction video, where I cover the overall architecture too:
| https://www.youtube.com/watch?v=ej9y3KILV8g
| eatonphil wrote:
| Cool! Personally I don't like watching videos, preferring to
| read prose or code or see an arch diagram. But good that it's
| available.
| pracucci wrote:
| I'm the author of the video, but personally I also prefer
| to read prose instead of watching videos!
|
| The architecture is covered here:
| https://grafana.com/docs/mimir/latest/operators-guide/archit...
|
| There's also a hands-on tutorial here:
| https://grafana.com/tutorials/play-with-grafana-mimir/
| bboreham wrote:
| It's a centralised multi-tenant store, supporting the
| Prometheus query API. So you can point clients directly at
| Mimir, they send in PromQL and they get data back in JSON.
|
| (Note I work on Mimir)
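As a rough illustration of "send in PromQL, get JSON back", here is a minimal sketch that builds an instant query against the Prometheus HTTP API path `/api/v1/query`. The `/prometheus` prefix, the `X-Scope-OrgID` tenant header, and the host, port and tenant name used in the usage note are assumptions about a typical Mimir setup and should be checked against your own configuration; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_instant_query(base_url, promql, tenant_id=None, prefix="/prometheus"):
    """Return (url, headers) for a Prometheus-style instant query.

    `prefix` and the X-Scope-OrgID header are assumptions about how the
    Mimir instance is configured; adjust them to match your deployment.
    """
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}{prefix}/api/v1/query?{params}"
    headers = {"X-Scope-OrgID": tenant_id} if tenant_id else {}
    return url, headers


def run_instant_query(base_url, promql, tenant_id=None):
    """Send the query and decode the JSON response body."""
    url, headers = build_instant_query(base_url, promql, tenant_id)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `run_instant_query("http://localhost:9009", "up", tenant_id="demo")` would then return the decoded JSON response, assuming something is listening at that address.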
| eatonphil wrote:
| Is there an example of running mimir without prometheus?
| bboreham wrote:
| For example sending metrics from an OpenTelemetry pipeline.
|
| Mimir accepts the Prometheus remote-write api, which is
| protobuf-over-http; can be generated by anything really.
| k8sToGo wrote:
| But who does the scraping of the prometheus agents? Mimir or
| still prometheus server?
| Duologic wrote:
| Last year I wrote a blog post about this exact question:
| Who watches the watchers?
|
| The general takeaway is that you run a minimal
| prometheus/alertmanager setup that only scrapes the agents,
| then use a dead man switch-like system to ensure this
| pipeline keeps working.
|
| Link:
| https://grafana.com/blog/2021/04/08/how-we-use-metamonitorin...
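The "dead man switch" idea mentioned above can be sketched in a few lines: the monitoring pipeline sends a heartbeat on a schedule, and an independent checker raises the alarm when heartbeats stop arriving. This is a generic illustration with a hypothetical class name, not the setup described in the linked blog post.

```python
class DeadMansSwitch:
    """Trips when no heartbeat has been seen within `timeout_seconds`."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None  # timestamp of the latest heartbeat, if any

    def heartbeat(self, now):
        """Record that the monitored pipeline is still alive at time `now`."""
        self.last_heartbeat = now

    def is_tripped(self, now):
        """True if the pipeline has been silent for longer than the timeout."""
        if self.last_heartbeat is None:
            return True  # never heard from the pipeline at all
        return now - self.last_heartbeat > self.timeout_seconds


switch = DeadMansSwitch(timeout_seconds=300)
switch.heartbeat(now=1_000)
print(switch.is_tripped(now=1_200))  # still within the 300 second window
print(switch.is_tripped(now=1_301))  # silent for more than 300 seconds
```

The key design point is that the checker runs somewhere that does not share a failure mode with the pipeline it watches.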
| bboreham wrote:
| If you have systems exporting metrics in Prometheus style,
| then you can use Prometheus to scrape them and remote-write
| to Mimir.
|
| You can alternately use Prometheus Agent, to save storing
| the data and running a query engine at the leaf.
|
| You can also use the OpenTelemetry suite to perform the
| same operation, though this is more appealing if you want
| some other OpenTelemetry features at the same time. Eg if
| you prefer the 'pipeline' style.
| inkel wrote:
| You configure Remote Write [1] to point at the Mimir instance.
| Then the Prometheus agents will send the metrics to Mimir.
|
| 1: https://prometheus.io/docs/prometheus/latest/configuration/c...
| SuperQue wrote:
| One interesting question I have is in regard to global
| availability.
|
| With our current Thanos deployment, we can tie a single geo
| regional deployment together with a tiered query engine.
|
| Basically like this:
|
| "Global Query Layer" -> "Zone Cluster Query Layer" -> "Prom
| Sidecar / Thanos Store"
|
| We can duplicate the "Global Query Layer" in multiple geo regions
| with their own replicated Grafana instances. If a single
| region/zone has trouble we can still access metrics in other
| regions/zones. This avoids Thanos having any SPoFs for large
| multi-user(Dev/SRE) orgs.
| bboreham wrote:
| The typical way to run Mimir is centralised, with different
| regions/datacenters feeding metrics in to one place. You can
| run that central system across multiple AZs.
|
| If you run Mimir with an object store (e.g. S3) that supports
| replication then you can have copies in multiple geographies
| and query them, but the copies will not have the most recent
| data.
|
| (Note I work on Mimir)
| ddon wrote:
| Looks like an interesting alternative to Clickhouse with s3
| backend...
| monstrado wrote:
| Is this the project you guys referenced using Apache Arrow for?
| bboreham wrote:
| Maybe you're thinking of this - the data structure used by
| datasources for Grafana dashboards:
|
| https://grafana.com/docs/grafana/latest/developers/plugins/d...
| netingle wrote:
| I don't think so! I think that's being used in Tempo, but I'm
| not sure.
| sriv1211 wrote:
| What's the latency between sending a metric and being able to
| query it when using object storage (s3) instead of block storage?
|
| How do the transfer/retrieval (GET/PUT) costs factor in as well?
| pracucci wrote:
| Good question! Grafana Mimir guarantees read-after-write. If a
| write request succeeds, the metric samples you've written are
| guaranteed to be returned by any subsequent query.
|
| Mimir employs write deamplification: it doesn't write
| immediately to the object storage but keeps the most recently
| written data in memory and/or on local disk.
|
| Mimir also employs several shared caches (supports Memcached)
| to reduce object storage (S3) access as much as possible.
|
| You can learn more here in the Mimir architecture
| documentation:
| https://grafana.com/docs/mimir/latest/operators-guide/archit...
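To make the read-after-write and write deamplification points concrete, here is a toy model: writes land in an in-memory buffer, the buffer is uploaded as one block once it spans a flush interval, and queries merge uploaded blocks with the buffer. The class name, the two-hour default (taken from another comment in this thread) and the data structures are illustrative assumptions, not Mimir's actual implementation.

```python
class ToyIngester:
    """Toy model of buffering recent samples before uploading them in bulk."""

    def __init__(self, flush_after_seconds=2 * 60 * 60):
        self.flush_after_seconds = flush_after_seconds
        self.head = []            # recent (timestamp, value) samples in memory
        self.head_started_at = None
        self.object_store = []    # uploaded blocks, each a list of samples

    def write(self, timestamp, value):
        """Accept a sample; upload one block once the buffer is old enough."""
        if self.head_started_at is None:
            self.head_started_at = timestamp
        self.head.append((timestamp, value))
        if timestamp - self.head_started_at >= self.flush_after_seconds:
            self._flush()

    def _flush(self):
        # One object-store write for many samples: the "deamplification".
        self.object_store.append(self.head)
        self.head = []
        self.head_started_at = None

    def query(self, start, end):
        """Read uploaded blocks and the in-memory buffer, so fresh writes show up."""
        samples = [s for block in self.object_store for s in block] + self.head
        return sorted(s for s in samples if start <= s[0] <= end)
```

Because `query` always includes the in-memory buffer, a sample is visible as soon as `write` returns, long before it reaches the object store.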
| young_unixer wrote:
| Coincidentally, "mimir" is a funny, baby-like way of saying
| "dormir" (to sleep) in Spanish.
| estebarb wrote:
| Technical meetings are going to be fun with hispanic devs...
|
| "And finally we sent the metrics to Mimir /giggles/"
|
| Sadly they don't support encryption at rest (sorry, I really
| had to do one more pun)
| vladsanchez wrote:
| So true!!! LOL I related to "Vamos a mimir!" when I read it!!!
| ROFL
| bbu wrote:
| i don't get why there's so much hate here.
|
| cortex is a pain to configure and maintain. would be awesome to
| have mimir address these issues!
| jhoechtl wrote:
| What is the relationship to Loki?
| bboreham wrote:
| Sibling. Much of the architecture is similar; a number of
| components are shared in https://github.com/grafana/dskit.
| firstSpeaker wrote:
| How does it work with Rules? So far I cannot see if this can be a
| replacement for prometheus since I cannot see how can we re-use
| our prometheus rules with Mimir. Does anyone know anything about
| that?
| pracucci wrote:
| Mimir includes a ruler component, which is responsible for
| evaluating Prometheus recording and alerting rules. It also
| exposes a set of APIs to configure the rule groups.
|
| For example, you can use this API to upload a rule group:
| https://grafana.com/docs/mimir/latest/operators-guide/refere...
|
| Mimir is released with a CLI tool called "mimirtool" which,
| among other things, allows you to configure the rule groups
| (under the hood, it calls the Mimir API). Mimirtool
| documentation is here:
| https://grafana.com/docs/mimir/latest/operators-guide/tools/...
| dikei wrote:
| Sad news for Cortex, with most of the maintainers moving on to
| Mimir, I fear it's pretty much dead in the water.
| AndyNemmity wrote:
| If anything, this makes me less interested in moving from
| Thanos.
| netingle wrote:
| We tried to address this question on the Q&A blog post:
| https://grafana.com/blog/2022/03/30/qa-with-our-ceo-about-gr...
|
| It doesn't have to mean the end for Cortex, but others will
| have to step up to lead the project. We've tried to put other
| maintainers in place to kick start this.
| sciurus wrote:
| I was going to ask what the migration path was from Cortex to
| Mimir, but I see you've documented that at
| https://grafana.com/docs/mimir/latest/migration-guide/migrat... .
| Thanks for the work you've done to make this easy.
| pracucci wrote:
| This video also shows a live migration from Cortex to Mimir
| (running in Kubernetes):
| https://www.youtube.com/watch?v=aaGxTcJmzBw&ab_channel=Grafa...
| misiti3780 wrote:
| What is the best SaaS-based dashboard solution for Prometheus?
| heinrichhartman wrote:
| Grafana Cloud
| misiti3780 wrote:
| thanks
| [deleted]
___________________________________________________________________
(page generated 2022-03-30 23:01 UTC)