[HN Gopher] Hosted Monitoring: Evaluating InfluxDB Cloud and Gra...
___________________________________________________________________
Hosted Monitoring: Evaluating InfluxDB Cloud and Grafana Cloud
Author : pqvst
Score : 69 points
Date : 2021-06-23 09:28 UTC (13 hours ago)
(HTM) web link (pqvst.com)
(TXT) w3m dump (pqvst.com)
| sofixa wrote:
| I feel like InfluxDB were the best time series database in town
| for some time (2015-2018?), but they failed to monetise and
| exploit the space properly, and now they have lots of
| competition, much of it better in a lot of cases (e.g. speed,
| space use, ease of use, pricing, integrations), be it for self-
| hosted or cloud.
|
| Some of it was wasted effort (Chronograf is still useless
| compared to Grafana, who thought it was a good idea to compete
| with it?), some was poor luck, some was decisions that backfired
| (like removing clustering from the free version). Some of it was
| everyone deciding pull is better than push (I disagree in most
| cases, but see the point).
|
| Telegraf still remains a very solid collecting agent and IMHO the
| best in class. With everyone (from the SaaS monitoring vendors)
| moving to OpenTelemetry, I think that in some time it'll fall
| behind as well.
| karterk wrote:
| InfluxDB could have become Grafana if they had focussed better.
| It was a breath of fresh air in a space where existing solutions
| were horrendously complicated. I suspect that they were under a
| lot of pressure to monetize and lost their way in attempting to
| do that.
| KaiserPro wrote:
| Telegraf's USP is that it's an anything-to-anything converter,
| which is invaluable when you are supporting a whole bunch of
| mixed stuff.
|
| I'm not sure OpenTelemetry is a replacement for actual designed
| metrics though. It's great for getting info about the state of
| an instance of something, but that's only half the picture.
| Building meaningful alerts off just raw telemetry is hard.
| heipei wrote:
| Telegraf is the main reason I'm still running the Telegraf +
| InfluxDB + Grafana stack. I might have checked out TimescaleDB
| or others, but since Telegraf works great with InfluxDB and has
| all sorts of arcane collection plugins, mostly with sensible
| options, I'm staying put. Good luck replicating the years of
| community effort to build out all these collectors to anyone
| who wants to replace it.
| dharmab wrote:
| The Prometheus community has done a great job covering a lot
| of common software and data sources:
| https://prometheus.io/docs/instrumenting/exporters/
| [deleted]
| KaiserPro wrote:
| Old man opinion here:
|
| I really like Grafana; personally I'd probably host it myself
| even now. (At all previous companies I did.) It's well contained
| and even with alerting it's simple to set up and sync state
| (assuming you are using a hosted database, which can be
| expensive).
|
| Where my opinion gets controversial: I dislike Prometheus, and I
| really don't like influx. I much prefer graphite.
|
| Influx has higher temporal resolution, and an SQL interface (it's
| the SQL interface I don't like).
|
| Prometheus has a pull model, which I can see the appeal of, but I
| don't really like. But it deals with ephemeral metrics much
| better (however you need a gateway to cache the metrics, as they
| might not get pulled in time.)
|
| The issue is that graphite's interface is much better for
| discovering and filtering/transforming your data. I'm not the
| world's greatest SQL person, so I can't do that much complex
| stuff, and I also don't know how to do maths in SQL well. With
| graphite I don't need to worry, all the functions are there in
| the menu: derivative? yes, all the averaging? mostDeviant? yes.
|
| Graphite made making graphs quick and easy, I don't think I can
| say the same for influx or prometheus. I know that all sorts of
| non-technical people can use graphite/grafana. I'm less convinced
| that they can use Prometheus or influx as well without training.
| wperron wrote:
| > Prometheus has a pull model, which I can see the appeal of,
| but I don't really like.
|
| That's such an antiquated statement at this point. Prometheus
| might have been pull-only in a distant past, but the remote
| write protocol has been alive and well for a long time now, and
| it works amazingly well, at surprising scale.
| sofixa wrote:
| > Influx has higher temporal resolution, and an SQL interface
| (it's the SQL interface I don't like)
|
| SQL was too limited, that's why they did Flux, their new query
| language. It's weird and has a learning curve, but powerful.
| gcbirzan wrote:
| SQL is not limited, InfluxQL is. And Flux is incredibly
| verbose and error-prone (at least a year ago when we last
| looked at it). However, with the transformations in Grafana,
| there's no reason to use Flux in the majority of cases.
| jstrong wrote:
| InfluxQL is limited, yes, but much less verbose for time-
| based aggregation than the equivalent SQL queries, which is
| why I like using it for ad hoc visualization and
| inspectability. (I don't like Flux.) To me it's great for a
| limited purpose. Writing the equivalent SQL queries would
| drive me up the wall.
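
The "time-based aggregation" discussed above is fixed-window bucketing, which InfluxQL expresses with a GROUP BY time(...) clause. A minimal Python sketch of the same operation may help readers who have not used it; the function name and the sample points are made up for illustration and do not come from the thread.

```python
from collections import defaultdict


def mean_by_window(points, window_seconds):
    """Average (unix_timestamp, value) pairs over fixed-size time windows."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Floor each timestamp to the start of the window it falls into.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    # One mean per window, ordered by window start.
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical samples: (seconds since epoch, measured value).
points = [(0, 1.0), (30, 3.0), (60, 10.0), (119, 20.0), (120, 5.0)]
result = mean_by_window(points, 60)
print(result)
```

In plain SQL the flooring step usually has to be spelled out by hand in both the SELECT list and the GROUP BY clause, which is presumably the verbosity the comment refers to.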
| KaiserPro wrote:
| > It's weird and has a learning curve, but powerful.
|
| I don't doubt that, but teaching it to business analysts,
| product owners and the like is a real headache.
|
| I'm sure I would grow to love it, but then it would require
| me to make dashboards for other people. Getting them to make
| them themselves is living the dream
| kungito wrote:
| It honestly doesn't feel powerful at all. I admit, I
| started using Influx 2.0 during beta, but the experience has
| been horrible. I first tried to write all my logic in Influx
| and have now, over a few months, moved everything back into
| Rust, with just the lines which go directly into graphs staying
| in Influx. The language as it is is OK, it's very basic, but the
| runtime and the development environment are very unfinished.
| There are so many internal errors whatever you do, responses
| are slow to arrive because, I guess, so few people try to
| write anything more complicated in Flux, and debugging is
| very hard. You can't even print out single values, everything
| has to be displayed in graphs. They support using arrays and
| single values but you just get convoluted internal errors and
| compiler panics if you don't write correctly in a language
| without explicit typing.
| Proven wrote:
| I completely agree, especially on the usability of Graphite.
| Casual users can create their own dashboards - just get the data
| into Graphite.
|
| InfluxDB has some additional advantages, but they need medium
| to large scale or advanced use to matter.
|
| For my use case (smaller, simple environment), Graphite is
| great.
| pqvst wrote:
| When we were self-hosting Grafana with an Influx data store,
| writing queries was actually very easy since Grafana provides a
| "query editor" to generate the SQL for Influx. Ironically that
| was much easier to use than InfluxDB Cloud's own solution
| (which now uses a completely different query language).
|
| Grafana Cloud seems to mostly advertise the pull model, whereas
| we were really looking for a push model. I'm glad I figured out
| how to get that to work with Graphite, since Grafana Cloud
| offers both Prometheus and Graphite.
|
| Now given that Grafana Cloud's free-tier covers our needs I'm
| glad we no longer have to maintain a self-hosted version (and
| we're definitely getting better performance now).
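
As a rough illustration of the push model mentioned above: Graphite's plaintext protocol is one line per sample, "path value timestamp", sent over TCP to a carbon listener (port 2003 by default). The sketch below assumes a self-hosted carbon endpoint; the metric name and host are hypothetical, and Grafana Cloud's hosted Graphite ingestion may use a different, authenticated endpoint that this does not cover.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    # Timestamps are whole seconds since the Unix epoch.
    return f"{path} {value} {int(timestamp)}\n"


def push(lines, host, port=2003):
    """Push pre-formatted lines to a carbon plaintext listener (assumed reachable)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric name; nothing is sent unless push() is called.
line = graphite_line("web01.requests.count", 42, 1624440480)
print(line, end="")
```

With push, the application decides when to send, so short-lived jobs can report before they exit; with pull, the collector has to reach the process while it is still alive.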
| KaiserPro wrote:
| I should have written that I fully understand why you'd want
| to sling it out and be hosted by someone else.
|
| I'm a devop/sre/pe/sysadmin so I have a motivation to keep
| metrics in my firm grasp. Given the price point and
| performance, for small organizations it'd be madness to _Not_
| get grafana cloud to host it.
| heipei wrote:
| I use Grafana religiously every day, rely on it for metrics,
| long-term trends and alerting. I also use Telegraf with
| InfluxDB, but thanks to Grafana I've never had to write a
| single InfluxDB statement.
|
| Also: It's not just higher temporal resolution, it's having an
| actual time-series database vs a metrics database. With
| InfluxDB (and all other event-based databases) I can go in
| after the fact and just plot the CPU load for one specific
| host, or a group of hosts, or plot the HTTP requests that took
| longer than 200ms, hit a specific endpoint and finished with a
| specific HTTP status code. If you don't keep the raw
| measurements you always have to make decisions about which
| groups you aggregate down to ahead of time. Just something to
| keep in mind.
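
The point about raw measurements versus pre-aggregated metrics can be sketched in a few lines of Python. The event records and field names below are invented for illustration: with raw events the "slow requests to one endpoint with one status code" question can be asked after the fact, whereas a per-host average, once computed, no longer contains the information needed to answer it.

```python
from collections import defaultdict

# Hypothetical raw request events, one record per HTTP request.
events = [
    {"host": "web01", "endpoint": "/api/search", "status": 200, "duration_ms": 340},
    {"host": "web01", "endpoint": "/api/search", "status": 200, "duration_ms": 120},
    {"host": "web02", "endpoint": "/api/search", "status": 500, "duration_ms": 950},
    {"host": "web02", "endpoint": "/login", "status": 200, "duration_ms": 410},
    {"host": "web01", "endpoint": "/api/search", "status": 200, "duration_ms": 201},
]


def slow_requests(records, endpoint, status, min_ms=200):
    """Ad hoc filter that is only possible because raw events were kept."""
    return [
        r for r in records
        if r["endpoint"] == endpoint
        and r["status"] == status
        and r["duration_ms"] > min_ms
    ]


def mean_duration_by_host(records):
    """A typical ahead-of-time aggregation: one number per host."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["host"]].append(r["duration_ms"])
    return {host: sum(v) / len(v) for host, v in grouped.items()}


slow = slow_requests(events, "/api/search", 200)
per_host = mean_duration_by_host(events)
```

Here `slow` holds the two matching raw events, while `per_host` has already discarded the endpoint and status dimensions.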
| [deleted]
| gouthamve wrote:
| Hey, you can also point your Telegraf directly at Grafana Cloud
| Prometheus if you're interested in using Prometheus. See:
| https://grafana.com/docs/grafana-cloud/how-do-i/push-from-te...
|
| Our Graphite hosting is also best in class and it makes me
| happy to see the good feedback!
|
| PS: I work on Grafana Cloud Prometheus.
| hagen1778 wrote:
| I really love Grafana for visualizing metrics and can't think of
| anything with the same level of usability. It was obvious that
| InfluxDB Cloud UI can't compete with Grafana in this field.
|
| From the article I noticed it was tricky to switch from the Push
| to the Pull model. I'd recommend trying VictoriaMetrics for your
| case for the following reasons:
|
| 1. it supports Graphite ingestion protocol [1]
|
| 2. it supports Influx line ingestion protocol [2]
|
| 3. and besides MetricsQL, it also lets you query Graphite data
| [3]
|
| So you might need to make fewer changes if switching to VM
| instead of Grafana Cloud. Also, you can continue to run it on a
| DO $5 droplet.
|
| [1] https://docs.victoriametrics.com/#how-to-send-data-from-
| grap...
|
| [2] https://docs.victoriametrics.com/#how-to-send-data-from-
| infl...
|
| [3] https://docs.victoriametrics.com/#querying-graphite-data
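
For context on the Influx line protocol named in point 2: each sample is a single text line of the form "measurement,tags fields timestamp". The sketch below is a deliberately simplified formatter, assuming nanosecond timestamps and a measurement name that needs no escaping; the helper names and the sample data are hypothetical, and a real client library handles more edge cases than this does.

```python
def _escape(text):
    """Escape commas, spaces and equals signs in tag/field keys and tag values."""
    return str(text).replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def _format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):  # checked first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def influx_line(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol record (assumes a plain measurement name)."""
    tag_part = "".join(f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{_escape(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical sample.
record = influx_line(
    "cpu",
    {"region": "eu", "host": "web01"},
    {"usage": 0.64, "cores": 4},
    1624440480000000000,
)
print(record)
```

Because the format is plain text, any backend that accepts it can take over from InfluxDB without changes on the sending side, which is the migration argument made in the comment.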
| mikojan wrote:
| InfluxDB Cloud might be the single best solution if what you want
| is a managed service that just works somehow.
|
| As a time series database however it is rather painful.
|
| You can't have InfluxDB only. It comes integrated with the
| complete ecosystem. For example, it exposes an app/web server by
| default which you can't disable.
|
| Flux DSL is decent but so was SQL and I already knew SQL.
|
| Shipping a database does not appear to be topping their priority
| list and it's kind of irritating. A year into the InfluxDB 2
| release you weren't able to delete data. Now you can, but you
| can't be sure when that's going to actually happen. And there's
| no way of monitoring/managing currently running queries. Add to
| that its massive memory consumption and you're left wondering
| what other database solution would get away with this.
|
| For personal projects I have since moved to IoTDB
___________________________________________________________________
(page generated 2021-06-23 23:03 UTC)