[HN Gopher] Monitoring Raspberry Pi Devices Using Telegraf, Infl...
___________________________________________________________________
Monitoring Raspberry Pi Devices Using Telegraf, InfluxDB and
Grafana
Author : ashtavakra
Score : 49 points
Date : 2021-08-26 17:04 UTC (5 hours ago)
(HTM) web link (blog.thecloudside.com)
(TXT) w3m dump (blog.thecloudside.com)
| liketochill wrote:
| It is strange that there isn't more overlap between tech software
| monitoring and metrics products and industrial historians and HMI
| products. Osisoft was purchased by Aveva/Schneider for 5 billion
| despite them already owning citect, wonderware, and probably 6
| other historian products.
|
| The industrial historians solve the same problems - collect data
| at nodes that might have intermittent connectivity, send to a
| centralized server/service that can handle lots of data, and
| allow users to plot it.
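|
| A minimal sketch of the store-and-forward pattern described above,
| assuming an in-memory bounded buffer and a hypothetical `send`
| callback that returns True once the central server has accepted a
| batch. Real historians and agents typically persist the buffer to
| disk; this only illustrates the idea.

```python
import itertools
from collections import deque
from typing import Callable, Deque, List, Tuple

Sample = Tuple[float, str, float]  # (unix_time, metric_name, value)


class StoreAndForward:
    """Buffer samples locally and forward them when the uplink works."""

    def __init__(self, send: Callable[[List[Sample]], bool], max_samples: int = 10_000):
        # `send` is a hypothetical uplink function supplied by the caller.
        self._send = send
        # Bounded queue: once full, the oldest samples are dropped first.
        self._queue: Deque[Sample] = deque(maxlen=max_samples)

    def record(self, timestamp: float, name: str, value: float) -> None:
        self._queue.append((timestamp, name, value))

    def pending(self) -> int:
        return len(self._queue)

    def flush(self, batch_size: int = 500) -> int:
        """Send buffered samples oldest-first; stop at the first failure.

        Returns the number of samples delivered.
        """
        delivered = 0
        while self._queue:
            batch = list(itertools.islice(self._queue, batch_size))
            if not self._send(batch):
                break  # uplink is down; keep everything for the next attempt
            for _ in batch:
                self._queue.popleft()
            delivered += len(batch)
        return delivered
```

| A node would call record() on every sample and flush() on a timer,
| so an outage only delays delivery until the buffer limit is reached.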
|
| I wonder if we'll start to see more open source monitoring on the
| factory floor. While it will be easy for a product to work as
| well as industrial offerings, maybe the value of those offerings is
| in the long-term support (usually close to a decade) and supported
| upgrade paths.
| camtarn wrote:
| Yes, agreed. This is basically what we do at my work: plot
| stuff from PLCs onto Grafana, while coping with intermittent
| connectivity. I'd love to buy a ready-made product that could
| do it, but there's literally nothing out there that quite comes
| close.
|
| As well as the things you identified, I suspect that there's
| just a lot of mistrust of open source in the industrial world -
| there's that whole thing of perceived value being directly
| proportional to product cost, plus commercial vendors also tend
| to at least offer training and tech support, even if they're
| not always the most helpful.
| zwayhowder wrote:
| At my work (a university) we set up a TIG stack to monitor IT
| systems, and then one of our facilities management people saw it
| and we got chatting. We have just set up a POC to pipe metrics
| out of multiple proprietary building management systems into a
| single Grafana dashboard.
|
| We've also updated all our tender documents for future projects
| to include a requirement that we can query metrics and logs
| through an API or direct DB access.
|
| A recent project I worked on identified over 500 applications
| whose only use is to provide monitoring to a bespoke system or
| tool. This isn't uncommon at a university as different
| faculties and departments will buy "the best tool for XYZ"
| without ever asking IT if perhaps there is a tool that is
| almost as good that we already have.
| jakozaur wrote:
| The SaaS version of that would be to use Telegraf, but send data to
| Sumo Logic, Datadog or another observability vendor.
|
| You don't need to host InfluxDB and Grafana yourself. I would
| also consider gathering logs and traces to troubleshoot problems.
| That is straightforward with top-tier observability vendors, harder
| to do on your own.
|
| Disclaimer: I'm an employee of Sumo Logic.
| JorgeGT wrote:
| Note that you can gather and display logs with this same stack,
| Telegraf includes a plugin to consume syslog output:
| https://github.com/influxdata/telegraf/blob/release-1.14/plu...
| and then you can do something like this on Grafana:
| https://grafana.com/api/dashboards/12433/images/9004/image
|
| The one thing I would add to this guide is enabling HTTPS for
| the whole stack, if you are transmitting over the public
| internet. Fortunately, it is quite straightforward (and free)
| with Let's Encrypt.
| alexwasserman wrote:
| Telegraf is an awesome agent, and also pairs really nicely with
| Prometheus as a TSDB, then you can put Grafana on top.
|
| Maybe not in this specific case, but in general Prometheus is my
| preferred TSDB sitting between Telegraf and Grafana.
|
| If you need further scale-out there are options for federating
| Prometheus instances as well.
| dijit wrote:
| If I have a regret in my observability stack I think it's got to
| be InfluxDB.
|
| I bought into the TICK stack and planned on using an enterprise
| support contract when going to production, but every interaction
| with InfluxData the company has felt a bit sleazy: trying to push
| very hard toward the cloud offering, for example.
|
| That's bad enough, but the documentation and observability of the
| database are quite poor, and it's trivially easy to "vanish" all
| your data and lock your instance up for hours or days by changing
| the retention policy of a database (even when the new policy is not
| much different from the old one).
|
| Now of course it's not TICK at all. More like "TI", as Chronograf
| and Kapacitor (dashboarding and alerting respectively) are
| deprecated as separate products and rolled into the main offering.
|
| Added to that they completely changed the query language.
|
| I have to say: pick something better if you can. TimescaleDB or
| Prometheus (which has its own built-in TSDB) are promising.
| jrockway wrote:
| I used 1.x for my push-monitoring stack at my last job. (For
| cases where "pull" is practical, I would always use Prometheus.
| Prometheus also has "push" now, by the way.) They went into 2.0
| mode and kind of neglected 1.x, and I kind of forgot about it.
| At the time, I was most familiar with an internal monitoring
| system at Google, and I found I couldn't do queries that I
| expected to be able to do. I even mentioned it on HN and some
| influx folks told me that what I wanted to do was too weird to
| support. (It's not. I was collecting byte counters from fiber
| CPEs, and wanted to have bandwidth charts based on topology
| tags I stored with the data -- imagine a SQL table like
| (serial_number text not null, time timestamp not null, locality
| text not null, bytes_sent int64 not null, bytes_received int64
| not null). The problem was that timestamps would not be aligned
| between records in the same locality group -- I sampled these
| occasionally throughout the day and not all at the same
| instant. And, they were counters, not deltas, so the query
| would have to do the delta across each serial number, and then
| aggregate across all devices in a locality. Very possible to
| do, I literally had that chart with the other monitoring
| system. But not possible with the influx v1 querying, as far as
| I could tell.)
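|
| A minimal sketch of the delta, align, aggregate steps described
| above, written in plain Python rather than any InfluxDB query
| language. The row layout follows the SQL-like table in the comment;
| the function name, the window size, and the choice to file each rate
| under the window containing the later sample are assumptions made
| for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float, str, int]  # (serial_number, unix_time, locality, bytes_sent)


def locality_bandwidth(rows: Iterable[Row], window: float) -> Dict[Tuple[str, float], float]:
    """Return {(locality, window_start): bytes_per_second} summed over devices."""
    by_device: Dict[Tuple[str, str], List[Tuple[float, int]]] = defaultdict(list)
    for serial, ts, locality, sent in rows:
        by_device[(serial, locality)].append((ts, sent))

    # Step 1 (delta): per-device rate between consecutive counter samples.
    # Step 2 (align): file each rate under the window holding the later sample.
    device_rates = defaultdict(list)
    for (serial, locality), samples in by_device.items():
        samples.sort()
        for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
            if t1 <= t0 or c1 < c0:
                continue  # duplicate timestamp or counter reset: skip the pair
            start = (t1 // window) * window
            device_rates[(serial, locality, start)].append((c1 - c0) / (t1 - t0))

    # Step 3 (aggregate): average within a device's window, sum across devices.
    totals: Dict[Tuple[str, float], float] = defaultdict(float)
    for (serial, locality, start), rates in device_rates.items():
        totals[(locality, start)] += sum(rates) / len(rates)
    return dict(totals)
```

| Because the delta is taken per serial number before anything is
| summed, devices sampled at different instants can still be combined
| into one per-locality series.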
|
| I set up 2.x for myself recently, and they have really done a
| lot of work. The OSS offering has most of the features that
| cloud/enterprise would. It was easy to set up -- they don't
| have any instructions for installing it in Kubernetes, and
| haven't updated their Helm charts for 2.x, but it was like 3
| minutes to write a manifest myself
| (https://github.com/jrockway/jrock.us/tree/master/production/...),
| which I prefer 99.9% of
| the time anyway. The new query language is incredibly verbose,
| but I see the steps that I remember having with Google's
| internal system, align, delta, aggregate... all possible. (I
| had to scratch my head a lot, though, to make it work. And I
| really am not able to reason about what operations it's doing,
| what's indexed or not indexed, why I ingest my data as rows but
| process it as columns, etc.) The performance is good, and it
| worked well for my use case of pushing data from my Intranet of
| Stuff. Generally I like it and I don't think they are being
| shady in any way. It's on my list of something to set up at
| work to collect various pieces of time series data outside of
| the Prometheus ecosystem (CI runtimes, etc.).
|
| The reason I picked InfluxDB over TimescaleDB for my personal
| stuff is because InfluxDB has an HTTP API with built-in
| authentication. I already have a ton of HTTP services exposed to the
| Internet, and I understand them well. (Yup, I have SSO and rate
| limiting and all that stuff for my personal projects ;) I can
| give each of my devices an API key from their web interface,
| and I make an HTTP request to write data. Very simple. (They
| have a client library, but honestly my main target is a
| Beaglebone, and it doesn't have enough memory to compile their
| client library. I've never seen "go build" run out of memory,
| but their client makes that happen. I shouldn't develop on my
| IoT device, of course, but it's just easier because it has
| Emacs and gopls, and all the sensors connected to the right
| bus. Was easier to just manually make the API calls than to
| cross-compile on my workstation and push the release build to
| the actual device.) TimescaleDB doesn't have that, because it's
| just Postgres. So I'd basically have to expose port 5432 to the
| world, create Postgres users for every device, generate a
| password, store that somewhere, etc. Then to ingest data, I'd
| connect to the database, tune my connection pool, retry failed
| requests manually, etc. Using HTTP gets me all that for free; I
| can just configure retries in Envoy.
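|
| A minimal sketch of writing a point by hand over the InfluxDB 2.x
| HTTP API, as described above, using only the Python standard
| library. The host name, org, bucket, token and measurement names are
| hypothetical, and the line-protocol helper assumes float fields and
| names with no spaces, commas or equals signs (it does no escaping).

```python
import urllib.parse
import urllib.request


def to_line(measurement, tags, fields, timestamp):
    """Format one point as line protocol (float fields only, no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {int(timestamp)}"


def build_write_request(base_url, org, bucket, token, line):
    """Build (but do not send) a POST to the InfluxDB 2.x write endpoint."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return urllib.request.Request(
        url=f"{base_url}/api/v2/write?{query}",
        data=line.encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",  # per-device API token
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


# Hypothetical device, server and credentials, for illustration only.
line = to_line("climate", {"device": "beaglebone"}, {"temp_c": 21.5}, 1630000000)
request = build_write_request(
    "https://influx.example.com", "home", "sensors", "secret-token", line
)
```

| Sending it is then a single urllib.request.urlopen(request, timeout=10)
| call, with retries handled by whatever proxy sits in front.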
|
| But... SQL queries are a lot easier to figure out than Flux
| queries, and I already have good tools for manipulating raw
| data in Postgres (DataGrip is my preferred method), so I think
| I will likely be revisiting TimescaleDB. Honestly, I'd pay for
| a managed offering right now if they had a button in Google
| Cloud Console that was "Create Instance and by the way this
| just gets added to your GCP bill for 10% more than a normal
| Cloud SQL instance".
| xupybd wrote:
| I'm intrigued by the need for what looks like server hardware
| monitoring in a vehicle.
___________________________________________________________________
(page generated 2021-08-26 23:01 UTC)