[HN Gopher] Monitoring Raspberry Pi Devices Using Telegraf, Infl...
       ___________________________________________________________________
        
       Monitoring Raspberry Pi Devices Using Telegraf, InfluxDB and
       Grafana
        
       Author : ashtavakra
       Score  : 49 points
       Date   : 2021-08-26 17:04 UTC (5 hours ago)
        
 (HTM) web link (blog.thecloudside.com)
 (TXT) w3m dump (blog.thecloudside.com)
        
       | liketochill wrote:
        | It is strange that there isn't more overlap between tech software
        | monitoring and metrics products and industrial historians and HMI
        | products. OSIsoft was purchased by AVEVA/Schneider for $5 billion
        | despite them already owning Citect, Wonderware, and probably 6
        | other historian products.
       | 
       | The industrial historians solve the same problems - collect data
       | at nodes that might have intermittent connectivity, send to a
       | centralized server/service that can handle lots of data, and
       | allow users to plot it.
       | 
        | I wonder if we'll start to see more open source monitoring on the
        | factory floor. While it would be easy for an open source product
        | to work as well as the industrial offerings, maybe the industrial
        | vendors' value is in the long-term support (usually close to a
        | decade) and supported upgrade paths.
        
         | camtarn wrote:
         | Yes, agreed. This is basically what we do at my work: plot
         | stuff from PLCs onto Grafana, while coping with intermittent
         | connectivity. I'd love to buy a ready-made product that could
         | do it, but there's literally nothing out there that quite comes
         | close.
         | 
         | As well as the things you identified, I suspect that there's
         | just a lot of mistrust of open source in the industrial world -
         | there's that whole thing of perceived value being directly
         | proportional to product cost, plus commercial vendors also tend
         | to at least offer training and tech support, even if they're
         | not always the most helpful.
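          | 
          | (If the collection side is Telegraf, the agent-level buffer is
          | what makes the intermittent-connectivity part workable -- a
          | sketch, with the buffer limit chosen arbitrarily:
          | 
          |   [agent]
          |     interval = "10s"
          |     flush_interval = "10s"
          |     # metrics held in memory per output while the link
          |     # to the central server is down
          |     metric_buffer_limit = 100000
          | 
          | Buffered points are flushed once the output is reachable
          | again.)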
        
         | zwayhowder wrote:
          | At my work (a university) we set up a TIG stack to monitor IT
          | systems, and then one of our facilities management people saw
          | it and we got chatting. We have just set up a PoC to pipe
          | metrics out of multiple proprietary building management
          | systems into a single Grafana dashboard.
         | 
         | We've also updated all our tender documents for future projects
         | to include a requirement that we can query metrics and logs
         | through an API or direct DB access.
         | 
         | A recent project I worked on identified over 500 applications
         | whose only use is to provide monitoring to a bespoke system or
         | tool. This isn't uncommon at a university as different
         | faculties and departments will buy "the best tool for XYZ"
         | without ever asking IT if perhaps there is a tool that is
         | almost as good that we already have.
        
       | jakozaur wrote:
        | The SaaS version of that would be to use Telegraf, but send the
        | data to Sumo Logic, Datadog, or another observability vendor, so
        | you don't need to host InfluxDB and Grafana yourself.
        | 
        | I would also consider gathering logs and traces to troubleshoot
        | problems. That's straightforward with top-tier observability
        | vendors, harder to do on your own.
        | 
        | Disclaimer: I'm an employee of Sumo Logic.
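        | 
        | Telegraf ships output plugins for several vendors, so only the
        | output section of the config changes -- e.g. the Datadog output
        | (Sumo Logic has one too; the API key here is a placeholder):
        | 
        |   [[outputs.datadog]]
        |     # send everything Telegraf collects to the Datadog API
        |     apikey = "YOUR_API_KEY"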
        
         | JorgeGT wrote:
         | Note that you can gather and display logs with this same stack,
         | Telegraf includes a plugin to consume syslog output:
         | https://github.com/influxdata/telegraf/blob/release-1.14/plu...
         | and then you can do something like this on Grafana:
         | https://grafana.com/api/dashboards/12433/images/9004/image
         | 
         | The one thing I would add to this guide is enabling HTTPS for
         | the whole stack, if you are transmitting over the public
         | internet. Fortunately, it is quite straightforward (and free)
         | with Let's Encrypt.
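          | 
          | A minimal sketch of that plugin's config (the listener port
          | and the InfluxDB URL/database are placeholders):
          | 
          |   [[inputs.syslog]]
          |     # listen for RFC 5424 syslog over TCP
          |     server = "tcp://:6514"
          | 
          |   [[outputs.influxdb]]
          |     urls = ["http://127.0.0.1:8086"]
          |     database = "telegraf"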
        
       | alexwasserman wrote:
       | Telegraf is an awesome agent, and also pairs really nicely with
       | Prometheus as a TSDB, then you can put Grafana on top.
       | 
        | Maybe not in this specific case, but in general Prometheus is my
        | preferred TSDB sitting between Telegraf and Grafana.
       | 
       | If you need further scale-out there are options for federating
       | Prometheus instances as well.
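        | 
        | For reference, that pairing is one section of Telegraf config
        | plus a scrape job: the prometheus_client output exposes
        | everything Telegraf collects as a /metrics endpoint (a sketch;
        | :9273 is the plugin's default port):
        | 
        |   [[outputs.prometheus_client]]
        |     # Prometheus scrapes this endpoint instead of
        |     # Telegraf pushing anywhere
        |     listen = ":9273"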
        
       | dijit wrote:
       | If I have a regret in my observability stack I think it's got to
       | be influxdb.
       | 
       | I bought in to the TICK stack and planned on using an enterprise
       | support contract when going to production, but every interaction
       | with InfluxData the company has felt a bit sleazy. Trying to push
       | very hard to the cloud offering for example.
       | 
        | That's bad enough, but the documentation and observability of the
        | database are quite poor, and it's trivially easy to "vanish" all
        | your data and lock your instance up for hours or days by changing
        | the retention policy of a database (even without making it much
        | different).
       | 
        | Now of course it's not TICK at all. More like "TI", as Kapacitor
        | and Chronograf (alerting and dashboarding respectively) are
        | deprecated products, rolled into the main offering.
        | 
        | Added to that, they completely changed the query language.
        | 
        | I have to say: pick something better if you can. TimescaleDB or
        | Prometheus (which has its own TSDB) are promising.
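        | 
        | (For context, the change in question is a one-line InfluxQL
        | statement -- the policy and database names here are
        | placeholders:
        | 
        |   ALTER RETENTION POLICY "autogen" ON "telegraf" DURATION 30d
        | 
        | Shortening the duration makes shards older than the new window
        | eligible for immediate deletion, which is the "vanishing"
        | described above.)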
        
         | jrockway wrote:
         | I used 1.x for my push-monitoring stack at my last job. (For
         | cases where "pull" is practical, I would always use Prometheus.
         | Prometheus also has "push" now, by the way.) They went into 2.0
         | mode and kind of neglected 1.x, and I kind of forgot about it.
         | At the time, I was most familiar with an internal monitoring
         | system at Google, and I found I couldn't do queries that I
         | expected to be able to do. I even mentioned it on HN and some
         | influx folks told me that what I wanted to do was too weird to
         | support. (It's not. I was collecting byte counters from fiber
         | CPEs, and wanted to have bandwidth charts based on topology
         | tags I stored with the data -- imagine a SQL table like
         | (serial_number text not null, time timestamp not null, locality
         | text not null, bytes_sent int64 not null, bytes_received int64
         | not null). The problem was that timestamps would not be aligned
         | between records in the same locality group -- I sampled these
         | occasionally throughout the day and not all at the same
         | instant. And, they were counters, not deltas, so the query
         | would have to do the delta across each serial number, and then
         | aggregate across all devices in a locality. Very possible to
         | do, I literally had that chart with the other monitoring
         | system. But not possible with the influx v1 querying, as far as
         | I could tell.)
         | 
         | I set up 2.x for myself recently, and they have really done a
         | lot of work. The OSS offering has most of the features that
         | cloud/enterprise would. It was easy to set up -- they don't
         | have any instructions for installing it in Kubernetes, and
         | haven't updated their Helm charts for 2.x, but it was like 3
         | minutes to write a manifest (https://github.com/jrockway/jrock.
         | us/tree/master/production/...) myself, which I prefer 99.9% of
         | the time anyway. The new query language is incredibly verbose,
         | but I see the steps that I remember having with Google's
         | internal system, align, delta, aggregate... all possible. (I
         | had to scratch my head a lot, though, to make it work. And I
         | really am not able to reason about what operations it's doing,
         | what's indexed or not indexed, why I ingest my data as rows but
         | process it as columns, etc.) The performance is good, and it
         | worked well for my use case of pushing data from my Intranet of
         | Stuff. Generally I like it and I don't think they are being
         | shady in any way. It's on my list of something to set up at
         | work to collect various pieces of time series data outside of
         | the Prometheus ecosystem (CI runtimes, etc.).
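          | 
          | For the record, the shape of that counters-to-locality query
          | in Flux looks roughly like this (a sketch; the bucket,
          | measurement, and field names are invented):
          | 
          |   from(bucket: "cpe")
          |     |> range(start: -24h)
          |     |> filter(fn: (r) => r._measurement == "traffic" and
          |                          r._field == "bytes_sent")
          |     // delta: counters -> per-second rates, per serial number
          |     |> derivative(unit: 1s, nonNegative: true)
          |     // align: snap irregular sample times to shared windows
          |     |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
          |     // aggregate: regroup by locality, sum across devices
          |     |> group(columns: ["locality"])
          |     |> aggregateWindow(every: 5m, fn: sum)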
         | 
         | The reason I picked InfluxDB over TimescaleDB for my personal
         | stuff is because InfluxDB has an HTTP API with built-in
          | authentication. I already have a ton of HTTP services exposed
          | to the
         | Internet, and I understand them well. (Yup, I have SSO and rate
         | limiting and all that stuff for my personal projects ;) I can
         | give each of my devices an API key from their web interface,
         | and I make an HTTP request to write data. Very simple. (They
         | have a client library, but honestly my main target is a
         | Beaglebone, and it doesn't have enough memory to compile their
         | client library. I've never seen "go build" run out of memory,
         | but their client makes that happen. I shouldn't develop on my
         | IoT device, of course, but it's just easier because it has
         | Emacs and gopls, and all the sensors connected to the right
         | bus. Was easier to just manually make the API calls than to
         | cross-compile on my workstation and push the release build to
         | the actual device.) TimescaleDB doesn't have that, because it's
         | just Postgres. So I'd basically have to expose port 5432 to the
         | world, create Postgres users for every device, generate a
         | password, store that somewhere, etc. Then to ingest data, I'd
         | connect to the database, tune my connection pool, retry failed
         | requests manually, etc. Using HTTP gets me all that for free; I
         | can just configure retries in Envoy.
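          | 
          | The write call itself is tiny -- a sketch in Go (the host,
          | org, bucket, and token env var are placeholders; the endpoint
          | is InfluxDB 2.x's documented /api/v2/write, which takes line
          | protocol in the body):
          | 
          |   package main
          | 
          |   import (
          |       "fmt"
          |       "log"
          |       "net/http"
          |       "os"
          |       "strings"
          |       "time"
          |   )
          | 
          |   func main() {
          |       // one point in line protocol:
          |       // measurement,tag=... field=... unix_ts
          |       line := fmt.Sprintf("sensors,device=bbb temp=%.1f %d",
          |           21.5, time.Now().Unix())
          |       req, err := http.NewRequest(http.MethodPost,
          |           "https://influx.example.com/api/v2/write"+
          |               "?org=home&bucket=iot&precision=s",
          |           strings.NewReader(line))
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       req.Header.Set("Authorization",
          |           "Token "+os.Getenv("INFLUX_TOKEN"))
          |       resp, err := http.DefaultClient.Do(req)
          |       if err != nil {
          |           log.Fatal(err)
          |       }
          |       defer resp.Body.Close()
          |       // InfluxDB replies 204 No Content on success
          |       log.Println(resp.Status)
          |   }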
         | 
          | But... SQL queries are a lot easier to figure out than Flux
         | queries, and I already have good tools for manipulating raw
         | data in Postgres (DataGrip is my preferred method), so I think
         | I will likely be revisiting TimescaleDB. Honestly, I'd pay for
         | a managed offering right now if they had a button in Google
         | Cloud Console that was "Create Instance and by the way this
         | just gets added to your GCP bill for 10% more than a normal
         | Cloud SQL instance".
        
       | xupybd wrote:
        | I'm intrigued: why would what looks like server hardware
        | monitoring be needed in a vehicle?
        
       ___________________________________________________________________
       (page generated 2021-08-26 23:01 UTC)