Raspberry Pi Monitoring using Telegraf, InfluxDB, and Grafana

RK Kuppala · Aug 26 · 7 min read

We recently had to build a reliable monitoring dashboard for a fleet of Raspberry Pi devices for one of our customers. These devices are mounted on vehicles on the move, so intermittent connectivity is expected. One of the requirements was to keep collecting metrics even when a device is offline, and to deliver them to the monitoring system once it comes back online.

## Why not Prometheus?

Prometheus is usually the first thing that comes to mind for monitoring. Its node_exporter agent can run on a Raspberry Pi device and expose metrics at http://public.ip.of.device:9100. The problem with this approach is that Prometheus is a pull-based monitoring system, which does not work for our use case because of the intermittent connectivity. When a device is unreachable for a given period, the Prometheus scraper simply misses the node_exporter metrics for that window. We wanted our monitoring agent to keep collecting metrics even when the remote monitoring host is unavailable, buffer them locally, and deliver them as soon as the device comes online.

Prometheus does have the Pushgateway, but it is designed for short-lived, service-level batch jobs, not for full-fledged system monitoring. We tried colocating node_exporter with a local Pushgateway, hoping to buffer metrics that way, but the Pushgateway simply overwrites its metrics store with the latest values and does not actually buffer them. Prometheus is not the main topic of this post, and there are other places where the pull approach shines; it is explained very well here.
This brings us to the main topic of this post -- the TIG stack. With the Telegraf, InfluxDB, and Grafana combination, we found a solution that pushes metrics from the devices using Telegraf, stores them in InfluxDB, and visualizes them in Grafana. Using the metrics_buffer_limit setting in Telegraf, we were able to buffer metrics locally and survive short, intermittent connectivity outages before the metrics are shipped to a remote InfluxDB database. I thought documenting all the setup steps in a post would be useful for posterity, so the rest of this post is a step-by-step guide :)

## Install and Configure InfluxDB 2.0

I assume that you already have a VM running on AWS or GCP, so I won't bore you with the steps of creating one. Let's go ahead with the installation. Find the right binaries from the InfluxData downloads page. I am using CentOS for this guide.

```
wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.0.8.x86_64.rpm
sudo yum localinstall influxdb2-2.0.8.x86_64.rpm
```

Enable the service and start it:

```
systemctl enable influxdb.service && systemctl start influxdb.service
```

You should now be able to configure it for the first time by visiting the public IP of your VM on port 8086. You can set up the initial user, password, organization (a workspace that lets you control access for a group of users), and a bucket (a bucket is similar to a database, with a retention policy). Once connected, you will be able to view your organization, buckets, and tokens. A token is how you allow Telegraf to write data to InfluxDB, or Grafana to read from it. We will come back to tokens in a bit.

## Install and Configure Grafana

You can either install Grafana on the same host that runs InfluxDB or choose a separate VM. For this post, we are using the same VM. Let's install Grafana.
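Before we move on to Grafana, one note for later, when we configure Telegraf: the local buffering described above is governed by the agent's buffer settings. A minimal sketch of the relevant `[agent]` section (the values here are illustrative, not tuned recommendations):

```toml
[agent]
  ## How often to gather metrics from the inputs.
  interval = "10s"
  ## How often to attempt flushing accumulated metrics to the outputs.
  flush_interval = "10s"
  ## Maximum number of unwritten metrics held in memory per output while the
  ## output is unreachable; the oldest metrics are dropped once this fills up.
  metrics_buffer_limit = 50000
```

Size metrics_buffer_limit against your metric rate and the longest outage you want to survive.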
Create a file sudo vi /etc/yum.repos.d/grafana.repo and add the following:

```
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
```

Install:

```
sudo yum install grafana
```

Enable and start the service:

```
sudo systemctl daemon-reload && sudo systemctl start grafana-server && sudo systemctl status grafana-server
```

You will now be able to access the UI at http://public.ip.of.vm:3000. Go ahead and use admin/admin for the first login, then set up a new password. Once you are in, go to add a data source and choose InfluxDB.

Under URL, I am going to use http://localhost:8086, pointing to the InfluxDB instance running locally. If your Influx host is a different VM, adjust the URL accordingly. You can choose either InfluxQL or Flux as the query language, but note that InfluxQL has read-only support with InfluxDB 2.0. For my Raspberry Pi dashboard, I chose Flux.

Now to the authentication part. Toggle "Basic Auth" off. We will now go to the InfluxDB UI in a different browser tab and create an authentication token that Grafana will use to connect to InfluxDB. Open http://your.vm.ip.address:8086 to access InfluxDB and navigate to the tokens page. Generate a token with read, write, or both access, and specify the bucket(s). Get the token, come back to the Grafana configuration, use the token there, then save and test.

We are now ready to ingest data into InfluxDB and visualize it in Grafana. It's Raspberry time now.

## Configure Telegraf on Raspberry Pi and Collect Metrics

SSH to the Raspberry Pi device and install the Telegraf agent.
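(A quick aside before we configure the Pi: if you prefer keeping the Grafana datasource configuration in code instead of clicking through the UI, Grafana also supports file-based provisioning. A hedged sketch, assuming the same local InfluxDB and a Flux datasource; the organization, bucket, and token values below are placeholders you would replace with your own:

```yaml
# /etc/grafana/provisioning/datasources/influxdb.yaml
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://localhost:8086
    jsonData:
      version: Flux
      organization: your-org
      defaultBucket: your-bucket
    secureJsonData:
      token: YOUR-INFLUXDB-TOKEN
```

Grafana picks this file up on restart.)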
Before installing Telegraf, you might want to change the default device password and hostname using raspi-config.

Install the necessary tools:

```
sudo apt-get update && sudo apt-get install apt-transport-https
```

Our Pi 4 devices run Buster, so let's add the InfluxData key and repository:

```
wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/os-release
test $VERSION_ID = "10" && echo "deb https://repos.influxdata.com/debian buster stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
```

Install and enable the Telegraf service:

```
sudo apt-get update && sudo apt-get install telegraf && systemctl enable telegraf.service
```

You can start the service once and check its status. It will show errors about InfluxDB not being reachable, because the default telegraf.conf talks to a local InfluxDB. Ignore this error and stop the service.

```
systemctl start telegraf.service
systemctl status telegraf.service
# stop the service after this
```

We now have to configure the Telegraf agent to write to the InfluxDB output and to collect metrics using inputs. Out of the box, Telegraf supports a wide variety of inputs and outputs; take a look at all the supported plugins here. Apart from the plugins listed, Telegraf also allows us to execute custom scripts to capture additional metrics. We will use the custom script input to capture some Raspberry Pi specific metrics like SoC temperature, SDRAM voltages, throttle states, etc. There is prior art available already -- here and here. Below is a combination of the two scripts above, by Oostens. Create a file with the script's content on your Raspberry Pi device:

```
sudo vi /var/lib/telegraf/vcgencmd.sh
```

Make the script executable and check whether the telegraf user can execute it.
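The gists linked above are the authoritative version of the script. As a rough, hypothetical illustration of its shape -- turning vcgencmd-style output into InfluxDB line protocol that Telegraf's exec input can ingest (the function name, sample value, and measurement name here are my own, not from the gist):

```shell
#!/bin/sh
# Hypothetical sketch only -- the real collector is the gist referenced above.

# Strip the wrapper from vcgencmd's "temp=48.3'C" style output.
parse_temp() {
  printf '%s' "$1" | sed -e "s/^temp=//" -e "s/'C$//"
}

# On a real Pi this would be: raw=$(vcgencmd measure_temp)
raw="temp=48.3'C"

# Emit InfluxDB line protocol, matching data_format = "influx" in telegraf.conf.
printf 'vcgencmd temperature=%s\n' "$(parse_temp "$raw")"
```

Running it by hand as the telegraf user (sudo -u telegraf ...) should print a single line protocol record such as `vcgencmd temperature=48.3`.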
Also, the telegraf user needs to be able to get information about the GPU, so add telegraf to the video group (note the -a flag, which appends the group instead of replacing the user's existing groups):

```
sudo usermod -aG video telegraf
chmod +x /var/lib/telegraf/vcgencmd.sh
sudo -u telegraf /var/lib/telegraf/vcgencmd.sh
```

Let's now move on to the telegraf.conf file. This gist has the configuration file I have used. Pay attention to the following: the output is configured to write to InfluxDB. The URL is the public IP of the InfluxDB server, port 8086. You will also recollect that we created an organization, a bucket, and a token in the sections above.

```
[[outputs.influxdb_v2]]
urls = ["http://x.x.x.x:8086"]
token = "XXXXXX-YOUR-INFLUXDB-AUTHENTICATION-TOKEN-XXXXXX"
organization = "cloudside-academy"
bucket = "cloudside-academy-dev"
```

After this, there are standard input plugins like [[inputs.cpu]], [[inputs.system]], etc., used to collect the system metrics. Finally, we have the inputs gathered via the custom script we created above:

```
# Read RPi CPU temperature
[[inputs.exec]]
commands = [
  '''sed -e 's/^\([0-9]\{2\}\)\(.*\)$/\1.\2/' /sys/class/thermal/thermal_zone0/temp'''
]
name_override = "sys"
data_format = "grok"
grok_patterns = ["%{NUMBER:thermal_zone0:float}"]

# Vcgencmd input
[[inputs.exec]]
commands = ["/var/lib/telegraf/vcgencmd.sh"]
timeout = "7s"
data_format = "influx"
```

Go ahead and create the new telegraf.conf file with this gist (and your changes as needed):

```
sudo mv /etc/telegraf/telegraf.conf /etc/telegraf/telegraf.conf.bak
sudo vi /etc/telegraf/telegraf.conf
```

Start the service:

```
systemctl start telegraf.service
```

You should now see the Telegraf agent sending data to InfluxDB, ready to be visualized.

Finally, it's dashboard time. You can either build your own custom dashboard or import an existing Grafana dashboard. I ended up building a custom dashboard for our customer.

Hope you found this useful! Happy monitoring! :)

The Cloudside View -- We shall not cease from exploration...!
Tags: Cloud, Raspberry Pi, Monitoring, InfluxDB

We are Cloudside, a trusted team of cloud-native and data problem solvers, helping our clients tackle complex scale, availability, performance, and data analytics problems on GCP and AWS. This blog is a witness to our team's adventures and learnings in Cloud, Data & App Engineering.

Written by RK Kuppala, Co-Founder & CTO, thecloudside.com