[HN Gopher] Using eBPF and predefined inspections to minimize "o...
___________________________________________________________________
Using eBPF and predefined inspections to minimize "observability
tax"
Author : apetruhin
Score : 79 points
Date : 2022-12-27 16:02 UTC (6 hours ago)
(HTM) web link (coroot.com)
(TXT) w3m dump (coroot.com)
| PeterZaitsev wrote:
| eBPF is great! Fantastic to see it used more for observability!
| VectorLock wrote:
| It's too bad they had to build a whole separate tool because
| Grafana wasn't capable. The Node graph panel for Grafana really
| needs some love.
|
| Edit: Also they need a SaaS to sell, so of course.
| nikolay_sivko wrote:
| Grafana offers one of the best solutions for storing and
| navigating through telemetry data, but there remains a
| challenge in using this data to generate insights. Our goal is
| to address this issue, even if it means occasionally
| reinventing the wheel. This is a necessary step at this stage.
| buro9 wrote:
| An update to the node graph is currently in the works, some
| love is being shown (I work at Grafana Labs).
|
| It's also too bad that they built their own eBPF
| instrumentation as the Cloudflare eBPF exporter also exists and
| is very good https://github.com/cloudflare/ebpf_exporter
|
| Alternatively, if what you want specifically are the
| integrations mentioned on the coroot page, then my money would
| be on Isovalent and Cilium.
| nikolay_sivko wrote:
| With ebpf-exporter it is not possible to implement complex
| logic, such as converting the PID of each TCP connection into
| a container name and the destination IP into a real IP
| according to the conntrack table.
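|
| For illustration, the first half of that logic (PID to container)
| can be sketched as below. This is a minimal sketch, not Coroot's
| actual implementation: it assumes the container runtime embeds a
| 64-character hex container ID in the process's cgroup path, as
| Docker and containerd commonly do. The function names are
| hypothetical. Turning the ID into a human-readable container name
| would still require asking the runtime, and the conntrack lookup
| for the destination IP is not covered here.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the runtime puts a 64-char lowercase hex container ID
# somewhere in the cgroup path (e.g. ".../docker-<id>.scope").
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the first container ID found in /proc/<pid>/cgroup content."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(0)
    return None


def container_id_for_pid(pid: int) -> Optional[str]:
    """Read /proc/<pid>/cgroup and extract a container ID, if any."""
    try:
        text = Path(f"/proc/{pid}/cgroup").read_text()
    except OSError:
        # The process may have exited, or /proc may be unavailable.
        return None
    return container_id_from_cgroup(text)
```

| An eBPF program can report the PID that opened a connection, and
| a user-space agent can then run a lookup like this to attribute
| the connection to a container.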
| alexeldeib wrote:
| This seems like a limitation that could be lifted instead
| of introducing a separate product (disclaimer: familiar
| with ebpf exporter but haven't dug into OP).
|
| IIRC ebpf exporter had some limitations, but they weren't
| fundamental. However, it was also fairly light, so maybe
| another tool is just the right solve.
| nikolay_sivko wrote:
| Coroot's agent collects data from various sources to
| cover all aspects of container behavior. ebpf_exporter
| perfectly solves the problem of running custom eBPF
| programs and turning their output into metrics, but using
| it as a foundation for more specific solutions doesn't
| seem reasonable.
| prpl wrote:
| Are there more data sources or ways to overlay node graph
| information over time series data?
|
| I've wanted to use it, but haven't had time to write a custom
| data source.
| tptacek wrote:
| There's not much to the eBPF profiler they've built, and not
| very much overlap with ebpf_exporter; ebpf_exporter also
| seems to require CO-RE kernels.
| ekiauhce wrote:
| Thanks for the great article!
|
| At my current employer we have a company-wide service for
| aggregating error logs in particular (WARN- and ERROR-level log
| rows, plus stack traces for exceptions) so developers can
| analyze them for debugging purposes. It also automatically
| gathers information about the incoming HTTP request (geo, IP
| address, user agent, etc.), so you can easily see a particular
| segment of errors and which kinds of users are getting them.
|
| As far as I can see, you have a quantitative logs metric
| https://community-demo.coroot.com/p/oc1vhnmq/app/default:Dep...
| but without any detail (maybe it works this way only for the
| demo app). I mean, it would be great to be able to inspect each
| ERROR event separately, or to define a custom SLO with an alert
| for a particular type of error.
|
| Another great feature we use a lot is historical data, so you can
| find patterns of error spikes on a scale of months and see when
| they went away after a fix.
|
| FYI, this error service I'm talking about is built on top of
| ClickHouse, so it's quite responsive despite the large
| volumes of data.
|
| Another thing I want to mention is cron-like workloads (or batch
| jobs, you name it). Is there any support or are there useful
| metrics for them?
| john-tells-all wrote:
| This looks really useful! For my business I want 1) high-res data
| about local CPU, memory, and IO (e.g. how can I speed up tests),
| and 2) summary sampling data from production, to detect weird bugs
| or attacks. eBPF might be able to solve both cases!
___________________________________________________________________
(page generated 2022-12-27 23:00 UTC)