[HN Gopher] Launch HN: ContainIQ (YC S21) - Kubernetes Native Mo...
___________________________________________________________________
Launch HN: ContainIQ (YC S21) - Kubernetes Native Monitoring with
eBPF
Hi HN, I'm Nate, and together with my co-founder Matt we're the
founders of ContainIQ (https://www.containiq.com/). ContainIQ is a
complete K8s monitoring solution that is easy to set up and
maintain and provides a comprehensive view of cluster health.

Over the last few years, we noticed a shift: more of our friends
and other founders were adopting Kubernetes earlier on. (Whether
they actually need it that early is less clear, but that's a
discussion for another day.) From our past experience with
open-source tooling and other platforms on the market, we knew the
existing tooling wasn't built for this generation of companies
building with Kubernetes. Many early to middle-market tech
companies don't have the resources to manage and maintain a bunch
of disparate monitoring tools, and most engineering teams don't
know how to use them. But when scaling, engineering teams do know
that they need to monitor cluster health and core metrics, or else
end users will suffer. Measuring HTTP response latency by URL
path, in particular, is important for many companies, but
installing application-level packages in each individual
microservice to get it is time-consuming.

We decided to build a solution that is easy to set up and
maintain, with the goal of getting users 95% of the way there
almost instantly. Today, our Kubernetes monitoring platform has
four core features: (1) metrics: CPU and memory for pods/nodes,
with limits and capacity, correlated to events, and alerts on
changes; (2) events: a K8s events dashboard, correlated to logs,
with alerts; (3) latency: requests per second, p95, and p99
latency per microservice, including by URL path, with alerts; and
(4) logs: container-level log storage and search.
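To make the percentile metrics concrete, here is a rough,
stand-alone illustration (independent of our product; the sample
numbers are made up) of what p95 and p99 latency mean over a
window of request latencies:

    # Illustrative only: p95/p99 are roughly the latencies that 95%/99%
    # of requests fall under in a given window. Sample values are made up.
    import numpy as np

    latencies_ms = [12, 15, 14, 18, 250, 16, 13, 17, 900, 15]
    p95 = np.percentile(latencies_ms, 95)
    p99 = np.percentile(latencies_ms, 99)
    print(f"p95={p95:.0f} ms  p99={p99:.0f} ms")

A handful of slow requests dominate the tail, which is exactly
what these percentiles are meant to surface.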
Our latency feature set was built using a technology called eBPF.
BPF, the Berkeley Packet Filter, grew out of the need to filter
network packets while minimizing unnecessary packet copies from
kernel space to user space. Since version 3.18, the Linux kernel
has provided extended BPF, or eBPF, which uses 64-bit registers
and increases the number of registers from two to ten. We install
the necessary kernel headers for users automatically.

With eBPF, we monitor at the kernel and OS level rather than at
the application level. Our users can measure and monitor HTTP
response latency across all of their microservices and URL paths,
as long as their kernel version is supported. We can deliver this
experience immediately by parsing network packets directly from
the socket. We then correlate the socket and sk_buff information
with your Kubernetes pods to provide metrics like requests per
second, p95, and p99 latency at the path and microservice level,
without you having to instrument each microservice at the
application level. For example, with ContainIQ you can track how
long your Node.js application takes to respond to HTTP requests
from your users, see which parts of your web application are
slowest, and get alerted when users are experiencing slowdowns.
Users can correlate events to logs and metrics in one view. We
knew how annoying it was to toggle between multiple tabs and then
scroll endlessly through logs trying to match up timestamps. We
fixed this: for example, a user can click from an event (e.g., a
pod dying) straight to the logs at that point in time. Users can
set alerts on essentially any data point (e.g., p95 latency, a K8s
job failing, a pod eviction).

Installation is straightforward using either Helm or our YAML
files. Pricing is $20 per node per month + $1 per GB of log data
ingested.
You can sign up directly on our website through the self-service
flow. You can also book a demo if you'd like to talk to us, but
that isn't required. Here are some videos
(https://www.containiq.com/kubernetes-monitoring) if you're
curious to see our UX before signing up.

We know we have a lot of work left to do, and we welcome your
suggestions, comments, and feedback. Thank you!
Author : NWMatherson
Score : 66 points
Date : 2022-01-06 16:30 UTC (6 hours ago)
| nyellin wrote:
| Nice to see a new eBPF based solution out there. Good luck.
| NWMatherson wrote:
| Thanks so much!
| rlyshw wrote:
| I recently had an issue where my UDP service worked fine exposed
| directly as a NodePort type, but not through an nginx UDP
| ingress. I _think_ the issue was that the ingress controller
| forwarding operation was just too slow for the service's needs,
| but I had no way of really knowing.
|
| Now, if I had had this kind of kernel-level network monitoring, I
| probably would have had a clearer picture of what was going on.
|
| Really, one of the hardest problems I've had with
| learning/deploying in k8s is tracing down the multiple levels of
| networking, from external TLS termination to LoadBalancers,
| through ingress controllers, all the way down to
| application-level networking. I've found that more often than not
| the easiest path is to just get rid of those layers of complexity
| completely.
|
| In the end I just exposed my server on NodePort, forwarded my NAT
| to it, and called it done. But it sounds like something like
| ContainIQ can really add to a k8s admin's toolset for
| troubleshooting these complex network issues. I also agree with
| other comments here that a limited, personal-use/community tier
| would be great for wider adoption and home-lab users like me :)
| NWMatherson wrote:
| Appreciate this insight and I agree with you.
|
| And I can definitely circle back here when our limited use tier
| goes live. Agree on that too.
| gigatexal wrote:
| A community/non-paid edition would be quite nice, to be able to
| trial this out before paying.
|
| This is how an old employer adopted CockroachDB: we trialed the
| non-enterprise version and then ultimately bought a license.
| permalac wrote:
| Agreed.
|
| Our employer does not invest in this kind of tool, so if a free
| version does not exist, then for us the tool does not exist.
|
| We would be happy to provide usage metrics and reports. Our
| company is fully open source and open data, and we work on and
| invest time in open projects when possible.
| nyellin wrote:
| Not the OP, but I develop a different open source tool for
| Kubernetes and would love to talk! (Email is in my profile)
| HatchedLake721 wrote:
| Does your company use paid versions of the open source tools
| or pay support?
| NWMatherson wrote:
| We are planning to open source our agents in 2022!
| NWMatherson wrote:
| I agree. We are planning to launch a free edition with limited
| size and data retention for users to try and play with before
| paying. It is in the works, and we hope to have it out in the
| next few months.
|
| We are also thinking about launching trials too.
| nodesocket wrote:
| Hello. I own and run a DevOps consulting company and use DataDog
| exclusively for clients. DD works pretty well as it integrates
| with cloud providers (such as AWS), physical servers (agent), and
| Kubernetes (helm chart). The pain point is still creating all the
| custom dashboards, alerts, and DataDog integrations and
| configuration. Managing the DataDog account can almost be a full-
| time job for somebody. Especially with clients who have lots of
| independent k8s clusters all in a single DD account (lots of
| filtering on tags and labels).
|
| What does ContainIQ offer in terms of benefits over well
| established players like DataDog? I will say, the Traefik DataDog
| integration is horrible and hasn't been updated in years so
| that's something I wish was better. DataDog does support
| Kubernetes events (into the feed), and their logging offering is
| quite good (though very expensive).
| NWMatherson wrote:
| The dashboard configuration issue was actually one of the pain
| points we targeted initially. It was an issue we experienced
| too, and we talked to a lot of friends who had spent significant
| time setting these dashboards up in Datadog. One of our initial
| goals has been to automate enough to get you 95% of the way
| there without any configuration on your end. We've also
| tried to make alerting really easy and are working to automate
| the process of setting smart alerts. Would love to chat more
| about your experience if you are open to it. My email is nate
| (at) containiq (dot) com
| MoSattler wrote:
| How does this compare to Opstrace? [0]
|
| [0]: https://opstrace.com
| NWMatherson wrote:
| Opstrace took an interesting approach (and was a YC company
| too, recently acquired by GitLab). We are a managed solution,
| whereas Opstrace was a self-hosted open source solution, and we
| are not building on top of other open source tools. With
| ContainIQ, you get metrics natively, plus features (e.g., p95
| latency by endpoint) that you wouldn't otherwise be able to get
| with Opstrace and its integrations.
| kolanos wrote:
| How does this compare to Pixie? [0]
|
| [0]: https://github.com/pixie-io/pixie
| outgame wrote:
| Polar Signals develops Parca [0], which is another eBPF
| observability tool, and Isovalent develops Cilium [1] which is
| built on eBPF as well. Genuinely curious if there are
| differences, or if eBPF only allows for specific observability
| functionality and each tool has it all.
|
| [0]: https://github.com/parca-dev/parca
|
| [1]: https://github.com/cilium/cilium
| brancz wrote:
| Polar Signals founder and one of the creators of Parca here.
| From what I can tell ContainIQ is distinct from Parca and
| Polar Signals as we only concern ourselves with continuous
| profiling, which is complementary to metrics, logs and
| traces. From our experience, while eBPF is certainly limited
| and it can be painful to work with the verifier at times, it
| hits a sweet spot for observability collection because of its
| low overhead, and for collection you really only read some
| structs from memory somewhere, which eBPF handles just fine
| despite its limitations.
|
| Definitely excited to see more eBPF tooling appear in the
| observability space.
| NWMatherson wrote:
| Well said, we are excited to see more eBPF tooling appear
| as well.
| NWMatherson wrote:
| Pixie is definitely similar in its eBPF-based approach. I
| believe there are differences in the types of data they collect
| and correlate with. For example, we collect logs and state
| information (node status, node conditions, pod scheduled, etc.)
| alongside our eBPF-based metrics like latency. I'm sure there
| are things they collect that we don't as well.
___________________________________________________________________
(page generated 2022-01-06 23:01 UTC)