[HN Gopher] Launch HN: Odigos (YC W23) - Instant distributed tra...
       ___________________________________________________________________
        
       Launch HN: Odigos (YC W23) - Instant distributed tracing for
       Kubernetes clusters
        
       Hi HN! We're Eden and Ari, co-founders of Odigos
       (https://github.com/keyval-dev/odigos). Odigos is an open-source
       project that lets you instantly generate distributed traces for
       your applications. It works alongside existing monitoring tools and
       does not require any code changes.

       Our earlier experiences with
       monitoring tools were frustrating. Monitoring a distributed system
       with multiple microservices, we found ourselves spending way too
       much time trying to locate the specific microservice that was at
       the root of a problem. For example, we once spent hours debugging
       an application which we suspected was causing high latency, only to
       find out that the actual problem was rooted in a completely
       different application.

       Then we learned about distributed tracing,
       which solves exactly this problem. As opposed to metrics or logs
       that capture a data point in time in a single application, a
       distributed trace follows a request as it propagates through a
       distributed environment by tagging it with a unique ID. This allows
       developers to understand the context of each request and how their
       distributed applications work.

       The downside is that it is
       difficult to implement. Unlike metrics or logs, the value of
       distributed tracing is gained only after implementing it across
       multiple applications. If even one of your applications does not
       produce distributed tracing, the context propagation is broken and
       the value of the traces drops significantly.

       We manually
       implemented distributed tracing for multiple companies, but found
       it a challenge to coordinate all the development teams to
       instrument their applications in order to achieve a complete
       distributed trace. Once the implementation was finished, we saw
       great value and fixed production issues much faster. But partial
       implementation wasn't worth much.

       We set out to find a way to
       automate this process. We knew how to do most of it, but the
       trickiest part was how to automatically instrument programs written
       in compiled languages (like Go). If we could do that, we would be
       able to automate the entire process of generating distributed
       traces. While researching, we realized that eBPF--a technology that
       allows the Linux kernel to load external programs for execution
       within the kernel--could be used to develop automatic
       instrumentation for compiled languages. That was the final piece of
       the puzzle, and with it we were able to develop Odigos.

       Odigos
       first scans and recognizes all your running applications, then
       recognizes the programming language of each one and auto-
       instruments it accordingly, using eBPF and OpenTelemetry. In
       addition, it deploys collectors that buffer, filter, and deliver
       data to your chosen monitoring tool, and auto scales them according
       to the amount of traffic. This automation allows developers to
       enjoy distributed traces within minutes as opposed to manual effort
       which can take months to implement.

       Automatic instrumentation
       across programming languages is not a trivial task, especially when
       dealing with static binaries (like the ones produced by the Go
       compiler). We built multiple mechanisms to make sure we inject the
       relevant headers in a secure and stable way. We developed a system
       that tracks functions and structs across different versions of
       open-source libraries. In addition, we developed a system that
       performs userspace memory management in eBPF. As a result, Odigos
       is the only solution that is able to automatically generate
       distributed traces for compiled languages like Go and Rust. While
       other solutions require users to be experts in OpenTelemetry or
       eBPF, our solution does not require prior knowledge of
       observability technologies.

       Our solution can be installed on any
       Kubernetes cluster by executing a single command. Once installed,
       we detect the programming language of every running application and
       apply the relevant instrumentation. For JIT languages (Java and
       .NET) or interpreted languages (JavaScript and Python) we deploy
       the OpenTelemetry instrumentation. For compiled languages (Go, Rust,
       C) we deploy our eBPF-based instrumentation. All of this is
       abstracted from the user, who only has to: (1) select any or all of
       the target applications and (2) select a backend to send the
       monitoring data to.

       In May 2022, we released our first open-source
       project: automatic instrumentation for Go applications, based on
       eBPF. We later donated this project to the OpenTelemetry community
       and it is currently being developed as part of the Go Automatic
       Instrumentation SIG.

       We are big believers in open standards,
       therefore the instrumentation and collectors used by Odigos are all
       based on open-source projects developed by the OpenTelemetry
       community. This also enables us to be vendor-agnostic.

       Currently
       we are focused on building our open-source project. There are no
       pricing or paid features as of yet, but in the future, we are
       planning to offer a managed version of Odigos that will include
       enterprise features.

       If you're interested in learning more, check out
       our docs (https://docs.odigos.io), watch a demo video
       (https://www.youtube.com/watch?v=9d36AmVtuGU), and visit our
       website (https://odigos.io).

       We'd love to hear your experiences
       with tracing and monitoring distributed applications and anything
       else you'd like to share!
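
To make the idea of tagging a request with a unique ID concrete, here is a minimal Python sketch. It is not Odigos code: the `Span` class, the service names, and the `call_chain` and `group_by_trace` helpers are all invented for illustration. It shows how spans that share a trace ID form one trace, and how a single uninstrumented service that drops the ID splits one request into two unrelated traces.

```python
import secrets
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    service: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]


def call_chain(services, instrumented):
    """Simulate one request passing through `services` in order.

    An instrumented service reuses the incoming trace ID and records a span.
    An uninstrumented service records nothing and forwards nothing (an
    assumption of this sketch), so the next instrumented service has to
    start a brand-new trace.
    """
    spans = []
    trace_id, parent_id = None, None
    for service in services:
        if service not in instrumented:
            trace_id, parent_id = None, None  # context is lost here
            continue
        if trace_id is None:
            trace_id = secrets.token_hex(16)  # 128-bit trace ID
        span = Span(service, trace_id, secrets.token_hex(8), parent_id)
        spans.append(span)
        parent_id = span.span_id
    return spans


def group_by_trace(spans):
    """Group service names by trace ID, which is roughly what a backend does."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span.service)
    return list(traces.values())


services = ["frontend", "cart", "pricing", "payments"]
full = group_by_trace(call_chain(services, instrumented=set(services)))
partial = group_by_trace(
    call_chain(services, instrumented={"frontend", "pricing", "payments"})
)
print(full)     # one trace covering all four services
print(partial)  # two disconnected traces, split where "cart" dropped the ID
```

With every service instrumented the backend sees a single four-service trace. Leaving out only `cart` yields one trace that ends at `frontend` and a second, unrelated one that starts at `pricing`.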
        
       Author : edenfed
       Score  : 99 points
       Date   : 2023-01-19 16:59 UTC (6 hours ago)
        
       | decisionSniper wrote:
       | I'm curious how this will stack up against Sysdig/Falco -
       | https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf....
       | 
       | eBPF for the win, this is a nice approach with Odigos.
        
         | debarshri wrote:
         | I don't think it is comparable to falco. Talk is more about
         | security violations of the container. It is not related to
         | distributed tracing.
        
         | edenfed wrote:
         | Falco is a really cool project, but it focuses more on security.
         | Odigos is focused on getting better monitoring signals from
         | your applications, especially distributed tracing.
        
       | nate908 wrote:
       | Interesting, it looks like you've put some hard work into this
       | project. My question is, what if a pod has multiple containers in
       | it? How does Odigos choose which icon/programming language is
       | displayed for the pod? For example, I have a Deployment that
       | runs pods with two containers: a php-fpm container and an nginx
       | container. Would the "Choose Target Applications" page show an
       | icon for both Nginx and PHP for the given Deployment? Would
       | Odigos report separate metrics to the backend Destination for both
       | PHP and Nginx?
        
         | edenfed wrote:
         | Odigos will be able to instrument both containers, each with the
         | relevant instrumentation. As you pointed out, there is
         | currently a bug in the UI that shows just one programming
         | language per pod. We are working on a fix.
        
       | stavros wrote:
       | I am very amused by your choice of name, as Odigos is to land
       | what Kubernetes is to sea.
        
       | jzelinskie wrote:
       | Congrats on the launch! OpenTelemetry/Distributed Tracing has
       | been in dire need of quality of life improvements, so I'm glad to
       | see more folks filling in the gaps.
       | 
       | I see you're injecting trace IDs into programs. How do you
       | guarantee that this doesn't break the binary or flag any
       | security/compliance requirements?
        
         | edenfed wrote:
         | This is something we are thinking about a lot. We developed
         | multiple mechanisms to make sure we inject the IDs in a safe
         | way. You can see the code here:
         | https://github.com/keyval-dev/opentelemetry-go-instrumentati...
        
         | phillipcarter wrote:
         | > dire need of quality of life improvements
         | 
         | Agreed! I'm one of the maintainers of part of the project -
         | what sorts of things are top of mind for you w.r.t. quality of
         | life improvements?
        
       | william-evans wrote:
       | This is really cool - given my perception of the target market it
       | might be worth targeting AWS Elastic Container Service (ECS) next
       | as the userbase there, I would imagine, is generally looking for
       | less-complex solutions (given the complexity difference between
       | Kubernetes and ECS).
        
         | edenfed wrote:
         | ECS is definitely on our roadmap!
        
       | thorgaardian wrote:
       | Looks awesome! I hadn't had the chance to dive into eBPF yet, but
       | I had hoped someone would be able to use it in a clever way like
       | this!
       | 
       | I was digging through the docs and it looks like you have custom
       | language detection. Did you consider trying to extract the
       | language detection features from buildpack to do this? I imagine
       | you'd get more reliable results and less to maintain if you used
       | that as the basis.
        
         | edenfed wrote:
         | Yes, we are actually using a combination of env vars, process
         | names, linked libraries, and container metadata to detect the
         | language.
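
A rough Python sketch of what signal-based detection could look like, followed by the instrumentation choice described in the launch post (OpenTelemetry for Java, .NET, JavaScript, and Python; eBPF for Go, Rust, and C). This is not Odigos's detector: the `ProcessInfo` fields, the specific hints (`JAVA_HOME`, `libpython`, and so on), and the order in which signals are checked are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessInfo:
    # Hypothetical inputs a detector might collect for one container.
    exe_name: str
    env: Dict[str, str] = field(default_factory=dict)
    linked_libs: List[str] = field(default_factory=list)
    has_go_build_info: bool = False  # assumed signal: Go build info in the binary


# Illustrative hint tables; a real detector would need far more cases.
EXE_HINTS = {"java": "java", "node": "javascript", "python": "python",
             "python3": "python", "dotnet": "dotnet"}
ENV_HINTS = {"JAVA_HOME": "java", "NODE_VERSION": "javascript",
             "PYTHON_VERSION": "python", "DOTNET_VERSION": "dotnet"}
LIB_HINTS = {"libjvm": "java", "libpython": "python", "libcoreclr": "dotnet"}


def detect_language(proc: ProcessInfo) -> Optional[str]:
    """Try process name, then env vars, then linked libraries, then Go build info."""
    if proc.exe_name in EXE_HINTS:
        return EXE_HINTS[proc.exe_name]
    for var, lang in ENV_HINTS.items():
        if var in proc.env:
            return lang
    for lib in proc.linked_libs:
        for prefix, lang in LIB_HINTS.items():
            if lib.startswith(prefix):
                return lang
    if proc.has_go_build_info:
        return "go"
    return None


# Mapping taken from the launch post: runtime-based languages get the
# OpenTelemetry instrumentation, compiled languages get the eBPF-based one.
INSTRUMENTATION = {
    "java": "opentelemetry", "dotnet": "opentelemetry",
    "javascript": "opentelemetry", "python": "opentelemetry",
    "go": "ebpf", "rust": "ebpf", "c": "ebpf",
}


def choose_instrumentation(lang: Optional[str]) -> Optional[str]:
    return INSTRUMENTATION.get(lang)


# One pod with two containers: each container is detected on its own.
pod = {
    "api": ProcessInfo("server", has_go_build_info=True),
    "worker": ProcessInfo("java", env={"JAVA_HOME": "/opt/java"}),
}
plan = {name: choose_instrumentation(detect_language(p)) for name, p in pod.items()}
print(plan)
```

Checking cheap signals first and returning `None` when nothing matches keeps the sketch conservative: a container with an unknown language is left alone rather than instrumented on a guess.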
        
       | cube2222 wrote:
       | Wow, if this really works like you describe, then this is magic!
       | 
       | > Automatic instrumentation across programming languages is not a
       | trivial task, especially when dealing with static binaries (like
       | the ones produced by the Go compiler). We built multiple
       | mechanisms to make sure we inject the relevant headers in a
       | secure and stable way. We developed a system that tracks
       | functions and structs across different versions of open-source
       | libraries.
       | 
       | Could be very useful for non-greenfield projects. I'd love to
       | learn more about the details, is there any writeup somewhere?
       | 
       | Though I'd still recommend new projects do "proper" tracing with
       | not only one-per-service spans, but also spans for important
       | functions, including additional application-specific tags, as
       | that is easily 10x the value.
       | 
       | But since life is a sequence of tradeoffs, I think this project
       | could be really useful in a lot of places.
        
         | phillipcarter wrote:
         | > Though I'd still recommend new projects do "proper" tracing
         | with not only one-per-service spans, but also spans for
         | important functions, including additional application-specific
         | tags, as that is easily 10x the value.
         | 
         | FWIW Odigos makes this possible because it uses OpenTelemetry
         | (and generates OTel-compatible instrumentation for the eBPF-
         | sourced data). You can go into an app that's instrumented this
         | way, add an OpenTelemetry SDK, and start writing manual
         | instrumentation or include additional instrumentation
         | libraries. Your traces will just get deeper/richer when you do
         | that.
        
         | edenfed wrote:
         | We are actually doing a technical deep dive at the next
         | OpenTelemetry Go auto-instrumentation meeting on Tuesday. We
         | will be happy to share the presentation afterwards.
         | 
         | In addition, we automatically create spans for popular open
         | source libraries in use, so you should also expect to see spans
         | for database connections / cloud SDKs / Kafka clients / etc.
         | Definitely agree that manual instrumentation is very important
         | in addition to the automatic one.
        
       | theptip wrote:
       | Looks cool! Great to see entrants into this space.
       | 
       | How does this compare with Cilium? Looks like they do OT tracing
       | (https://github.com/cilium/hubble-otel) but it's not native/core,
       | is that the main distinction?
        
         | edenfed wrote:
         | As far as I know, Cilium does not do automatic context
         | propagation and requires code changes to achieve it. Odigos
         | does context propagation automatically.
        
       | hinkley wrote:
       | Distributed tracing really ought to be built into every web
       | application framework. What's the value in signing over your
       | autonomy to a framework if it isn't going to handle cross-cutting
       | concerns like forwarding correlation IDs from the inbound request
       | to all outbound requests triggered by that request?
        
         | edenfed wrote:
         | Unfortunately, not all web frameworks do this automatically. In
         | addition, sometimes you may want to propagate the ID over
         | non-HTTP connections like database drivers or even message queues.
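
As an illustration of carrying an ID over more than one transport, the sketch below uses the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`). The helper names and the plain dicts standing in for HTTP headers and message-queue headers are assumptions for the example, not an Odigos or OpenTelemetry API, and nothing in the thread states which header format Odigos itself uses.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# Only version "00" is accepted in this sketch.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def extract(carrier: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (trace_id, parent_span_id) from a carrier, or None if absent/invalid."""
    value = next((v for k, v in carrier.items() if k.lower() == "traceparent"), None)
    match = TRACEPARENT.match(value or "")
    # All-zero trace IDs and span IDs are invalid in W3C Trace Context.
    if not match or set(match.group(1)) == {"0"} or set(match.group(2)) == {"0"}:
        return None
    return match.group(1), match.group(2)


def inject(carrier: Dict[str, str], trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Write the current context into any dict-like carrier."""
    flags = "01" if sampled else "00"
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def handle_request(inbound_headers: Dict[str, str]):
    """Continue the inbound trace (or start one) and tag both outbound calls."""
    ctx = extract(inbound_headers)
    trace_id = ctx[0] if ctx else secrets.token_hex(16)
    span_id = secrets.token_hex(8)  # this service's own span
    outbound_http: Dict[str, str] = {}     # e.g. headers of a downstream HTTP call
    outbound_message: Dict[str, str] = {}  # e.g. headers of a queued message
    inject(outbound_http, trace_id, span_id)
    inject(outbound_message, trace_id, span_id)
    return outbound_http, outbound_message


inbound = {"Traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
http_out, msg_out = handle_request(inbound)
print(http_out["traceparent"])
print(msg_out["traceparent"])
```

Treating the carrier as a plain key/value map is what lets the same `inject` and `extract` pair work for HTTP headers, message headers, or any other transport that can carry string metadata.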
        
       | Ancient wrote:
       | Just saw the demo video, looks awesome. Is this tool from the
       | future or some dark wizard tricks? Keep up the great work.
        
       | Benjamin_Dobell wrote:
       | This is really cool. Upon further Googling, readers may be
       | interested in
       | https://kubernetes.io/blog/2017/12/using-ebpf-in-kubernetes/
       | 
       | If you can go beyond Kubernetes, I think that'd give Odigos more
       | staying power. Naturally some integrations are out of your hands,
       | AWS Fargate being one
       | (https://github.com/aws/containers-roadmap/issues/1027). However,
       | if you _could_ get integrations up and running with the likes of
       | Fargate, Fly.io, Render.com, etc., that'd be _amazing_.
        
         | edenfed wrote:
         | Support for non-Kubernetes environments is something we are
         | planning to release very soon.
        
       | jedberg wrote:
       | This is awesome! Request tracing is basically the fundamental
       | building block to observability in a distributed system.
       | 
       | Doing it automatically is a huge win!
       | 
       | Congrats on the launch and I look forward to learning more!
        
       ___________________________________________________________________
       (page generated 2023-01-19 23:00 UTC)