[HN Gopher] The current state of OpenTelemetry
       ___________________________________________________________________
        
       The current state of OpenTelemetry
        
       Author : pranay01
       Score  : 79 points
       Date   : 2024-01-12 17:36 UTC (5 hours ago)
        
 (HTM) web link (signoz.io)
 (TXT) w3m dump (signoz.io)
        
       | baby_souffle wrote:
       | Depends on who you ask.
       | 
       | I am glad that the observability sector has standardized on a
       | common protocol but my god are the reference implementations
       | lacking.
        
         | politelemon wrote:
         | > my god are the reference implementations lacking.
         | 
         | Can you share some of your experience, what do you mean by
         | that? Are there edge cases causing problems, or major missing
         | features? Easy or difficult to use?
        
           | abeppu wrote:
           | As an example, Exemplars are part of the metrics spec [1].
           | The official python library says metrics status is 'stable'
           | [2]. But there's an approximately 2-year old issue with no
           | work on it, titled 'Metrics: Add support for exemplars',
           | where the latest update is that no work has begun [3].
           | Nothing at a top-level of the opentelemetry-python project
           | indicates that the project does not implement everything in
           | the metrics spec, so if you wanted to use that capability,
           | you are apt to discover it relatively late.
           | 
           | [1] https://opentelemetry.io/docs/specs/otel/metrics/data-
           | model/...
           | 
           | [2] https://github.com/open-telemetry/opentelemetry-python
           | 
           | [3] https://github.com/open-telemetry/opentelemetry-
           | python/issue...
        
           | arwineap wrote:
           | otel logging is completely missing from golang for example
        
             | bbkane wrote:
             | I agree this should be there, but I also think in most
             | cases, logs can be completely replaced by otel tracing -
             | see https://www.infoq.com/presentations/event-tracing-
             | monitoring...
        
         | pranay01 wrote:
         | Agree, there being an open standard for instrumentation is a
         | big win. Lots of work still needs to be done on showing more
         | examples and making it more accessible to users & implementors.
         | 
         | One other key area is resources that help
         | engineers/implementors get organizational buy-in.
        
         | neonsunset wrote:
         | C# has pretty nice integration with OTEL out of the box
         | (ASP.NET Core and otherwise, distributed as separate packages)
         | 
         | https://learn.microsoft.com/en-us/dotnet/core/diagnostics/ob...
        
       | phillipcarter wrote:
       | As a maintainer and end-user, my answer to this is...yes and no.
       | It's important to clarify that stability - something mentioned
       | in the article - has several major definitions:
       | 
       | - Stability in the specification
       | 
       | - Stability in semantic conventions
       | 
       | - Stability in the protocol representation
       | 
       | - Stability in SDKs that can generate data
       | 
       | - Stability in the Collector that can receive, process, and
       | export that data
       | 
       | Unfortunately, many people interpret "stable" in one of those
       | categories as "stable for everything", and then get really
       | annoyed when they find their language doesn't actually have
       | stable support (or any support!) for that concept.
       | 
       | What I'm most proud of in 2023 is all of the little things we
       | made progress on with components that engineers have to
       | materially deal with. On the website, we documented what feels
       | like a million little things and clarified tons of concepts that
       | people told us were confusing. Across all the SDKs, we fixed tons
       | of little bugs, added more and more instrumentations, and
       | completed the unsexy work to make metrics generation stable
       | across most of our 11+ languages. The Collector added oodles and
       | oodles of support for different data sources, and OTTL went from
       | a neat component to a rock-solid general-purpose data
       | transformation tool.
       | 
       | There's so much more work to do, but I'm really happy about the
       | progress.
        
       | shipit1999 wrote:
       | OpenTelemetry is a great concept, but in my experience not quite
       | there yet. Docs especially fall into the common trap of handling
       | the happy path hello world quickstarts, then become increasingly
       | useless as you want to get beyond that to real life use cases.
       | Given the inherent tradeoff of complexity that comes from trying
       | to unify different approaches around one standard, sometimes it
       | seems like things that should be simple are more difficult than
       | they should be. I'm sure it will keep improving.
        
         | mason55 wrote:
         | > _Docs especially fall into the common trap of handling the
         | happy path hello world quickstarts, then become increasingly
         | useless as you want to get beyond that to real life use cases._
         | 
         | Yeah, Java is what I'm most familiar with. The "Getting
         | Started" shows how to do some basic manual instrumentation and
         | collect the output with curl. Then the "Next Steps" are just
         | random things with no guidance about why I would or wouldn't
         | choose any of them for my next step.
         | 
         | But, ok, I choose "Automatic Instrumentation", that sounds
         | promising. And it actually is really easy to set up auto
         | instrumentation. But then at the end it says
         | 
         | > _After you have automatic instrumentation configured for your
         | app or service, you might want to annotate selected methods or
         | add manual instrumentation to collect custom telemetry data._
         | 
         | Uh... no... after I have automatic instrumentation enabled I
         | _want to do something with the output_
         | 
         | The two major flaws in the docs seem to be
         | 
         | 1. The common failure of docs to explain to users why they
         | might choose one thing or another. "If you want to do x.. If
         | you want to do y.." what if I don't know?
         | 
         | 2. Because otel is agnostic to the consumer of the output,
         | there's very little in the way of explaining how to get value
         | out of what otel produces. To connect the dots, you really need
         | to use the docs of your observability tool. Which I understand,
         | but then most of them have their own setup directions because
         | they want some extra fields included in the data, or they have
         | their own fork, so not everything in the otel docs is actually
         | usable.
         | 
         | I'm not sure what the answer is. It's not like I expect otel to
         | document how to build a dashboard in Grafana. And a lot of
         | frustration I've experienced has been with the observability
         | tools themselves. But at the same time, I always feel like the
         | otel docs just don't get you anywhere close to getting value
         | out of the library. Which is a shame, because turning on auto-
         | instrumentation and seeing all your traces with literally no
         | extra work is a magical moment.
        
           | Volundr wrote:
           | Hmmm... Yeah, I set up OpenTelemetry for a couple of personal
           | projects this year and was pleased with the ease of setup,
           | but by and large I knew what I was doing. Specifically, I had
           | my application, I had Grafana, and I wanted to get traces
           | from A to B.
           | 
           | Looking at the docs again through the eyes of a newcomer, if
           | you don't already have a destination in mind they don't
           | really help you. It's a little tricky because my setup with
           | Grafana will be somewhat different (but similar) from someone
           | using honeycomb or signoz or what have you, but even just
           | having a "want to visualize your data? Check out the list of
           | compatible vendors" with a link in that direction would
           | probably go a long way.
        
             | structural wrote:
             | By comparison, I wanted to use opentelemetry for a series
             | of projects, but could find absolutely no useful
             | documentation on how to do anything else other than "send
             | data from a webapp to a server / other cloud service that
             | some vendor wants to sell you".
             | 
             | All I wanted to do was instrument an application and write
             | its telemetry data to a file in a standard way, and have
             | some story regarding combining metrics, traces, and logs as
             | necessary. Ideally this would use minimal system resources
             | when idle. That's it.
        
               | bbkane wrote:
               | It doesn't read from files unfortunately, but
               | https://openobserve.ai/ is very easy to set up locally
               | (single binary) and send otel logs/metrics/traces to.
               | 
               | Here's how I run it locally for my little shovel project
               | - https://github.com/bbkane/shovel#run-the-webapp-
               | locally-with... .
               | 
               | Also linked from that README is an Ansible playbook to
               | start OpenObserve as a systems service on a Linux VM.
               | 
               | Alternatively, see the shovel codebase I linked above for
               | a "stdout" TracerProvider. You could do something like
               | that to save to a file, and then use a tool to prettify
               | the JSON. I have a small script to format json logs at
               | https://github.com/bbkane/dotfiles/blob/2df9af5a9bbb40f2e101...
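
A minimal sketch of the "save to a file, then prettify the JSON" idea from the comment above. It is not the script linked there. It assumes the exporter wrote one JSON object per line, which may not match what a given stdout exporter emits, and the function name `prettify_json_lines` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def prettify_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield an indented rendering of each JSON line; pass other lines through.

    Assumption: one JSON document per input line (newline-delimited JSON).
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            yield text  # not JSON (e.g. a plain log line): keep it unchanged
            continue
        yield json.dumps(record, indent=2, sort_keys=True)


# Hypothetical sample input standing in for a telemetry file.
sample = [
    '{"name": "GET /", "duration_ms": 12}\n',
    "plain text line\n",
]
for pretty in prettify_json_lines(sample):
    print(pretty)
```

To run it over a real file, pass an open file object instead of `sample`, since iterating a text file yields its lines.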
        
           | starkparker wrote:
           | > 1. The common failure of docs to explain to users why they
           | might choose one thing or another. "If you want to do x.. If
           | you want to do y.." what if I don't know?
           | 
           | Observability docs in general struggle with this. So many
           | data sources can emit so many types of metrics in so many
           | formats, and every tool makes this impossible promise of
           | consolidating it all into one space seamlessly. But tools
           | like Grafana pride themselves so much on visualizing
           | _anything_ that they paint themselves into a corner where
           | they can't be prescriptive about common uses or methods
           | without excluding or confusing others.
           | 
           | So a lot of the prescriptive answers to "what if I don't
           | know?" get chucked onto account and support teams of
           | commercial vendors, because the docs can't anticipate every
           | possible context in which an observability tool will get
           | deployed. Each solution ends up being custom tailored and
           | poorly portable to anyone else's, often not even to other
           | customers with the same data sources and goals at the same
           | scale due to wacky labelling differences or legacy
           | requirements or some internal stakeholder demand.
           | 
           | More narrowly focused tools don't have as many of these
           | problems, but not many organizations want narrowly focused
           | observability tools. (Lots of _people_ do, but orgs don't
           | want to pay out deals to multiple vendors for what looks like
           | different flavors of the same result. And hey look it's
           | Grafana Cloud or Datadog or whatever, it can do _anything_,
           | so you devs and also bizops and SRE and IT and hey sales
           | wants a dashboard too and so does the company cafeteria, why
           | not, you all can just use this one tool and we just deal with
           | one bill with a volume discount, right? Right??)
           | 
           | Smarter tools don't have as many of these problems: they
           | paper over the docs' limitations by being better able to
           | anticipate or surface connections between data sources,
           | metrics, logs, traces, events, etc., and they do so with
           | better interfaces. But especially for high-cardinality data
           | the usability of those tools either seems to fall apart or
           | their companies charge Datadog-sized invoices.
        
       | reindeerer wrote:
       | Give me something that isn't based on protobufs at wire / request
       | level. CBOR with CDDL for a fully standards based approach that
       | can work at any size of the stack
        
         | pranay01 wrote:
         | What's the issue with protobufs?
        
           | reindeerer wrote:
           | The first issue is that protobufs aren't a standard. That
           | inherently limits anything built on top of them to not be a
           | standard either, and that limits their applicability
           | 
           | Also, depending on the environment you run in, code size
           | bloat vs alternatives can matter
        
             | tonyarkles wrote:
             | > Aren't a standard
             | 
             | You mean like an IETF standard? That is true, although the
             | specification is quite simple to implement. It is certainly
             | a de-facto standard, even if it hasn't been standardized by
             | the IETF or IEEE or ANSI or ECMA.
             | 
             | > inherently limits anything built on top of them to not be
             | a standard either
             | 
             | I'm not sure that strictly follows.
             | https://datatracker.ietf.org/doc/html/rfc9232 for example
             | directly references the protobuf spec at
             | https://protobuf.dev/ and includes protobufs as a valid
             | encoding.
             | 
             | > depending on the environment
             | 
             | I've had several projects that ran on wimpy Cortex M0
             | processors and printf() has generally taken more code space
             | in flash than NanoPB. This is generally with the same
             | device doing both encoding and decoding.
             | 
             | If you're only encoding, the amount of code required to
             | encode a given structure into a PB is very close to
             | trivial. If I recall it can also be done in a streaming
             | fashion so you don't even need a RAM buffer necessarily to
             | handle the encoded output.
             | 
             | Do I love protobufs? Not really. There's often some issue
             | with protoc when running it in a new environment. The APIs
             | sometimes bother me, especially the callback structure in
             | NanoPB. But it's been a workhorse for probably 15 years now
             | and as a straightforward TLV encoding it works pretty
             | darned well.
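
To illustrate the "encode-only is close to trivial" and "straightforward TLV encoding" points above, here is a rough sketch of the protobuf wire format for two field kinds, written from scratch rather than with NanoPB or protoc. The helper names and the message layout (field 1 as an unsigned integer, field 2 as a string) are assumptions made for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: last byte
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """The 'tag' of tag-length-value: field number and wire type in one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): the tag followed directly by the value."""
    return encode_key(field_number, 0) + encode_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Wire type 2 (length-delimited): tag, then length, then the raw bytes."""
    return encode_key(field_number, 2) + encode_varint(len(payload)) + payload


# Hypothetical message: field 1 = 150, field 2 = "testing".
message = encode_uint_field(1, 150) + encode_bytes_field(2, b"testing")
print(message.hex())  # 089601120774657374696e67
```

Each scalar field is emitted independently, so fields can be written to a stream as they are produced. A nested message is length-delimited, though, so its size has to be known before its bytes are written.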
        
           | jahewson wrote:
           | I used protobufs for a short while and came to the
           | realization that they're just Go's opinionated idioms forced
           | on other languages via awkward SDKs. Particularly did not
           | like having to use codegen, rely on Google's annotations for
           | basic functionality or deal with field masks that are a sort
           | of poor man's GraphQL.
           | 
           | I get it, Google made trade offs that work for them, and I
           | agree with their position - but for someone at a smaller
           | company working in a non-Go/Java/C programming language it
           | was just a ton of friction for no benefit.
        
         | alright2565 wrote:
         | Here are the example payloads for OTLP over JSON and an
         | example of how to ingest them: https://github.com/open-
         | telemetry/opentelemetry-proto/tree/m...
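
For a sense of what an OTLP/JSON trace payload looks like, here is a minimal sketch that builds one span by hand. It is not copied from the linked examples. The service name, span name, scope name, and the `build_trace_payload` helper are all hypothetical. The sketch only serializes the payload; sending it would typically mean an HTTP POST with `Content-Type: application/json` to a collector's `/v1/traces` path, which is assumed here and not exercised.

```python
import json
import secrets
import time


def build_trace_payload(service_name: str, span_name: str, duration_ns: int) -> dict:
    """Build a one-span OTLP/JSON traces payload (hand-rolled, no SDK)."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ns
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes as 32 hex chars
        "spanId": secrets.token_hex(8),    # 8 random bytes as 16 hex chars
        "name": span_name,
        "kind": 1,  # SPAN_KIND_INTERNAL
        # 64-bit integers are written as decimal strings in the JSON encoding
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
    }
    resource = {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": service_name}}
        ]
    }
    return {
        "resourceSpans": [
            {
                "resource": resource,
                "scopeSpans": [
                    {"scope": {"name": "example.manual"}, "spans": [span]}
                ],
            }
        ]
    }


payload = build_trace_payload("demo-service", "handle-request", 5_000_000)
print(json.dumps(payload, indent=2))
```

The field names mirror the protobuf schema, which is the point of the reply that follows: the JSON encoding changes the wire format, not where the schema is defined.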
        
           | reindeerer wrote:
           | Yes, but the schema def is still in protobuf
        
       ___________________________________________________________________
       (page generated 2024-01-12 23:00 UTC)