[HN Gopher] The current state of OpenTelemetry
___________________________________________________________________
The current state of OpenTelemetry
Author : pranay01
Score : 79 points
Date : 2024-01-12 17:36 UTC (5 hours ago)
(HTM) web link (signoz.io)
(TXT) w3m dump (signoz.io)
| baby_souffle wrote:
| Depends on who you ask.
|
| I am glad that the observability sector has standardized on a
| common protocol but my god are the reference implementations
| lacking.
| politelemon wrote:
| > my god are the reference implementations lacking.
|
| Can you share some of your experience, what do you mean by
| that? Are there edge cases causing problems, or major missing
| features? Easy or difficult to use?
| abeppu wrote:
| As an example, Exemplars are part of the metrics spec [1].
| The official python library says metrics status is 'stable'
| [2]. But there's an approximately 2-year-old issue with no
| work on it, titled 'Metrics: Add support for exemplars',
| where the latest update is that no work has begun [3].
| Nothing at the top level of the opentelemetry-python project
| indicates that the project does not implement everything in
| the metrics spec, so if you wanted to use that capability,
| you are apt to discover it relatively late.
|
| [1] https://opentelemetry.io/docs/specs/otel/metrics/data-model/...
|
| [2] https://github.com/open-telemetry/opentelemetry-python
|
| [3] https://github.com/open-telemetry/opentelemetry-python/issue...
| arwineap wrote:
| otel logging is completely missing from golang for example
| bbkane wrote:
| I agree this should be there, but I also think in most
| cases, logs can be completely replaced by otel tracing -
| see https://www.infoq.com/presentations/event-tracing-monitoring...
| pranay01 wrote:
| Agree, there being an open standard for instrumentation is a
| big win. Lots of work still needs to be done on showing more
| examples and making it more accessible to users & implementors.
|
| One other key area is resources that help engineers/implementors
| get organizational buy-in.
| neonsunset wrote:
| C# has pretty nice integration with OTEL out of the box (ASP.NET
| Core and otherwise, distributed as separate packages)
|
| https://learn.microsoft.com/en-us/dotnet/core/diagnostics/ob...
| phillipcarter wrote:
| As a maintainer and end-user, my answer to this is...yes and no.
| It's important to clarify that stability - something mentioned
| in the article - has several major definitions:
|
| - Stability in the specification
|
| - Stability in semantic conventions
|
| - Stability in the protocol representation
|
| - Stability in SDKs that can generate data
|
| - Stability in the Collector that can receive, process, and
| export that data
|
| Unfortunately, many people interpret "stable" in one of those
| categories as "stable for everything", and then get
| really annoyed when they find their language doesn't actually
| have stable support (or any support!) for that concept.
|
| What I'm most proud of in 2023 is all of the little things we
| made progress on with components that engineers have to
| materially deal with. On the website, we documented what feels
| like a million little things and clarified tons of concepts that
| people told us were confusing. Across all the SDKs, we fixed tons
| of little bugs, added more and more instrumentations, and
| completed the unsexy work to make metrics generation stable
| across most of our 11+ languages. The Collector added oodles and
| oodles of support for different data sources, and OTTL went from
| a neat component to a rock-solid general-purpose data
| transformation tool.
|
| There's so much more work to do, but I'm really happy about the
| progress.
| shipit1999 wrote:
| OpenTelemetry is a great concept, but in my experience not quite
| there yet. Docs especially fall into the common trap of handling
| the happy path hello world quickstarts, then become increasingly
| useless as you want to get beyond that to real life use cases.
| Given the inherent tradeoff of complexity that comes from trying
| to unify different approaches around one standard, sometimes it
| seems like things that should be simple are more difficult than
| they should be. I'm sure it will keep improving.
| mason55 wrote:
| > _Docs especially fall into the common trap of handling the
| happy path hello world quickstarts, then become increasingly
| useless as you want to get beyond that to real life use cases._
|
| Yeah, Java is what I'm most familiar with. The "Getting
| Started" shows how to do some basic manual instrumentation and
| collect the output with curl. Then the "Next Steps" are just
| random things with no guidance about why I would or wouldn't
| choose any of them for my next step.
|
| But, ok, I choose "Automatic Instrumentation", that sounds
| promising. And it actually is really easy to set up auto
| instrumentation. But then at the end it says
|
| > _After you have automatic instrumentation configured for your
| app or service, you might want to annotate selected methods or
| add manual instrumentation to collect custom telemetry data._
|
| Uh... no... after I have automatic instrumentation enabled I
| _want to do something with the output_
|
| The two major flaws in the docs seem to be
|
| 1. The common failure of docs to explain to users why they
| might choose one thing or another. "If you want to do x.. If
| you want to do y.." what if I don't know?
|
| 2. Because otel is agnostic to the consumer of the output,
| there's very little in the way of explaining how to get value
| out of what otel produces. To connect the dots, you really need
| to use the docs of your observability tool. Which I understand,
| but then most of them have their own setup directions because
| they want some extra fields included in the data, or they have
| their own fork, so not everything in the otel docs is actually
| usable.
|
| I'm not sure what the answer is. It's not like I expect otel to
| document how to build a dashboard in Grafana. And a lot of
| frustration I've experienced has been with the observability
| tools themselves. But at the same time, I always feel like the
| otel docs just don't get you anywhere close to getting value
| out of the library. Which is a shame, because turning on auto-
| instrumentation and seeing all your traces with literally no
| extra work is a magical moment.
| Volundr wrote:
| Hmmm... Yeah, I set up OpenTelemetry for a couple of personal
| projects this year and was pleased with the ease of setup, but
| by and large I knew what I was doing. Specifically, I had my
| application, I had Grafana, and I wanted to get traces from A
| to B.
|
| Looking at the docs again through the eyes of a newcomer, if you
| don't already have a destination in mind they don't really
| help you. It's a little tricky because my setup with Grafana
| will be somewhat different (but similar) from someone using
| honeycomb or signoz or what have you, but even just having a
| "want to visualize your data? Check out the list of
| compatible vendors", with a link that direction would
| probably go a long way.
| structural wrote:
| By comparison, I wanted to use opentelemetry for a series
| of projects, but could find absolutely no useful
| documentation on how to do anything else other than "send
| data from a webapp to a server / other cloud service that
| some vendor wants to sell you".
|
| All I wanted to do was instrument an application and write
| its telemetry data to a file in a standard way, and have
| some story regarding combining metrics, traces, and logs as
| necessary. Ideally this would use minimal system resources
| when idle. That's it.
| bbkane wrote:
| It doesn't read from files unfortunately, but
| https://openobserve.ai/ is very easy to set up locally
| (single binary) and send otel logs/metrics/traces to.
|
| Here's how I run it locally for my little shovel project
| - https://github.com/bbkane/shovel#run-the-webapp-locally-with... .
|
| Also linked from that README is an Ansible playbook to
| start OpenObserve as a systems service on a Linux VM.
|
| Alternatively, see the shovel codebase I linked above for
| a "stdout" TracerProvider. You could do something like
| that to save to a file, and then use a tool to prettify
| the JSON. I have a small script to format json logs at
| https://github.com/bbkane/dotfiles/blob/2df9af5a9bbb40f2e101...
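|
| A minimal sketch of the "prettify the JSON" step, not the script
| linked above. It assumes the telemetry file holds one JSON object
| per line; the function name prettify_json_lines is hypothetical,
| and only the Python standard library is used.

```python
import json
from typing import Iterable, Iterator


def prettify_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield an indented rendering of each JSON line.

    Assumption: the input is line-delimited JSON (one object per line).
    Lines that do not parse as JSON are passed through unchanged.
    """
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            yield text  # not JSON (for example a stray print), keep as-is
            continue
        # sort_keys keeps the output stable, which makes diffs easier to read
        yield json.dumps(record, indent=2, sort_keys=True)
```

| To use it on a file, open the file and print each item the
| function yields. An exporter that already writes multi-line JSON
| would need a different reader.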
| starkparker wrote:
| > 1. The common failure of docs to explain to users why they
| might choose one thing or another. "If you want to do x.. If
| you want to do y.." what if I don't know?
|
| Observability docs in general struggle with this. So many
| data sources can emit so many types of metrics in so many
| formats, and every tool makes this impossible promise of
| consolidating it all into one space seamlessly. But tools
| like Grafana pride themselves so much on visualizing
| _anything_ that they paint themselves into a corner where
| they can't be prescriptive about common uses or methods
| without excluding or confusing others.
|
| So a lot of the prescriptive answers to "what if I don't
| know?" get chucked onto account and support teams of
| commercial vendors, because the docs can't anticipate every
| possible context in which an observability tool will get
| deployed. Each solution ends up being custom tailored and
| poorly portable to anyone else's, often not even to other
| customers with the same data sources and goals at the same
| scale due to wacky labelling differences or legacy
| requirements or some internal stakeholder demand.
|
| More narrowly focused tools don't have as many of these
| problems, but not many organizations want narrowly focused
| observability tools. (Lots of _people_ do, but orgs don't
| want to pay out deals to multiple vendors for what looks like
| different flavors of the same result. And hey look it's
| Grafana Cloud or Datadog or whatever, it can do _anything_,
| so you devs and also bizops and SRE and IT and hey sales
| wants a dashboard too and so does the company cafeteria, why
| not, you all can just use this one tool and we just deal with
| one bill with a volume discount, right? Right??)
|
| Smarter tools avoid many of these problems by papering over
| the docs' limitations: they are better able to anticipate or
| surface connections between data sources, metrics, logs,
| traces, events, etc., and they do so with better
| interfaces. But especially for high-cardinality data the
| usability of those tools either seems to fall apart or their
| companies charge Datadog-sized invoices.
| reindeerer wrote:
| Give me something that isn't based on protobufs at wire / request
| level. CBOR with CDDL for a fully standards based approach that
| can work at any size of the stack
| pranay01 wrote:
| What's the issue with protobufs?
| reindeerer wrote:
| The first issue is that protobufs aren't a standard. That
| inherently limits anything built on top of them to not be a
| standard either, and that limits their applicability
|
| Also, depending on the environment you run in, code size
| bloat vs alternatives can matter
| tonyarkles wrote:
| > Aren't a standard
|
| You mean like an IETF standard? That is true, although the
| specification is quite simple to implement. It is certainly
| a de-facto standard, even if it hasn't been standardized by
| the IETF or IEEE or ANSI or ECMA.
|
| > inherently limits anything built on top of them to not be
| a standard either
|
| I'm not sure that strictly follows.
| https://datatracker.ietf.org/doc/html/rfc9232 for example
| directly references the protobuf spec at
| https://protobuf.dev/ and includes protobufs as a valid
| encoding.
|
| > depending on the environment
|
| I've had several projects that ran on wimpy Cortex M0
| processors and printf() has generally taken more code space
| in flash than NanoPB. This is generally with the same
| device doing both encoding and decoding.
|
| If you're only encoding, the amount of code required to
| encode a given structure into a PB is very close to
| trivial. If I recall it can also be done in a streaming
| fashion so you don't even need a RAM buffer necessarily to
| handle the encoded output.
|
| Do I love protobufs? Not really. There's often some issue
| with protoc when running it in a new environment. The APIs
| sometimes bother me, especially the callback structure in
| NanoPB. But it's been a workhorse for probably 15 years now
| and as a straightforward TLV encoding it works pretty
| darned well.
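|
| To illustrate how little code the encode-only path needs, here is
| a rough sketch of the protobuf wire format for two field kinds
| (varint and length-delimited). It is not NanoPB and not generated
| code; the function names are hypothetical, and it only handles
| non-negative integers and raw bytes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: last byte
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Tag (field number, wire type 0), then the value as a varint."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Tag (field number, wire type 2), then length, then the raw bytes."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload
```

| Each field is self-contained, so a caller can write the bytes to
| an output stream as soon as a field is encoded. Nested messages
| are the exception, since their length prefix has to be known
| first.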
| jahewson wrote:
| I used protobufs for a short while and came to the
| realization that they're just Go's opinionated idioms forced
| on other languages via awkward SDKs. Particularly did not
| like having to use codegen, rely on Google's annotations for
| basic functionality or deal with field masks that are a sort
| of poor man's GraphQL.
|
| I get it, Google made trade-offs that work for them, and I
| agree with their position - but for someone at a smaller
| company working in a non-Go/Java/C programming language it
| was just a ton of friction for no benefit.
| alright2565 wrote:
| Here are the example payloads for OTLP over JSON and an example
| of how to ingest them:
| https://github.com/open-telemetry/opentelemetry-proto/tree/m...
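|
| For a sense of what such a payload looks like, here is a small
| sketch that builds a one-span trace request as a plain dict. The
| field names follow my understanding of the OTLP/JSON mapping (hex
| strings for IDs, decimal strings for 64-bit timestamps, integer
| enum values), so check them against the linked examples. The
| function name and the "manual-example" scope name are made up.

```python
import json
import secrets
import time


def build_otlp_trace_payload(service_name: str, span_name: str) -> dict:
    """Build a minimal OTLP/JSON trace export request with one span."""
    now = time.time_ns()
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name",
                 "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes as hex
                    "spanId": secrets.token_hex(8),    # 8 bytes as hex
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are carried as decimal strings
                    "startTimeUnixNano": str(now),
                    "endTimeUnixNano": str(now + 1_000_000),
                }],
            }],
        }]
    }


if __name__ == "__main__":
    print(json.dumps(build_otlp_trace_payload("demo-service", "demo-op"),
                     indent=2))
```

| A receiver that speaks OTLP/HTTP would typically accept this body
| as a POST to its /v1/traces path with Content-Type
| application/json; the exact host and port depend on the setup.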
| reindeerer wrote:
| Yes, but the schema def is still in protobuf
___________________________________________________________________
(page generated 2024-01-12 23:00 UTC)