[HN Gopher] OpenTelemetry for Go: Measuring overhead costs
       ___________________________________________________________________
        
       OpenTelemetry for Go: Measuring overhead costs
        
       Author : openWrangler
       Score  : 80 points
       Date   : 2025-06-16 15:09 UTC (7 hours ago)
        
 (HTM) web link (coroot.com)
 (TXT) w3m dump (coroot.com)
        
       | dmoy wrote:
       | Not on original topic, but:
       | 
       | I definitely prefer having graphs put the unit at least on the
       | axis, if not in the individual axis labels directly.
       | 
        | I.e. instead of having a graph titled "latency, seconds" at the
        | top and then, way over on the left, an unlabeled axis with
        | "5m, 10m, 15m, 20m" ticks...
       | 
        | I'd rather have the title "latency" and either "seconds" on the
        | left, or, given the confusion between "5m = 5 minutes" and "5m =
        | 5 milli[seconds]", just have it explicitly labeled on each tick:
        | 5ms, 10ms, ...
       | 
        | It's way, way less likely to confuse someone when the units are
        | right on the number, instead of floating way over in a different
        | section of the graph.
        
       | Thaxll wrote:
        | Logging, metrics and traces are not free, especially if you turn
        | them on for every request.
       | 
        | Tracing every HTTP 200 at 10k req/sec is not something you should
        | be doing; at that rate you should sample the 200s (1% or so) and
        | trace all the errors.
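        | 
        | As a rough sketch of what that head-sampling setup looks like
        | with the Go SDK (the 1% ratio and the exporter wiring are just
        | placeholders, and note that keeping all errors needs tail-based
        | sampling, since the status isn't known up front):
        | 
        |     package telemetry
        |     
        |     import (
        |         "go.opentelemetry.io/otel"
        |         sdktrace "go.opentelemetry.io/otel/sdk/trace"
        |     )
        |     
        |     // initTracing keeps ~1% of root spans and respects the parent's
        |     // sampling decision for child spans.
        |     func initTracing(exporter sdktrace.SpanExporter) {
        |         tp := sdktrace.NewTracerProvider(
        |             sdktrace.WithSampler(sdktrace.ParentBased(
        |                 sdktrace.TraceIDRatioBased(0.01),
        |             )),
        |             sdktrace.WithBatcher(exporter),
        |         )
        |         otel.SetTracerProvider(tp)
        |     }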
        
         | anonzzzies wrote:
          | A very small % of startups gets anywhere near that traffic, so
          | why give them angst? Most people can just do this without any
          | issues and learn from it; only a tiny fraction shouldn't.
        
           | williamdclt wrote:
           | 10k/s across multiple services is reached quickly even at
           | startup scale.
           | 
            | In my previous company (a startup), we used OTel everywhere
            | and we definitely needed sampling for cost reasons (1/30
            | iirc). And that was using a much cheaper provider than
            | Datadog.
        
           | cogman10 wrote:
            | Having high req/s isn't as big a negative as it once was,
            | especially if you are using HTTP/2 or HTTP/3.
           | 
           | Designing APIs which cause a high number of requests and spit
           | out a low amount of data can be quite legitimate. It allows
           | for better scaling and capacity planning vs having single
           | calls that take a large amount of time and return large
           | amounts of data.
           | 
            | In the old HTTP/1 days, it was a bad thing because a single
            | connection could only service one request at a time. Getting
            | any sort of concurrency or high request rate required many
            | connections (which had a large amount of overhead due to the
            | way TCP functions).
           | 
           | We've moved past that.
        
         | orochimaaru wrote:
          | Metrics are usually minimal overhead. Traces need to be
         | sampled. Logs need to be sampled at error/critical levels. You
         | also need to be able to dynamically change sampling and log
         | levels.
         | 
          | 100% traces are a mess. I didn't see where he set up sampling.
        
           | phillipcarter wrote:
            | The post didn't cover sampling, which indeed significantly
            | reduces overhead in OTel, because when you head sample at the
            | SDK level the spans that aren't sampled are never created.
            | Overhead is more of a concern when doing tail-based sampling
            | only, where you will want to trace each request and offload
            | export to a sidecar so that export concerns are handled
            | outside your app, which then routes to a sampler elsewhere in
            | your infrastructure.
           | 
           | FWIW at my former employer we had some fairly loose
           | guidelines for folks around sampling:
           | https://docs.honeycomb.io/manage-data-
           | volume/sample/guidelin...
           | 
            | There are outliers, but the general idea is that there's also
            | a high cost to implementing sampling (especially for
            | nontrivial stuff), and if your volume isn't terribly high
            | you'll probably spend more in engineering time than you would
            | pay for the extra data you may not necessarily need.
        
         | jhoechtl wrote:
         | I am relatively new to the topic. In the sample code of the OP
          | there is no logging, right? It's metrics and traces but no
         | logging.
         | 
         | How is logging in OTel?
        
           | shanemhansen wrote:
           | To me traces (or maybe more specifically spans) are
           | essentially a structured log with a unique ID and a reference
           | to a parent ID.
           | 
            | Very open to having someone explain why I'm wrong or why they
            | should be handled separately.
        
             | kiitos wrote:
             | Traces have a very specific data model, and corresponding
             | limitations, which don't really accommodate log
             | events/messages of arbitrary size. The access model for
             | traces is also fundamentally different vs. that of logs.
        
               | phillipcarter wrote:
               | There are practical limitations mostly with backend
               | analysis tools. OTel does not define a limit on how large
               | a span is. It's quite common in LLM Observability to
               | capture full prompts and LLM responses as attributes on
               | spans, for example.
        
           | phillipcarter wrote:
           | Logging in OTel is logging with your logging framework of
            | choice. The SDK just requires you to initialize the wrapper,
            | and it'll then wrap your existing logging calls and correlate
            | them with the trace/span in active context, if one exists. There
           | is no separate logging API to learn. Logs are exported in a
           | separate pipeline from traces and metrics.
           | 
            | Implementations for many languages are starting to mature,
           | too.
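            | 
            | A minimal sketch of that wiring with slog and the contrib
            | otelslog bridge (the names here are made up, and the
            | LoggerProvider/exporter setup is assumed to happen elsewhere):
            | 
            |     package app
            |     
            |     import (
            |         "context"
            |         "log/slog"
            |     
            |         "go.opentelemetry.io/contrib/bridges/otelslog"
            |     )
            |     
            |     // logOrder emits a record through the OTel log pipeline; if ctx
            |     // carries an active span, the bridge attaches its trace/span IDs.
            |     func logOrder(ctx context.Context, orderID string) {
            |         logger := otelslog.NewLogger("checkout") // uses the global LoggerProvider
            |         logger.InfoContext(ctx, "order placed", slog.String("order_id", orderID))
            |     }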
        
         | kubectl_h wrote:
          | You have to do the tracing anyway if you are going to tail
          | sample based on criteria that aren't available at the beginning
          | of the trace (like an error that occurs later in the request).
          | You can head sample, of course, but that's the coarsest
          | sampling you can do, and you can't sample based on anything but
          | the initial conditions of the trace.
         | 
         | What we have started doing is still tracing every unit of work,
         | but deciding at the root span the level of instrumentation
         | fidelity we want for the trace based on the initial conditions.
         | Spans are still generated in the lifecycle of the trace, but we
         | discard them at the processor level (before they are batched
         | and sent to the collector) unless they have errors on them or
         | the trace has been marked as "full fidelity".
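          | 
          | A minimal sketch of that kind of processor-level filtering (the
          | wrapper below only keeps error spans; the "full fidelity" check
          | is left out, and this is not our actual implementation):
          | 
          |     package telemetry
          |     
          |     import (
          |         "context"
          |     
          |         "go.opentelemetry.io/otel/codes"
          |         sdktrace "go.opentelemetry.io/otel/sdk/trace"
          |     )
          |     
          |     // errorOnlyProcessor wraps another SpanProcessor (e.g. the batch
          |     // processor) and forwards a finished span only if it recorded an error.
          |     type errorOnlyProcessor struct {
          |         next sdktrace.SpanProcessor
          |     }
          |     
          |     func (p errorOnlyProcessor) OnStart(ctx context.Context, s sdktrace.ReadWriteSpan) {
          |         p.next.OnStart(ctx, s)
          |     }
          |     
          |     func (p errorOnlyProcessor) OnEnd(s sdktrace.ReadOnlySpan) {
          |         if s.Status().Code == codes.Error {
          |             p.next.OnEnd(s)
          |         }
          |     }
          |     
          |     func (p errorOnlyProcessor) Shutdown(ctx context.Context) error   { return p.next.Shutdown(ctx) }
          |     func (p errorOnlyProcessor) ForceFlush(ctx context.Context) error { return p.next.ForceFlush(ctx) }
          | 
          | Register it with sdktrace.WithSpanProcessor(errorOnlyProcessor{
          | next: sdktrace.NewBatchSpanProcessor(exporter)}) so dropped
          | spans never reach the batcher.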
        
         | kiitos wrote:
         | > Tracing every http 200 at 10k req/sec is not something you
         | should be doing
         | 
         | You don't know if a request is HTTP 200 or HTTP 500 until it
         | ends, so you have to at least _collect_ trace data for every
         | request as it executes. You can decide whether or not to _emit_
         | trace data for a request based on its ultimate response code,
         | but _emission_ is gonna be out-of-band of the request
         | lifecycle, and (in any reasonable implementation) amortized
          | such that you really shouldn't need to care about sampling
         | based on outcome. That is, the cost of collection is >> the
         | cost of emission.
         | 
         | If your tracing system can't handle 100% of your traffic,
         | that's a problem in that system; it's definitely not any kind
         | of universal truth... !
        
       | jeffbee wrote:
       | I feel like this is a lesson that unfortunately did not escape
       | Google, even though a lot of these open systems came from Google
       | or ex-Googlers. The overhead of tracing, logs, and metrics needs
       | to be ultra-low. But the (mis)feature whereby a trace span can be
       | sampled _post hoc_ means that you cannot have a nil tracer that
       | does nothing on unsampled traces, because it could become sampled
        | later. And the idea that if a metric exists it must be centrally
        | collected is totally preposterous; it makes everything far too
        | expensive when all a developer wants is a metric that costs
        | nothing in the steady state but can be collected when needed.
        
         | mamidon wrote:
         | How would you handle the case where you want to trace 100% of
         | errors? Presumably you don't know a trace is an error until
         | after you've executed the thing and paid the price.
        
           | phillipcarter wrote:
           | This is correct. It's a seemingly simple desire -- "always
           | capture whenever there's a request with an error!" -- but the
           | overhead needed to set that up gets complex. And then you
           | start heading down the path of "well THESE business
           | conditions are more important than THOSE business
           | conditions!" and before you know it, you've got a nice little
           | tower of sampling cards assembled. It's still worth it, just
           | a hefty tax at times, and often the right solution is to just
           | pay for more compute and data so that your engineers are
           | spending less time on these meta-level concerns.
        
           | jeffbee wrote:
           | I wouldn't. "Trace contains an error" is a hideously bad
           | criterion for sampling. If you have some storage subsystem
           | where you always hedge/race reads to two replicas then cancel
           | the request of the losing replica, then all of your traces
           | will contain an error. It is a genuinely terrible feature.
           | 
           | Local logging of error conditions is the way to go. And I
           | mean local, not to a central, indexed log search engine;
           | that's also way too expensive.
        
             | phillipcarter wrote:
             | I disagree that it's a bad criterion. The case you describe
             | is what sounds difficult, treating one error as part of
             | normal operations and another as not. That should be
             | considered its own kind of error or other form of response,
             | and sampling decisions could take that into consideration
             | (or not).
        
               | jeffbee wrote:
               | Another reason against inflating sampling rates on errors
               | is: for system stability you never want to do more stuff
               | during errors than you would normally do. Doing something
               | more expensive during an error can cause your whole
               | system, or elements of it, to latch into an unplanned
               | operating point where they only have the capacity to do
               | the expensive error path, and all of the traffic is
               | throwing errors because of the resource starvation.
        
       | vanschelven wrote:
       | The article never really explains what eBPF is -- AFAIU, it's a
       | kernel feature that lets you trace syscalls and network events
       | without touching your app code. Low overhead, good for metrics,
       | but not exactly transparent.
       | 
       | It's the umpteenth OTEL-critical article on the front page of HN
       | this month alone... I have to say I share the sentiment but
       | probably for different reasons. My take is quite the opposite:
        | most value is precisely at the application (code) level, so you
        | definitely should instrument... and then focus on Errors over
        | "general observability"[0]
       | 
       | [0] https://www.bugsink.com/blog/track-errors-first/
        
         | nikolay_sivko wrote:
         | I'm the author. I wouldn't say the post is critical of OTEL. I
         | just wanted to measure the overhead, that's all. Benchmarks
         | shouldn't be seen as critique. Quite the opposite, we can only
         | improve things if we've measured them first.
        
         | politician wrote:
         | I don't want to take away from your point, and yet... if anyone
         | lacks background knowledge these days the relevant context is
         | just an LLM prompt away.
        
           | vanschelven wrote:
           | It was always "a search away" but on the _web_ one might as
           | well use... A hyperlink
        
       | sa46 wrote:
       | Funny timing--I tried optimizing the Otel Go SDK a few weeks ago
       | (https://github.com/open-telemetry/opentelemetry-
       | go/issues/67...).
       | 
       | I suspect you could make the tracing SDK 2x faster with some
       | cleverness. The main tricks are:
       | 
       | - Use a faster time.Now(). Go does a fair bit of work to convert
       | to the Go epoch.
       | 
       | - Use atomics instead of a mutex. I sent a PR, but the reviewer
       | caught correctness issues. Atomics are subtle and tricky.
       | 
        | - Directly marshal protos with a hand-rolled library or with
        | https://github.com/VictoriaMetrics/easyproto instead of going
        | through reflection.
       | 
       | The gold standard is how TiDB implemented tracing
       | (https://www.pingcap.com/blog/how-we-trace-a-kv-database-
       | with...). Since Go purposefully (and reasonably) doesn't
       | currently provide a comparable abstraction for thread-local
       | storage, we can't implement similar tricks like special-casing
       | when a trace is modified on a single thread.
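        | 
        | One generic workaround for the time.Now() cost (not what the
        | linked issue proposes, just an illustration of the trade-off) is
        | a coarse clock that a background goroutine refreshes, so the hot
        | path is a single atomic load at the price of coarser timestamps:
        | 
        |     package telemetry
        |     
        |     import (
        |         "sync/atomic"
        |         "time"
        |     )
        |     
        |     // coarseClock trades timestamp precision for speed.
        |     type coarseClock struct{ now atomic.Int64 }
        |     
        |     func newCoarseClock(resolution time.Duration) *coarseClock {
        |         c := &coarseClock{}
        |         c.now.Store(time.Now().UnixNano())
        |         go func() {
        |             // Refresh the cached timestamp on a fixed interval.
        |             for t := range time.Tick(resolution) {
        |                 c.now.Store(t.UnixNano())
        |             }
        |         }()
        |         return c
        |     }
        |     
        |     func (c *coarseClock) Now() time.Time { return time.Unix(0, c.now.Load()) }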
        
         | malkia wrote:
         | There is an effort to use arrow format for metrics too -
         | https://github.com/open-telemetry/otel-arrow - but no client
         | that exports directly to it yet.
        
         | rastignack wrote:
          | Would the sync.Pool trick mentioned here:
          | https://hypermode.com/blog/introducing-ristretto-high-perf-g...
          | help? It's lossy but might be a good compromise.
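          | 
          | For context, the general shape of the sync.Pool pattern (a
          | generic buffer-reuse sketch, not the specific design from the
          | linked post or from the SDK):
          | 
          |     package telemetry
          |     
          |     import (
          |         "bytes"
          |         "sync"
          |     )
          |     
          |     // bufPool reuses scratch buffers across exports; the GC may drop
          |     // pooled buffers at any time, so the reuse is best-effort.
          |     var bufPool = sync.Pool{
          |         New: func() any { return new(bytes.Buffer) },
          |     }
          |     
          |     func encodeSpan(name string) []byte {
          |         buf := bufPool.Get().(*bytes.Buffer)
          |         defer bufPool.Put(buf)
          |         buf.Reset()
          |         buf.WriteString(name) // ... plus the rest of the wire format
          |         return append([]byte(nil), buf.Bytes()...) // copy out before reuse
          |     }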
        
       | otterley wrote:
       | Out of curiosity, does Go's built-in pprof yield different
       | results?
       | 
       | The nice thing about Go is that you don't need an eBPF module to
       | get decent profiling.
       | 
       | Also, CPU and memory instrumentation is built into the Linux
       | kernel already.
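        | 
        | For anyone who hasn't wired it up, exposing the built-in profiler
        | is a few lines (the port here is arbitrary):
        | 
        |     package main
        |     
        |     import (
        |         "log"
        |         "net/http"
        |         _ "net/http/pprof" // registers /debug/pprof/* on the default mux
        |     )
        |     
        |     func main() {
        |         // e.g. go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
        |         log.Println(http.ListenAndServe("localhost:6060", nil))
        |     }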
        
       | coxley wrote:
       | The OTel SDK has always been much worse to use than Prometheus
       | for metrics -- including higher overhead. I prefer to only use it
       | for tracing for that reason.
        
       | reactordev wrote:
        | Mmmmmmm, the last 8 months of my life wrapped into a blog post,
        | but with an ad on the end. Excellent. Basically the same findings
        | as mine, my team's, and everyone else's in the space.
       | 
       | Not being sarcastic at all, it's tricky. I like that the article
       | called out eBPF and why you would want to disable it for speed
        | but recommends caution. I kept hearing "single pane of glass"
        | marketing speak from executives, and I kept my mouth shut about
        | how that isn't feasible across the entire organization. Needless
        | to say, they didn't like that non-answer and so I was canned.
        | What an engineer cares about is different from
        | organization/business metrics, and the two were often confused.
       | 
        | I wrote a lot of great OTel receivers though: VMware, Veracode,
        | HashiCorp Vault, GitLab, Jenkins, Jira, and the platforms
        | themselves.
        
         | phillipcarter wrote:
         | > I kept hearing from executives a "single pane of glass"
         | marketing speak
         | 
         | It's really unfortunate that Observability vendors lean into
         | this to reinforce it too. What the execs usually care about is
         | engineering workflows consolidating and allowing teams to all
         | "speak the same language" in terms of data, analysis workflows,
         | visualizations, runbooks, etc.
         | 
         | This goal is admirable, but nearly impossible to achieve
         | because it's the exact same problem as solving "we are aligned
         | organizationally", which no organization ever is.
         | 
         | That doesn't mean progress can't be made, but it's always far
         | more complicated than they would like.
        
           | reactordev wrote:
           | For sure, it's the ultimate nirvana. Let me know when an
           | organization gets there. :)
        
       ___________________________________________________________________
       (page generated 2025-06-16 23:00 UTC)