[HN Gopher] Why and how GitHub is adopting OpenTelemetry
___________________________________________________________________
Why and how GitHub is adopting OpenTelemetry
Author : todsacerdoti
Score : 140 points
Date : 2021-05-26 19:21 UTC (3 hours ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| infogulch wrote:
| For everyone blindly rage triggered by the presence of the bytes
| "t-e-l-e-m-e-t-r-y", cogman10 elsewhere ITT [1] summarized that
| concern well:
|
| > It's like being afraid your browser profiler is being used to
| spy on you. Could it do that? Sure. but there are so many easier
| ways to accomplish the same task.
|
| So take a second to straighten out your panties and then actually
| _look at_ the thing first. It's just an open-source (!) APM
| protocol that competes with (edit: more like, adjacent to)
| DataDog, DynaTrace, New Relic etc etc. _It's not even a suitable
| tool for spying_.
|
| [1]: https://news.ycombinator.com/item?id=27296419
| nonbirithm wrote:
| It's interesting how names can send signals without any
| substance of the actual product being relevant. (The thread
| over 'bro pages' was one example.) This specific instance is
| rather revealing of one of the biases of some HN users.
|
| I look at myself and I find that many opinions on technology
| that I hold have clearly been shaped by how much time I've
| spent on HN. Data-exfiltration telemetry is one example. But I
| now think about a lot of things I might not have given much
| thought to before, like how being backed by a VC can shape the
| direction of a service for the worse. I also find that I'm
| unnecessarily hardline on topics like paid services versus open
| source, and shut out anything positive people have to say about
| advertising as a revenue source.
|
| I sometimes imagine what would happen if I was born inside the
| borders of a different country. Some countries have very
| nationalistic citizens. How much of my thought processes are a
| result of the people and signals I surround myself with? How
| many of those signals will reach me whether or not I consent to
| seeing them (advertisements being one example)?
|
| I think I could have been a very different person if I was
| raised even a hundred miles from where I actually grew up,
| despite having a similar chemical makeup. That makes me be more
| conscious about where I'm likely to seek information, and to
| try not to look for opinions that I already agree with
| entirely.
| cogman10 wrote:
| Doesn't even compete with them :) Each of those has
| opentelemetry adapters.
|
| You'd write your OpenTelemetry traces throughout your code and
| at some top level point in your app you configure and say "Hey,
| OpenTelemetry, report to New Relic".
|
| By itself, OpenTelemetry does nothing.
|
| Its usefulness is that someone writing a lib can add
| OpenTelemetry calls throughout and anyone using that lib can
| then collect metrics/traces into whatever metrics solution they
| are currently using (be it DynaTrace, DataDog, or New Relic).
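|
| A minimal sketch of that split, in plain Python rather than the
| real OpenTelemetry API. The names span, set_exporter and
| library_function are made up for illustration; the point is only
| that the instrumented code never names a vendor, and that with no
| exporter configured nothing is reported.

```python
import time
from contextlib import contextmanager

# Hypothetical stand-in for a tracing API; NOT the real OpenTelemetry package.
_exporter = None  # no backend configured: spans are timed and then dropped


def set_exporter(exporter):
    """Called once, at the top level of the application."""
    global _exporter
    _exporter = exporter


@contextmanager
def span(name):
    """Called anywhere, including inside third-party libraries."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if _exporter is not None:
            _exporter(name, elapsed)


def library_function():
    # The library author only knows about span(), never about the vendor.
    with span("library.work"):
        return sum(range(1000))


# Without an exporter the call still works; nothing is reported.
library_function()

# The application picks a backend in one place; a list stands in for a vendor.
collected = []
set_exporter(lambda name, seconds: collected.append((name, seconds)))
library_function()
```

| The list stands in for New Relic, DataDog or any other backend; in
| a real setup an OpenTelemetry SDK plus a vendor exporter would
| play that role.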
| wdb wrote:
| It doesn't really compete: Datadog, Dynatrace, Honeycomb,
| Lightstep, and New Relic are all partners in defining the
| specification and helping implement it. Most of them already
| allow ingesting traces over the OTLP protocol.
| rektide wrote:
| I tend to think computing itself has been in kind of a dumb
| rut. Right now, OpenTelemetry is just logging & tracing,
| imagined as tools for ops, but over time, I fully expect we
| begin to see this as an event-stream in itself, something we
| can use for Event Sourcing, to watch & trigger new compute
| based off of.
|
| That's not in the cards today. But longer term, I think
| "knowing what computers are doing" is big business. And, I am
| very sad to say, eventually it will be the obvious & logical way
| to spy on folks too. Because it will be the obvious & logical
| way to do many many many things, not because it's tech that's
| built or intended for spying. But right now computing is
| ephemeral, we don't persist any of the stack traces we compute
| through, and fundamentally, tracing really is about distilling
| out & keeping higher level stack traces. It's something
| computing needs to have been doing, that will bring us to a
| radically higher level of understanding (but it also does have
| some scary uses).
|
| It's been a couple years, but tracing is a powerful new basis
| with which to start re-engaging the "Turning the database
| inside out"[1] / "I <3 Logs" view of data-storage (& computing)
| that had some buzz for a bit. We're not at all here yet, but I
| think it's coming.
|
| [1] https://martin.kleppmann.com/2015/11/05/database-inside-
| out-...
| cogman10 wrote:
| Who will be doing this spying?
|
| It certainly wouldn't be opentelemetry itself as that's just
| an interface you add adapters to.
|
| Are you thinking a man in the middle would spy? How would
| that work? This information is pushed over secure connections
| on the backend likely in a VPN. On the front end, it'd be
| transmitted over HTTPS. Shouldn't we be more fearful of
| information collected from DNS than encrypted data sent over
| HTTPS?
|
| Or are you thinking the metrics aggregators are going to do
| the spying? How would that impact their business model? Do
| you think a company would continue to pay the likes of new
| relic if they were caught giving access to metrics data to
| outside groups? Do you worry about postgres or prometheus
| sharing your data with 3rd parties? What about SQL server?
|
| What is the risk model and how would it be different from say
| the risk model of making a REST call or using a 3rd party
| library?
|
| Or is it just that "because this is well integrated
| throughout, it could be used to spy"? Because, generally
| speaking, these traces don't have enough information to
| identify what individual users are doing to the system. Even
| if they did, that wouldn't be a great way to track a user,
| you'd simply put that tracking information right at the front
| end of the system. Plumbing it from one end of the system to
| the other gives little value for a spy. It's adding a bunch
| of noise to the question you'd want to ask "what are the
| user's behaviors with our product?"
|
| And even if the demand is there, why would you do this
| through tracing and not a purpose built spy tool. Wouldn't it
| be easier for a nefarious lib writer to make a plugin purpose
| built to collect evil data? If that sells, why wouldn't a
| tech company buy that instead of buying a solution which
| combs 3 layers of separation to get a worse answer? Why
| wouldn't google analytics still exist?
| rektide wrote:
| We're forming the best, most complete, competent view of
| computing-that's-happened that we've ever formed.
|
| You have a bunch of weird straw men that I don't get. I
| tried to de-emphasize the role of behavioral analytics &
| user-tracking, because I think it's just one small part of
| what this will be used for. But I am fairly confident we
| will eventually start to do more user-tracking via these
| systems. I've used half a dozen different user-tracking
| products at various companies, and they all read like
| ultra-low-fi versions of the ops tools. Ops tools have been
| evolving, at a far faster rate, far more in the public
| domain, and at some point, it just won't make sense to
| instrument your product twice.
|
| I want to re-iterate that I see this as one of the
| smallest, least interesting aspects of a coming Event-
| Sourcing-powered-by-tracing world. There's far more
| profound implications for what could happen to computing in
| general here (a de-wiring of the request/response
| microservice world & a shift towards async, reactive
| systems). But even today, it feels to me like folks work
| very hard to draw a distinction between their ops tools &
| their behavioral analytics tool. As a developer, I find they
| often work, and I often interface with them, in a very similar
| fashion. The desire to draw the distinction has felt
| illogical, and felt unsupported. Especially as the ops
| tools advance, I think it will be harder to reconcile the
| idea that there are & ought be separate systems of
| tracking/viewing/understanding.
| michaelperel wrote:
| For anyone interested in learning how to use Open Telemetry for
| distributed tracing in go, I recently made a demo app to share
| with some friends: https://github.com/michaelperel/otel-demo
| Animats wrote:
| Does this put "telemetry" in Git itself? If you're just using git
| to access Github, is it snooping on you?
| cosmojg wrote:
| Huh? The day GitHub has that much control over the Git project,
| the world of software as we know it will end.
|
| GitHub ≠ Git. Far from it.
| vbsteven wrote:
| No, OpenTelemetry is focussed on gathering metrics in
| distributed systems. Metrics, log aggregation and tracing of
| requests from the entrypoint (load balancer) all the way to the
| backend services and data stores.
|
| Take a look at the Jaeger and Zipkin websites and it should be
| pretty clear what it is used for.
| cogman10 wrote:
| And, it should be stated, that it isn't even really about
| gathering metrics. It is about providing a standard interface
| to gather metrics. (I know you know this, but with the
| confusion here, I figure I should go into details).
|
| The point of OpenTelemetry is to make it so you could write
| the places you get your traces/metrics in one part of code
| and configure which backend system it is collected into in
| another part.
|
| So, for example, you'd add a `trace("my slow thing"){ be slow
| }` into your code and later add `report to zipkin` or `report
| to Jaeger` in another part. The place where you trace "my
| slow thing" doesn't care about how to interact with the
| backend system or which backend vendor is ultimately used.
| You can start using prometheus, zipkin, new relic, Jaeger,
| whatever, just so long as they have an OpenTelemetry adapter
| you are golden.
|
| The analog is SLF4J in Java.
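|
| The `trace("my slow thing")` idea above, sketched as runnable
| Python. Tracer, ZipkinLikeAdapter and JaegerLikeAdapter are
| hypothetical stand-ins, not real OpenTelemetry or vendor classes;
| a real adapter would ship spans over the network instead of
| appending to a list.

```python
import time
from contextlib import contextmanager


class ZipkinLikeAdapter:
    """Hypothetical adapter; a real one would send spans to Zipkin."""

    def __init__(self):
        self.sent = []

    def report(self, name, seconds):
        self.sent.append(f"zipkin:{name}")


class JaegerLikeAdapter:
    """Hypothetical adapter; a real one would send spans to Jaeger."""

    def __init__(self):
        self.sent = []

    def report(self, name, seconds):
        self.sent.append(f"jaeger:{name}")


class Tracer:
    """The only thing instrumented code talks to."""

    def __init__(self, adapter):
        self.adapter = adapter

    @contextmanager
    def trace(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.adapter.report(name, time.perf_counter() - start)


def be_slow(tracer):
    # This function is identical no matter which backend is configured.
    with tracer.trace("my slow thing"):
        time.sleep(0.01)


zipkin = ZipkinLikeAdapter()
jaeger = JaegerLikeAdapter()
be_slow(Tracer(zipkin))  # "report to zipkin"
be_slow(Tracer(jaeger))  # "report to Jaeger"
```

| Swapping the backend changes one constructor argument and leaves
| be_slow untouched, which is the SLF4J-style separation described
| here.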
| 4f77616973 wrote:
| Add to this that it's Microsoft owned.
| StreamBright wrote:
| I think it would be great to show the performance impact of these
| SDKs because it is one of the really important aspects of
| monitoring (being non-intrusive).
| Philip-J-Fry wrote:
| Well, the second you do any database call or other service call
| you've already spent 100x longer doing that than you have
| recording some timings.
|
| These clients will usually buffer the stats in memory and push
| them out asynchronously. Performance is definitely affected but
| I'm pretty sure it's negligible for most cases.
|
| Best practice would be to reduce tracing ratio in production
| too. So most requests are literally just a timing.
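|
| A rough sketch of both points, assuming a made-up BufferedExporter
| and a 25% sampling ratio chosen only for the demo. The request
| path does an in-memory enqueue; a background thread sends batches.
| None of these names come from a real SDK.

```python
import queue
import threading

SAMPLE_RATIO = 0.25  # assumed value for illustration


def should_sample(trace_id: int, ratio: float = SAMPLE_RATIO) -> bool:
    # The decision depends only on the trace id, so every service that
    # sees the same trace makes the same choice.
    return (trace_id % 100) < ratio * 100


class BufferedExporter:
    def __init__(self, send, batch_size=50):
        self._send = send
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, span):
        # Hot path: an in-memory enqueue, no network I/O.
        self._queue.put(span)

    def close(self):
        self._queue.put(None)  # sentinel: flush what is left, then stop
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is not None:
                batch.append(item)
            if batch and (item is None or len(batch) >= self._batch_size):
                self._send(batch)
                batch = []
            if item is None:
                return


batches = []  # stands in for the network call to a backend
exporter = BufferedExporter(batches.append, batch_size=50)
# Real trace ids are random; sequential ids keep the demo deterministic.
for trace_id in range(1000):
    if should_sample(trace_id):
        exporter.record({"trace_id": trace_id, "duration_ms": 1.0})
exporter.close()
```

| Unsampled requests pay only for the should_sample check, and
| raising the ratio to 1.0 for a short window records every request.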
| malkia wrote:
| Look into envoy/istio - e.g. these introduce side-processes
| (sidecars) that your process talks to, and these create the
| traces for you at some perf cost. There are proxies for some of
| the existing services - like mysql -
| https://www.envoyproxy.io/docs/envoy/latest/configuration/li...
|
| I haven't used it myself, I've used census, and now looking
| into OpenTelemetry (though from the least finished version -
| C++). Had mixed success with it in the past, but trying again.
| Also not looking at all into side-cars, etc. - We compile all
| our internal tools, so adding this inside is where I'm getting
| into.
|
| Several times (while at Google) an SRE I had called about an
| issue asked to bump the trace sampling from the minimal default
| (was it 1 in 100,000 or a million - I forget), sometimes to 1:1,
| for say 30 seconds. That way their systems (the ones we use)
| would receive the flag to sample too, and at the end they'd get
| full logs.
|
| Usually the whole trace is visible in a few minutes. There were
| a few UIs (nothing like zipkin/jaeger/others outside) - some of
| them with very "imgui"-like hacky (in good sense) view - like
| programmer art all over (which I loved - it was much more
| condensed than standard zipkin/jaeger).
|
| You could've marked something as important, and it'll retain
| for longer period - otherwise - poof - soon gone. Also it would
| collect info sometimes directly from the machine it was in
| (rather than wait to populate).
|
| Obviously, I don't know the details - I was just a user, or
| more like - allowing (when oncall) trace sampling to be bumped
| by the SRE - so they would get more info. It's what hooked me
| actually, because how else would one get everything end-to-end.
|
| Surprisingly it's also useful for single apps, where you have
| threads (or concurrency tasks, like with TBB/ConCRT) doing
| nested parallel_for's or spawning jobs, and you want to get
| idea what's going on. The only tricky bit is how to get your
| "context" propagated from one thread to another (also not
| readily done).
|
| It's one thing that "golang" got right with their context
| for example.
|
| So it's really awesome, but probably really hard to get right
| the first few times.
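|
| For the single-process case, one way to carry a trace context
| from one thread to another, sketched with Python's standard
| contextvars module. current_trace_id and the "trace-1234" value
| are hypothetical; this is an illustration of the propagation
| problem, not how any particular tracing library does it.

```python
import contextvars
import threading

# Hypothetical name; holds the id of the trace the current code belongs to.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

seen = {}


def worker(label):
    # Reads whatever trace id is active in the context it runs in.
    seen[label] = current_trace_id.get()


def handle_request():
    current_trace_id.set("trace-1234")
    # Snapshot the caller's context and run the worker inside that
    # snapshot, so the new thread sees the same trace id explicitly.
    ctx = contextvars.copy_context()
    thread = threading.Thread(target=ctx.run, args=(worker, "propagated"))
    thread.start()
    thread.join()


handle_request()
```

| Go makes the same hand-off explicit by passing a context.Context
| as a function argument, which is likely the design being praised.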
| Thaxll wrote:
| envoy/istio do not replace telemetry in the application because
| they only see what passes through them and don't know
| anything else. You're missing a lot if you only instrument
| through a proxy.
| jeffbee wrote:
| I've noticed that orgs where I've worked vary between being
| totally insensitive to observability cost to being real
| hardasses about it. But I think most smaller shops are falling
| into the former category. I've even heard in meetings crazy
| shit like "It's very low overhead, only about 5%" which would
| get you laughed out of the office at, say, Google.
| Unfortunately (to me) the focus on ease-of-use has meant that
| OpenTelemetry concepts are structured in such a way to
| _preclude_ even the possibility of a very efficient
| implementation, which means that there will be a schism between
| people who are happy in the otel ecosystem and people who
| can't use it on cost grounds, who probably will splinter into
| distinct home-grown solutions.
| drewbug01 wrote:
| > Unfortunately (to me) the focus on ease-of-use has meant
| that OpenTelemetry concepts are structured in such a way to
| preclude even the possibility of a very efficient
| implementation
|
| Curious what you mean about the design of OpenTelemetry
| precluding efficient implementation?
| pm90 wrote:
| You're correct that 5% increase in resource usage is probably
| not noticeable for most orgs.
|
| It's important to know what audience you're building for. I
| believe the audience for otel consists largely of companies
| that don't look anything like Google. So it's fine to
| sacrifice that last bit of perf gain if it means the code is
| easier to use and maintain.
|
| FWIW, it also appears that companies like Google would fork
| or reimplement such systems anyway.
| legulere wrote:
| It's also about where your costs are. Google has much less
| revenue per compute-time/requests/whatever than most other
| companies. If you target the business to business field your
| computing costs are usually negligible while your developers
| cost a lot of money. Throwing more hardware at the problem is
| often the most economical solution.
| crandycodes wrote:
| As someone who's needed to maintain complex, high-performance
| database drivers that needed to work across a bunch of
| different platforms, I've been following them and their
| predecessors of OpenTracing/OpenCensus. The problem that's
| always been interesting to me as a library maintainer is
| consistency across platforms and well maintained multi-platform
| libraries.
|
| I hadn't really found an acceptable solution that would work
| across Java, Node.js, browser, and so on. We'd invented our own
| formats and then we owned all the integration problems with
| various monitoring tools. I left the team before we started to
| adopt, but they've started doing it and it looks like it's helping
| with reducing integration burden. I also think using someone
| else's opinionated library can help avoid bikeshedding on
| concepts not related to your core value.
| rektide wrote:
| A lot of very confused weird comments here.
|
| This article is about a fairly large sized tech company adopting
| a fairly recent but increasingly mainstream & popular tool that
| helps them understand their operations. It'll give them a
| standard way to see what their computers are doing, across their
| various systems.
|
| OpenTelemetry is one of the key emerging cloud standards, and I
| expect many many many many more articles like this going
| forwards, from all kinds of companies.
|
| As for concerns about user tracking & privacy, that's not
| generally what these operational tools are used for. I haven't
| heard of a single case of them being used for user tracking or
| behavioral analytics. Thus far these are purely operational tools,
| to understand the health of systems, to debug & typically to
| understand what happens to an incoming request as it works
| through dozens of systems & services to get processed. That said,
| I tend to think over time the importance of this distributed
| after-the-fact log we are building is going to become inverted.
| That we will start to see the potential to harvest these records
| for analytics, and more generally, to forward-feed them into
| other processes to automatically build & advance Event Sourcing
| systems. Right now these systems are relatively pure & good, but
| what's really at stake here is that we've been doing computing
| blind, with no record of what's happened, and OpenTelemetry is a
| key first step in lifting that veil of ignorance as to what
| computing has happened. We are beginning to capture the data of
| what compute occurred. Many things will emerge as we open this
| box.
| infogulch wrote:
| Even other responses to your comment are confused. The primary
| use case is to trace requests that go through your backend of
| distributed microservices. The fact that it can also be used to
| collect PII-adjacent data as in sibling's example is about as
| relevant as the fact that it uses the HTTP protocol. I didn't
| see any similar comments raging against the HTTP protocol on
| the last curl post that came through here.
|
| People here are blindly rage-triggered by the word "telemetry"
| without even taking 2 seconds to glance through to see what it
| actually is.
| yannoninator wrote:
| Wait, you said:
|
| > I haven't heard of a single case of them being used for user
| tracking or behavioral analytics.
|
| and then you said:
|
| > That we will start to see the potential to harvest these
| records for analytics...
|
| So this can be used to gather analytics of any sort of data,
| such as spying then? This is still worrisome.
| AlexandrB wrote:
| > As for concerns about user tracking & privacy, that's not
| generally what these operational tools are used for.
|
| I'm not sure that the intended use matters. Unless the
| telemetry systems are carefully designed to not capture PII at
| all they become yet another channel that must be secured _as
| if_ they are collecting PII. For example, see Windows 10
| telemetry vs. HIPAA:
| https://hipaaone.com/2015/09/22/windows-10-and-hipaa/
| random3 wrote:
| They are agnostic data collection protocols/ systems /
| libraries. Just like a protocol they could do PII collection
| and unlikely to prevent that.
|
| Ultimately the intended use matters more than the tech. And
| in this case it seems purposely built to measure performance.
| cogman10 wrote:
| Agreed. This comment section is bizarre for HN. It's sort of
| like getting scared that "Prometheus" collects "metrics". Or
| that "Kafka" tracks "events".
|
| These aren't scary things.
| handrous wrote:
| "Telemetry" has come to be associated with activities that
| definitely qualify a product as spyware (though they've
| become fairly common--thanks, "data-driven" marketers).
|
| From reading the OpenTelemetry site, that doesn't seem to be
| their main, stated purpose, but thefounder's post in another
| part of the thread leads me to think it may, in fact, see
| that kind of use, too.
|
| [EDIT] damn, sorry, downvoters, for explaining the reason
| this is getting knee-jerk negative reactions from people,
| when someone expressed confusion about it. Again, I think the
| main reason is the word used, and what it's mostly associated
| with now, among some folks.
| infogulch wrote:
| Every tool is also a weapon, obviously. We haven't stopped
| making hammers because they can be used to hit people.
| handrous wrote:
| It's why people are making these assumptions about the
| term "telemetry". In certain circles it _usually_ means
| spyware. One of those circles includes front-end web,
| which is pretty well-represented on HN.
|
| [EDIT] and, especially, that's why they're jumping to the
| conclusion that _github_ is gearing up to do more spying
| on its users, which is the part that I think people are
| bothered by, not the existence of this software package.
| cogman10 wrote:
| OpenTelemetry isn't useful for spying purposes. It just
| isn't. If github wanted to spy on users they have way
| more direct methods that don't involve collecting the
| time your browser executes a function or waits on an AJAX
| request.
| handrous wrote:
| Cool, that's fine, I was just explaining _why_ it's
| getting that reaction from people. Telemetry means
| different things to different people and a (quite new)
| use is basically just a euphemism for "watching over
| people's shoulder while they use our software"--but,
| among some people, that's the _main_ use they encounter,
| so they assumed that's what this is.
| cogman10 wrote:
| The sorts of stuff collected by OpenTelemetry are benign.
| Can it be weaponized? Of course, but that's like using a
| screwdriver to kill a fly. If I wanted to collect evil
| "telemetry" I wouldn't touch OpenTelemetry because it
| isn't useful for that.
|
| It's like being afraid your browser profiler is being
| used to spy on you. Could it do that? Sure. but there are
| so many easier ways to accomplish the same task.
| yannoninator wrote:
| I'm now gravely concerned about GitHub after reading this.
|
| So now I should be excited about _telemetry_ and that GitHub uses
| this in their systems?
|
| This raises alarm bells about whether I should be using them for my
| repositories at all.
|
| Unsure if I should trust GitHub after this now.
|
| Downvoters: I don't understand here? So I _should_ be excited
| about telemetry? Why?
| yunohn wrote:
| I believe you (and many others here) have misunderstood the
| article. They are talking about telemetry on their backend
| systems to track performance metrics and the like. Not the
| usual telemetry that HN loves to argue about.
| [deleted]
| mikece wrote:
| Are there any examples of companies making a contractually
| binding pledge to never use any telemetry data for anything than
| improving the application/service... NEVER for creating marketing
| profiles or for surfacing ads?
| adamcstephens wrote:
| This is likely more in the realm of observability than what
| you're thinking. eg metrics, logs and traces. Could you store
| marketing data in here? Sure, but that's not really what
| opentelemetry is built for.
| chippy wrote:
| It seems that, in the majority of cases, we are left with trust
| at the end of the day. They might say "we will never do X" but
| if someone buys them, then the "we" becomes something
| different, and the new company is free to do whatever. With
| github, it's that paragon of good behaviour Microsoft that
| controls things.
|
| These promises need to be written down in the articles of
| association in a company, and I have never seen this.
| handrous wrote:
| They're not using "telemetry" as a euphemism for "shipping
| spyware to end users to record their actions on their own
| machine", in this case. This appears to be a very fancy log
| aggregation product that has also eaten logging/tracing related
| configuration deployment functionality.
| wdb wrote:
| There are web packages in JavaScript that create traces
| for click events etc. but also track how much time
| requests took on the client-side, or the render times of UI
| components, e.g. Vue/React.
|
| See: https://github.com/open-telemetry/opentelemetry-js-
| contrib/t...
| cogman10 wrote:
| It's not even that. It's the interface that other
| logging/tracing apis can plug into. It's the "SLF4J" of
| tracing/metrics gathering.
|
| Why is this useful? Because it lets people writing libraries
| provide the ability for users of their libraries to track
| performance information without dictating to them what
| performance tracking tools they should use.
| postpawl wrote:
| In the context of this article - opentelemetry is for
| performance tracing on the backend. The call to the endpoint
| would already end up in access logs, and this would just give
| more detailed performance metrics on queries and function
| calls.
| enriquto wrote:
| Even if they did, that wouldn't mean anything. They could easily
| construct an argument that surfacing ads serves to improve the
| application (with a few extra rhetorical steps in between).
| pdkl95 wrote:
| If a company is serious about this type of promise, they
| should take away their ability to change their mind in the
| future with a _Ulysses pact_ [1]. See this older post[2]
| for a bit more detail from a talk by Cory Doctorow.
|
| >> The answer to not getting pressure from your bosses, your
| stakeholders, your investors or your members, to do the wrong
| thing later, when times are hard, is to take options off the
| table right now.
|
| [1] https://en.wikipedia.org/wiki/Ulysses_pact
|
| [2] https://news.ycombinator.com/item?id=20411018
| enriquto wrote:
| How would that be implemented in practice, for the case of
| telemetry? The data is held by a third party? Couldn't they
| always "buy" that third party? Is there a standard Ulysses
| mechanism for data compartmentalization?
| sebow wrote:
| Considering GH is MS-owned, I'm even more reluctant to use it.
|
| Anything that leaves your computer and enters hardware owned by
| MS should make you think twice about using a certain
| service/product.
|
| Same goes for almost every other company out there. Also FOMO is
| not an argument, there are plenty of alternatives for almost
| everything out there. If it's comfort people are worried about
| then they should stick to iPads and ditch software-dev, because
| trends like these are why the quality of software has been
| declining drastically in the last decade.
|
| These streamlined processes of using telemetry for the "product's
| good" are just a way to compensate that.
| jetpks wrote:
| I think you have a misunderstanding of what OpenTelemetry is.
| It helps you understand what your backend software is doing
| with insights like how long database queries take to complete.
| wdb wrote:
| It's a really nice solution. I am using it to send traces and
| metrics via a standardised API in Go, and Node.js. During
| development I am sending unsampled traces to Jaeger, but in
| production I am sending ratio-sampled traces to Google Trace (you
| could also use Honeycomb or Lightstep). For metrics, I am sending
| them to local Prometheus during development, and cluster
| prometheus in production.
|
| It's nice to see all the traces through all the different
| services of one API request and all its log items it generated.
| Using a custom span exporter to ensure PII is stripped away as
| much as possible for span attributes.
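|
| One way such a scrubbing exporter could look, as a sketch only.
| RedactingExporter, SENSITIVE_KEYS and the dict-based span shape
| are assumptions for illustration, not the OpenTelemetry SDK's
| exporter interface.

```python
# Hypothetical key list; a real deployment would define its own.
SENSITIVE_KEYS = {"user.email", "user.name", "client.ip"}


class RedactingExporter:
    """Wraps another exporter and scrubs attributes before forwarding."""

    def __init__(self, inner):
        self._inner = inner

    def export(self, spans):
        cleaned = []
        for span in spans:
            attrs = {
                key: value
                for key, value in span["attributes"].items()
                if key not in SENSITIVE_KEYS
            }
            # Copy the span so the caller's object is left untouched.
            cleaned.append({**span, "attributes": attrs})
        self._inner(cleaned)


exported = []  # stands in for the real backend exporter
exporter = RedactingExporter(exported.extend)
exporter.export([
    {
        "name": "GET /profile",
        "attributes": {"http.status_code": 200, "user.email": "a@example.com"},
    },
])
```

| A deny-list like this only catches known keys; an allow-list of
| permitted attributes is the stricter variant of the same idea.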
|
| Using W3C trace context and baggage to send some more data along,
| e.g. sending the trace id, span id etc. into pub/sub events so
| they can be connected with the trace.
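|
| A sketch of carrying the trace id and span id inside a pub/sub
| event using the W3C traceparent format (version, 32-hex trace id,
| 16-hex parent span id, 2-hex flags). The message dict and helper
| names are hypothetical, and the parser skips some checks the spec
| requires, such as rejecting all-zero ids.

```python
import re

TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def format_traceparent(trace_id: str, span_id: str, sampled: bool) -> str:
    # Version 00; the lowest flag bit marks the trace as sampled.
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str):
    match = TRACEPARENT_RE.match(header)
    if match is None:
        return None
    _version, trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Publisher side: attach the current trace context to the event.
message = {
    "data": b"order created",
    "attributes": {
        "traceparent": format_traceparent(
            "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True
        ),
    },
}

# Subscriber side: recover the context and continue the same trace.
parent = parse_traceparent(message["attributes"]["traceparent"])
```

| The subscriber would start its own span with parent["trace_id"]
| as the trace id and parent["span_id"] as the parent span.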
|
| Loving it! For now, I think it's worth the costs of adding
| tracing to services.
| thefounder wrote:
| Off topic: At some point my wasm app was consuming excessive
| memory bringing the browser to a halt. The issue was the
| OpenTelemetry package (Go-lang) used by various Google sdks.
| Forking/sanitizing the sdks fixed the issue.
| ehershey wrote:
| What do you mean by "sanitizing" the sdk's? Did you remove the
| OpenTelemetry package?
| thefounder wrote:
| >> Did you remove the OpenTelemetry package?
|
| Exactly!
___________________________________________________________________
(page generated 2021-05-26 23:00 UTC)