[HN Gopher] Vendor lock-in in the observability space
___________________________________________________________________
Vendor lock-in in the observability space
Author : talboren
Score : 74 points
Date : 2023-10-31 15:14 UTC (7 hours ago)
(HTM) web link (www.keephq.dev)
(TXT) w3m dump (www.keephq.dev)
| wbeckler wrote:
| Does Keep have an open core business model, where they host your
| stuff using a proprietary control plane for a fee and introduce
| proprietary features around the edges?
| shahargl wrote:
| Nope, fully open source - https://github.com/keephq/keep
| Volundr wrote:
| The article seems to suggest
| https://github.com/open-telemetry/opentelemetry-collector-co... was
| silently killed, yet it appears to have been merged in January, am
| I missing something?
| shahargl wrote:
| You should read this to get the full context -
| https://news.ycombinator.com/item?id=34540419
| camel_gopher wrote:
| Moving alerts is easy. Moving people, now that's hard.
| shahargl wrote:
| Wdym by moving people?
| talboren wrote:
| Like other things around, there's always the people's problem
| but when vendor locked, that's yet another big issue to address
| hbcondo714 wrote:
| I've bounced around Splunk, New Relic, Sentry and Datadog over
| the years. Most recently, I was working with Java and used
| Micrometer[1], the open source, vendor-neutral application
| observability facade, to test out and confirm which APM we
| wanted to go with.
|
| [1] https://micrometer.io
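
A minimal sketch of the facade idea described above, in Python rather than Java. This is not Micrometer's API: `MetricsBackend`, `InMemoryBackend`, `Metrics` and `handle_checkout` are hypothetical names used only to show how application code can depend on a neutral interface while backends are swapped underneath it.

```python
from typing import Dict, Protocol, Tuple


class MetricsBackend(Protocol):
    """Anything that can record a counter increment (hypothetical interface)."""

    def increment(self, name: str, value: float, tags: Dict[str, str]) -> None: ...


class InMemoryBackend:
    """Stand-in backend that keeps totals in a dict, keyed by name and tags."""

    def __init__(self) -> None:
        self.totals: Dict[Tuple, float] = {}

    def increment(self, name: str, value: float, tags: Dict[str, str]) -> None:
        key = (name, tuple(sorted(tags.items())))
        self.totals[key] = self.totals.get(key, 0.0) + value


class Metrics:
    """Facade that application code calls; it fans out to every backend."""

    def __init__(self, *backends: MetricsBackend) -> None:
        self._backends = list(backends)

    def count(self, name: str, value: float = 1.0, **tags: str) -> None:
        for backend in self._backends:
            backend.increment(name, value, tags)


def handle_checkout(metrics: Metrics) -> None:
    # Application code depends only on the facade, never on a vendor SDK.
    metrics.count("checkout.completed", region="eu")
```

Trialling a second APM would then mean writing one more backend class and passing it to `Metrics(...)`, while `handle_checkout` stays unchanged.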
| shahargl wrote:
| Cool! Never heard of it and definitely gonna check it.
| shahargl wrote:
| But that's for the data part (otel) and not the applicative
| part (alerts, dashboards, etc)
| pranay01 wrote:
| You should also check out SigNoz [1]. It's an open source
| observability platform with metrics, traces and logs in a
| single application & based natively on opentelemetry.
|
| One of the reasons many people use SigNoz is to avoid the
| vendor lock-in which comes with adding proprietary SDKs of
| closed source products like DataDog and New Relic in their
| code.
|
| If anyone is starting their observability journey today, I
| think OpenTelemetry is a very good place to start. You can
| instrument with Otel SDKs and choose the visualization
| layer/backend which suits your needs best
|
| (Disclaimer: I am maintainer at SigNoz)
|
| [1] https://github.com/signoz/signoz
| talboren wrote:
| Are things such as alerts, dashboards and proprietary data
| easily exportable with SigNoz? For instance, if I'd like to
| manage them with Terraform
| braza wrote:
| Over the last decade, at every company I've worked for, the
| biggest issue by far was vendor lock-in: teams had to cope with
| tools that did not evolve with their problems over time, and the
| switching cost was high.
|
| I know that OpenTelemetry has its own issues, but between poor
| ergonomics and amenities on one side and lock-in on the other,
| nowadays I will choose the former.
| talboren wrote:
| It was and still is one of the biggest problems orgs have without
| even knowing they have it, until it hits. I think architects
| should give it thought from a very early stage of a company,
| and often don't
| mikeshi42 wrote:
| What have been your biggest issues around ergonomics/amenities
| for OpenTelemetry?
|
| (We've been working on some Otel-based SDKs to try to make it
| easier to onboard [1] as an example, though curious where the
| general sentiment is)
|
| [1] https://www.hyperdx.io/docs/install/javascript
| scottlamb wrote:
| > What have been your biggest issues around ergonomics/amenities
| for OpenTelemetry?
|
| I can't speak generally, but in the Rust ecosystem the
| various crates don't play well together. Here's one example:
| <https://github.com/tokio-rs/tracing/issues/2648> There are
| four crates involved (tracing-attributes,
| tracing-opentelemetry, opentelemetry, and either of
| opentelemetry-{datadog,otlp}) and none of them fit properly
| into any of the others.
| mikeshi42 wrote:
| Oh yeah, Rust... definitely a few more sharp edges there,
| which is frustrating (had a friend tell me the same
| recently).
| scottlamb wrote:
| ...yeah.
|
| At least I don't _have_ to use all these things.
| `tracing` itself is essentially mandatory if I want to
| pick up all the spans and events created by the crates I
| depend on. But I can (and maybe will...) write my own
| `tracing::Subscriber`, sidestepping all the various bugs
| and incompatibilities of `tracing-subscriber` and
| `opentelemetry{,-otlp,-datadog}`.
|
| (Here's another fun one in `tracing-subscriber`:
| <https://github.com/tokio-rs/tracing/issues/2519#issuecomment...>
| These interfaces just aren't right.)
| mikeshi42 wrote:
| I think it's a huge shame, as Rust is one of the few
| languages in which tracing is effectively native, and it's
| shocking that the de facto standard of tracing transport
| doesn't interop well.
|
| Admittedly, I think part of it is just where Rust is on
| the language adoption curve and where Otel is in its
| project lifecycle as well. Writing your own subscriber
| might not be a terrible idea. Internally, we end up
| needing to do similar things as we find limitations/bugs
| that we can't wait for upstream to fix (but we're a
| vendor and that makes more sense for us than end users!)
| scottlamb wrote:
| Yes. I love Rust, but its observability story is terrible
| right now. Not just tracing; I was complaining in another
| thread recently that the async ecosystem doesn't have a
| production-ready equivalent of thread dumps.
| <https://news.ycombinator.com/item?id=37792011> I really
| want to see the whole picture improve (and as I'm able,
| participate in improving it).
|
| I'm considering writing a tracing subscriber that dumps
| events, span starts/stops, and span field updates to a
| terse local log file format. This is a superset of what
| OpenTelemetry offers. (OpenTelemetry only has the concept
| of a completed span, which I find really unfortunate.) So
| I'd write a tool that takes that and pushes it to
| OpenTelemetry (otelcol-contrib plugin maybe) and more
| local-focused tools like `logcat`.
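
A rough sketch of the conversion step described above: folding separate start, field-update and stop records into the completed spans that OpenTelemetry expects. The one-line record format (`S` start, `F` field update, `X` stop) and the names `Span` and `fold_records` are hypothetical, invented here for illustration; nothing in the comment specifies the actual log format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    name: str
    start: float
    end: Optional[float] = None  # None while the span is still open
    fields: Dict[str, str] = field(default_factory=dict)


def fold_records(lines):
    """Fold start/field/stop records into (completed, still_open) span lists."""
    open_spans: Dict[str, Span] = {}
    completed: List[Span] = []
    for line in lines:
        kind, ts, span_id, *rest = line.split()
        if kind == "S":    # span start:   S <ts> <id> <name>
            open_spans[span_id] = Span(span_id, rest[0], float(ts))
        elif kind == "F":  # field update: F <ts> <id> <key>=<value>
            key, _, value = rest[0].partition("=")
            open_spans[span_id].fields[key] = value
        elif kind == "X":  # span stop:    X <ts> <id>
            span = open_spans.pop(span_id)
            span.end = float(ts)
            completed.append(span)
    return completed, list(open_spans.values())
```

Only the first list maps onto a completed-span model; the second list holds whatever the local log knows that such a model cannot express yet.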
| mikeshi42 wrote:
| Oh huh! What would you do with span start independent of
| stops?
|
| A tangent on logcat - local observability to me is a
| really intriguing area, I think there's a story of Otel
| for local as well if someone can build a good enough
| local DX for consuming them (we've been told a number of
| times about this
| https://github.com/hyperdxio/hyperdx/issues/7 as an
| example)
| scottlamb wrote:
| > Oh huh! What would you do with span start independent
| of stops?
|
| * When browsing locally, I'd be able to see spans that
| never closed (because they were super long-running and/or
| because the process crashed mid-span). I suppose for the
| latter case, the otel collector could upload them
| (marking them as incomplete somehow) when it knows the
| process shut down.
|
| * When looking at events(/logs), I'd be able to see the
| current state of all the enclosing spans, including their
| fields. Easiest to do locally, but ideally also in otel.
| Maybe some mechanism for automatically copying select
| attributes from the span to the event for use in otel
| (details tbd, whether it's selected in code at span
| creation time, tweaked in the otel collector config, or
| what).
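
The second bullet above (copying select attributes from enclosing spans onto each event) could look roughly like this. The record schema (`span_start`, `span_field`, `span_stop`, `event`) and the function name `enrich_events` are assumptions made for this sketch; they are not defined by `tracing` or by OpenTelemetry, and the selection mechanism here is just a set of keys passed in by the caller.

```python
def enrich_events(records, copy_keys):
    """Yield each event with selected fields copied from its enclosing spans.

    records: dicts whose "kind" is "span_start", "span_field", "span_stop"
    or "event" (a hypothetical schema used only for this sketch).
    """
    spans = {}  # span id -> {"parent": id or None, "fields": {...}}
    for rec in records:
        kind = rec["kind"]
        if kind == "span_start":
            spans[rec["id"]] = {"parent": rec.get("parent"), "fields": {}}
        elif kind == "span_field":
            spans[rec["id"]]["fields"][rec["key"]] = rec["value"]
        elif kind == "span_stop":
            spans.pop(rec["id"], None)
        elif kind == "event":
            attrs = {}
            span_id = rec.get("span")
            # Walk from the innermost span outwards; inner values win.
            while span_id is not None and span_id in spans:
                for key, value in spans[span_id]["fields"].items():
                    if key in copy_keys:
                        attrs.setdefault(key, value)
                span_id = spans[span_id]["parent"]
            yield {"message": rec["message"], "attrs": attrs}
```

Because span state is tracked from start records rather than from completed spans, an event emitted inside a span that never closes still picks up that span's fields.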
| spondyl wrote:
| Hmm, as someone who uses and de facto manages Datadog for a
| sizable org in their day job, I'm not sure that OpenTelemetry
| PR fiasco paints an up-to-date picture.
|
| There was definitely a period where Datadog seemed to be talking
| up OTel but not actually doing anything of note to support that
| ecosystem.
|
| I'd say in the last year or two, they've done a bit of a 180 and
| been embracing it quite a lot.
|
| One major change is that they not only added support for the W3C
| Trace Context header format but actually set it as the default
| over their own proprietary format.
|
| The reason that's a pretty big deal is that W3C Trace Context is
| set as a MUST for OTel clients to implement, so it goes a long
| way to making interoperability (and migration) pretty painless.
|
| Prior to that, you could use OTel but the actual distributed
| aspect (spans between services linking up) probably wouldn't work
| as OTel services wouldn't recognise the Datadog header format and
| vice versa.
|
| There are, of course, still some features that you would miss out
| on by using OTel over the Datadog SDKs. For example, I don't
| believe the profiler would necessarily work, but that's a tradeoff
| to be made.
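
To illustrate why the header format matters for spans linking up across services, here is a small sketch of reading the W3C `traceparent` header. It handles only the four-field layout (version, trace id, parent id, flags) and is not a full implementation of the specification; `extract_w3c_context` is a hypothetical helper name. A service that only looks for `traceparent` finds no context in a request carrying other vendor-specific headers; the `x-datadog-*` header names mentioned below are taken as an assumption about Datadog's older propagation style.

```python
import re

# Four-field layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def extract_w3c_context(headers):
    """Return (trace_id, parent_id, sampled) from a traceparent header, or None."""
    lowered = {k.lower(): v for k, v in headers.items()}
    match = _TRACEPARENT.match(lowered.get("traceparent", "").strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero ids are invalid.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return trace_id, parent_id, sampled
```

Calling it with only `x-datadog-trace-id` and `x-datadog-parent-id` set returns `None`, which is the "spans wouldn't link up" situation the comment describes.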
| shahargl wrote:
| Yet again, that's about the data; I think the blog post focuses
| on the applicative aspect
| TheIronYuppie wrote:
| FULLY BIASED COMMENT:
|
| We (the bacalhau.org[0] project) are interested in helping with
| this - one of our philosophies has been that part of the problem
| is with that first step. By first moving to a lake of some kind,
| you end up giving up lots of optionality. Even basic things like
| aggregation, filtering, windowing, etc now need to be in the
| "locked-in" tool, which is exactly the wrong first step to take.
|
| SHOW HN: We have a solution that uses DuckDB to do some of this
| initial work[1], which can save you 70% or more on total data
| throughput. Further, it allows you to do interesting things like
| eventing, multi-homing observability data, etc.
|
| I'd be very interested to hear any/all thoughts!
|
| [0] https://github.com/bacalhau-project/bacalhau
|
| [1]
| https://blog.bacalhau.org/p/bacalhau-x-duckdb-deploying-appl...
|
| Disclosure: I co-founded the Bacalhau project.
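
A minimal sketch of the filtering, windowing and aggregation step mentioned above, done before any data reaches a vendor tool. It uses plain Python instead of DuckDB or Bacalhau, and the event shape `(timestamp_seconds, service, level)` and the function name `aggregate_windows` are assumptions made for illustration. It makes no claim about how much throughput such a step saves.

```python
from collections import Counter


def aggregate_windows(events, window_seconds=60, min_level="WARN"):
    """Filter log events by level, then count them per (window, service, level)."""
    order = ["DEBUG", "INFO", "WARN", "ERROR"]
    threshold = order.index(min_level)
    counts = Counter()
    for ts, service, level in events:
        if order.index(level) < threshold:
            continue  # filtering: drop low-severity events before shipping
        window_start = int(ts // window_seconds) * window_seconds  # windowing
        counts[(window_start, service, level)] += 1  # aggregation
    return [
        {"window_start": w, "service": s, "level": lvl, "count": n}
        for (w, s, lvl), n in sorted(counts.items())
    ]
```

The rows it returns are plain dicts, so they can be sent to more than one destination, which is the optionality the comment argues is lost when this work happens inside a locked-in tool.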
| RaouleDuke wrote:
| This post speaks to a larger issue: cloud vendors are driven to
| extract as much money from you, the customer, as possible. They
| are not evil or malicious; they are commercial enterprises, and
| cloud is built on consumption economics. There is no incentive to
| make it easy for you to move to another cloud. Replace
| 'observability space' with SIEM/SOAR, first party databases
| (Spanner/CosmosDB) or many PaaS offerings, and the themes still
| apply. Pushing proprietary solutions to you is an effective means
| of making customers stick. I am not passing judgement here; there
| is some value to turnkey solutions, but it depends on your
| business. Datadog in particular is a bit insidious, as they have a
| multi-cloud proprietary service that can follow your workloads
| across clouds (even so far as to be essentially first party on
| Azure via Azure Native ISV Services).
___________________________________________________________________
(page generated 2023-10-31 23:01 UTC)