[HN Gopher] Vendor lock-in in the observability space
       ___________________________________________________________________
        
       Vendor lock-in in the observability space
        
       Author : talboren
       Score  : 74 points
       Date   : 2023-10-31 15:14 UTC (7 hours ago)
        
 (HTM) web link (www.keephq.dev)
 (TXT) w3m dump (www.keephq.dev)
        
       | wbeckler wrote:
       | Does Keep have an open core business model, where they host your
       | stuff using a proprietary control plane for a fee and introduce
       | proprietary features around the edges?
        
         | shahargl wrote:
         | Nope, fully open source - https://github.com/keephq/keep
        
       | Volundr wrote:
       | The article seems to suggest https://github.com/open-telemetry/opentelemetry-collector-co...
       | was silently killed, yet it appears to have been merged in
       | January. Am I missing something?
        
         | shahargl wrote:
         | You should read this to get the full context -
         | https://news.ycombinator.com/item?id=34540419
        
       | camel_gopher wrote:
       | Moving alerts is easy. Moving people, now that's hard.
        
         | shahargl wrote:
         | Wdym by moving people?
        
         | talboren wrote:
         | As with most things, there's always the people problem, but
         | when you're vendor-locked, that's yet another big issue to
         | address.
        
       | hbcondo714 wrote:
       | I've bounced around Splunk, New Relic, Sentry and Datadog over
       | the years. Most recently, I was working with Java and used
       | Micrometer[1], an open-source, vendor-neutral application
       | observability facade, to test out and confirm which APM we
       | wanted to go with.
       | 
       | [1] https://micrometer.io
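The facade idea is what makes this kind of side-by-side APM evaluation cheap: application code records metrics against one neutral interface, and the backend is picked when the application is wired up. Below is a minimal Python sketch of that pattern, assuming nothing about Micrometer's real API; every class and function name here (`MeterBackend`, `Metrics`, `handle_checkout`, the two backends) is hypothetical, and a real vendor adapter would ship data over the network instead of keeping it in memory.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class MeterBackend(ABC):
    """Hypothetical vendor-neutral interface that adapters implement."""

    @abstractmethod
    def increment(self, name: str, amount: float, tags: dict) -> None: ...


class InMemoryBackend(MeterBackend):
    """Stand-in for one vendor's exporter: accumulates counters locally."""

    def __init__(self) -> None:
        self.counters = defaultdict(float)

    def increment(self, name, amount, tags):
        key = (name, tuple(sorted(tags.items())))
        self.counters[key] += amount


class PrintingBackend(MeterBackend):
    """Stand-in for a second vendor, formatting lines instead of summing."""

    def __init__(self) -> None:
        self.lines = []

    def increment(self, name, amount, tags):
        tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
        self.lines.append(f"{name}{{{tag_str}}} +{amount}")


class Metrics:
    """The facade: the only metrics type application code touches."""

    def __init__(self, backend: MeterBackend) -> None:
        self._backend = backend

    def count(self, name: str, amount: float = 1.0, **tags: str) -> None:
        self._backend.increment(name, amount, tags)


def handle_checkout(metrics: Metrics) -> None:
    # Application code never names a vendor; swapping backends needs no edits here.
    metrics.count("checkout.requests", region="eu")


memory = InMemoryBackend()
handle_checkout(Metrics(memory))
printing = PrintingBackend()
handle_checkout(Metrics(printing))
print(printing.lines)  # ['checkout.requests{region=eu} +1.0']
```

The same `handle_checkout` runs unchanged against both backends, which is the property that lets you trial several APMs before committing to one.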
        
         | shahargl wrote:
         | Cool! Never heard of it; definitely gonna check it out.
        
         | shahargl wrote:
         | But that's for the data part (otel) and not the applicative
         | part (alerts, dashboards, etc)
        
         | pranay01 wrote:
         | You should also check out SigNoz [1]. It's an open-source
         | observability platform with metrics, traces, and logs in a
         | single application, built natively on OpenTelemetry.
         | 
         | One of the reasons many people use SigNoz is to avoid the
         | vendor lock-in that comes with adding the proprietary SDKs of
         | closed-source products like Datadog and New Relic to their
         | code.
         | 
         | If anyone is starting their observability journey today, I
         | think OpenTelemetry is a very good place to start. You can
         | instrument with the OTel SDKs and choose the visualization
         | layer/backend that suits your needs best.
         | 
         | (Disclaimer: I am a maintainer at SigNoz)
         | 
         | [1]https://github.com/signoz/signoz
        
           | talboren wrote:
           | Are things such as alerts, dashboards, and proprietary data
           | easily exportable from SigNoz? For instance, if I'd like to
           | manage them with Terraform.
        
       | braza wrote:
       | In the last decade, at every company I've worked for, the
       | biggest issue by far was vendor lock-in: teams coping with
       | tools that did not evolve with their problems over time, while
       | the cost of switching was high.
       | 
       | I know that OpenTelemetry has its own issues, but between poor
       | ergonomics and few amenities on one side and lock-in on the
       | other, these days I'll choose the former.
        
         | talboren wrote:
         | It was, and still is, one of the biggest problems orgs have
         | without even knowing it, until it hits. I think architects
         | should give it thought from a very early stage of a company,
         | and often they don't.
        
         | mikeshi42 wrote:
         | What's been your biggest issues around ergonomics/amenities for
         | OpenTelemetry?
         | 
         | (As an example, we've been working on some OTel-based SDKs to
         | make onboarding easier [1], though I'm curious where the
         | general sentiment is.)
         | 
         | [1] https://www.hyperdx.io/docs/install/javascript
        
           | scottlamb wrote:
           | > What's been your biggest issues around ergonomics/amenities
           | for OpenTelemetry?
           | 
           | I can't speak generally, but in the Rust ecosystem the
           | various crates don't play well together. Here's one example:
           | <https://github.com/tokio-rs/tracing/issues/2648> There are
           | four crates involved (tracing-attributes,
           | tracing-opentelemetry, opentelemetry, and either of
           | opentelemetry-{datadog,otlp}) and none of them fit properly
           | into any of the others.
        
             | mikeshi42 wrote:
             | Oh yeah, Rust... definitely a few more sharp edges there,
             | which is frustrating (a friend told me the same recently).
        
               | scottlamb wrote:
               | ...yeah.
               | 
               | At least I don't _have_ to use all these things.
               | `tracing` itself is essentially mandatory if I want to
               | pick up all the spans and events created by the crates I
               | depend on. But I can (and maybe will...) write my own
               | `tracing::Subscriber`, sidestepping all the various bugs
               | and incompatibilities of `tracing-subscriber` and
               | `opentelemetry{,-otlp,-datadog}`.
               | 
               | (Here's another fun one in `tracing-subscriber`:
               | <https://github.com/tokio-rs/tracing/issues/2519#issuecomment...>
               | These interfaces just aren't right.)
        
               | mikeshi42 wrote:
               | I think it's a huge shame, as Rust is one of the few
               | languages in which tracing is effectively native, and it's
               | shocking that the de facto standard for tracing transport
               | doesn't interoperate well.
               | 
               | Admittedly, I think part of it is just where Rust is on
               | the language adoption curve and where Otel is in its
               | project lifecycle as well. Writing your own subscriber
               | might not be a terrible idea; internally, we end up
               | needing to do similar things as we find limitations/bugs
               | that we can't wait for upstream to fix (but we're a
               | vendor, and that makes more sense for us than for end users!)
        
               | scottlamb wrote:
               | Yes. I love Rust, but its observability story is terrible
               | right now. Not just tracing; I was complaining in another
               | thread recently that the async ecosystem doesn't have a
               | production-ready equivalent of thread dumps.
               | <https://news.ycombinator.com/item?id=37792011> I really
               | want to see the whole picture improve (and as I'm able,
               | participate in improving it).
               | 
               | I'm considering writing a tracing subscriber that dumps
               | events, span starts/stops, and span field updates to a
               | terse local log file format. This is a superset of what
               | OpenTelemetry offers. (OpenTelemetry only has the concept
               | of a completed span, which I find really unfortunate.) So
               | I'd write a tool that takes that and pushes it to
               | OpenTelemetry (otelcol-contrib plugin maybe) and more
               | local-focused tools like `logcat`.
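The "superset" point can be made concrete with a small replay tool. This is a minimal Python sketch under loudly stated assumptions: the line format below is hypothetical (it is not a format `tracing` or OpenTelemetry defines), and `replay`/`Span` are illustrative names. Replaying span starts, field updates, and ends yields two groups: completed spans, which are all OpenTelemetry's model can represent, and spans still open at the end of the log, which only the richer local log preserves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float
    end: Optional[float] = None
    fields: dict = field(default_factory=dict)


def replay(lines):
    """Replay a hypothetical terse span log.

    Line formats (whitespace-separated, no spaces inside values):
      <ts> start <span_id> <parent_id or -> <name>
      <ts> field <span_id> <key>=<value>
      <ts> end   <span_id>
    Returns (completed, still_open). Only `completed` maps onto
    OpenTelemetry's model of finished spans.
    """
    open_spans: dict = {}
    completed: list = []
    for line in lines:
        parts = line.split()
        ts, kind, span_id = float(parts[0]), parts[1], parts[2]
        if kind == "start":
            parent = None if parts[3] == "-" else parts[3]
            open_spans[span_id] = Span(span_id, parent, parts[4], ts)
        elif kind == "field":
            key, _, value = parts[3].partition("=")
            # Field updates can arrive at any point while the span is open.
            open_spans[span_id].fields[key] = value
        elif kind == "end":
            span = open_spans.pop(span_id)
            span.end = ts
            completed.append(span)
    return completed, list(open_spans.values())


log = [
    "0.0 start req - handle_request",
    "0.1 field req user=42",
    "0.2 start db req query",
    "0.5 end db",
    "0.9 start bg - long_running_job",
]
completed, still_open = replay(log)
print([s.name for s in completed])   # ['query']
print([s.name for s in still_open])  # ['handle_request', 'long_running_job']
```

An exporter in this design would push `completed` to an OTel collector and could hold back or specially mark `still_open`, while a local viewer shows both.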
        
               | mikeshi42 wrote:
               | Oh huh! What would you do with span start independent of
               | stops?
               | 
               | A tangent on logcat - local observability to me is a
               | really intriguing area, I think there's a story of Otel
               | for local as well if someone can build a good enough
               | local DX for consuming them (we've heard this request a
               | number of times, e.g.
               | https://github.com/hyperdxio/hyperdx/issues/7).
        
               | scottlamb wrote:
               | > Oh huh! What would you do with span start independent
               | of stops?
               | 
               | * When browsing locally, I'd be able to see spans that
               | never closed (because they were super long-running and/or
               | because the process crashed mid-span). I suppose for the
               | latter case, the otel collector could upload them
               | (marking them as incomplete somehow) when it knows the
               | process shut down.
               | 
               | * When looking at events(/logs), I'd be able to see the
               | current state of all the enclosing spans, including their
               | fields. Easiest to do locally, but ideally also in otel.
               | Maybe some mechanism for automatically copying select
               | attributes from the span to the event for use in otel
               | (details tbd, whether it's selected in code at span
               | creation time, tweaked in the otel collector config, or
               | what).
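The second bullet, showing the current state of enclosing spans on each event and copying selected attributes onto it, can be sketched in a few lines. This is a hypothetical Python illustration, not how `tracing`, OpenTelemetry, or the collector work: the tuple record shape, `enrich_events`, and the `copy_keys` allowlist (standing in for the "tweaked in the otel collector config" option) are all assumptions. Because field updates are applied as they arrive, each event sees the span fields as they are at the moment the event fires.

```python
from typing import Optional


def enrich_events(records, copy_keys):
    """Attach enclosing-span state to events in a hypothetical record stream.

    records are tuples of:
      ("start", span_id, parent_id, name)
      ("field", span_id, key, value)
      ("end", span_id)
      ("event", span_id, message)
    copy_keys lists span field names to copy onto each event.
    """
    open_spans: dict = {}
    enriched = []
    for rec in records:
        kind = rec[0]
        if kind == "start":
            _, span_id, parent_id, name = rec
            open_spans[span_id] = {"parent": parent_id, "name": name, "fields": {}}
        elif kind == "field":
            _, span_id, key, value = rec
            open_spans[span_id]["fields"][key] = value
        elif kind == "end":
            open_spans.pop(rec[1], None)
        elif kind == "event":
            _, span_id, message = rec
            chain = []   # enclosing span names, innermost first
            copied = {}  # selected attributes; inner spans take precedence
            current: Optional[str] = span_id
            while current is not None and current in open_spans:
                span = open_spans[current]
                chain.append(span["name"])
                for key in copy_keys:
                    if key in span["fields"] and key not in copied:
                        copied[key] = span["fields"][key]
                current = span["parent"]
            enriched.append({"message": message, "spans": chain, "attributes": copied})
    return enriched


records = [
    ("start", "req", None, "handle_request"),
    ("field", "req", "user", "42"),
    ("start", "db", "req", "query"),
    ("field", "db", "table", "orders"),
    ("event", "db", "slow query"),
    ("end", "db"),
    ("event", "req", "responding"),
]
events = enrich_events(records, copy_keys=["user", "table"])
```

Here the "slow query" event carries both `user` (from the outer span) and `table` (from the inner one), while "responding", emitted after the inner span closed, only carries `user`.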
        
       | spondyl wrote:
       | Hmm, as someone who uses and de facto manages Datadog for a
       | sizable org in my day job, I'm not sure that OpenTelemetry PR
       | fiasco gives an up-to-date picture.
       | 
       | There was definitely a period where Datadog seemed to be talking
       | up OTel but not actually doing anything of note to support that
       | ecosystem.
       | 
       | I'd say in the last year or two, they've done a bit of a 180 and
       | been embracing it quite a lot.
       | 
       | One major change is that they not only added support for the W3C
       | Trace Context header format but actually set it as the default
       | over their own proprietary format.
       | 
       | That's a pretty big deal because W3C Trace Context is a MUST for
       | OTel clients to implement, so it goes a long way toward making
       | interoperability (and migration) fairly painless.
       | 
       | Prior to that, you could use OTel but the actual distributed
       | aspect (spans between services linking up) probably wouldn't work
       | as OTel services wouldn't recognise the Datadog header format and
       | vice versa.
       | 
       | There are, of course, still some features you would miss out
       | on by using OTel instead of the Datadog SDKs. For example, I
       | don't believe the profiler would necessarily work, but that's a
       | tradeoff to be made.
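Why a shared header format is enough to link spans across tracers becomes clearer from the header itself. The following is a minimal Python sketch written from the version-00 `traceparent` layout in the W3C Trace Context specification (`version-traceid-parentid-traceflags`); it is not code from Datadog's or OpenTelemetry's SDKs, and `parse_traceparent`/`child_traceparent` are hypothetical names. As long as each service reads the incoming header and forwards the same trace ID with its own span ID, the distributed trace stays connected regardless of which vendor's library produced it. For simplicity it rejects versions other than `00` and ignores the companion `tracestate` header, which a real propagator would also carry.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: "00" "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex)
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0x01 of trace-flags is the "sampled" flag.
        return bool(self.flags & 0x01)


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is not a valid version-00 value."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace and parent IDs are invalid per the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, int(flags, 16))


def child_traceparent(incoming: TraceParent) -> str:
    """Header for an outgoing call: same trace ID, new span ID for this hop."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{incoming.trace_id}-{new_span_id}-{incoming.flags:02x}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(incoming)
```

Before the default switched, one side of the call would have emitted a vendor-specific header the other side did not read, so the parse step above would simply find nothing and a new, disconnected trace would start.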
        
         | shahargl wrote:
         | Again, that's about the data; I think the blog post focuses
         | on the applicative aspect.
        
       | TheIronYuppie wrote:
       | FULLY BIASED COMMENT:
       | 
       | We (the bacalhau.org[0] project) are interested in helping with
       | this - one of our philosophies has been that part of the problem
       | is with that first step. By first moving to a lake of some kind,
       | you end up giving up lots of optionality. Even basic things like
       | aggregation, filtering, windowing, etc now need to be in the
       | "locked-in" tool, which is exactly the wrong first step to take.
       | 
       | SHOW HN: We have a solution that uses DuckDB to do some of this
       | initial work[1], which can save you 70% or more on total data
       | throughput. It also lets you do interesting things like
       | eventing, multi-homing observability data, etc.
       | 
       | I'd be very interested to hear any/all thoughts!
       | 
       | [0] https://github.com/bacalhau-project/bacalhau
       | 
       | [1] https://blog.bacalhau.org/p/bacalhau-x-duckdb-deploying-appl...
       | 
       | Disclosure: I co-founded the Bacalhau project.
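The argument above is that filtering, aggregation, and windowing can happen before data ever reaches a (possibly locked-in) store. Here is a small pure-Python sketch of that kind of edge pre-processing. It is not Bacalhau's or DuckDB's code (the linked post uses DuckDB); the record shape (`ts`, `service`, `level`), the `pre_aggregate` name, and the choice of one-minute tumbling windows are all hypothetical. DEBUG records are dropped at the source and the rest are collapsed into per-window counts, so only summary rows need to be shipped.

```python
from collections import Counter


def pre_aggregate(records, window_s=60, drop_levels=("DEBUG",)):
    """Collapse raw log records into per-window counts before shipping.

    Each record is assumed to be a dict with "ts" (epoch seconds),
    "service", and "level" keys.
    """
    counts = Counter()
    for rec in records:
        if rec["level"] in drop_levels:
            continue  # filtered at the source: never leaves the host
        window_start = int(rec["ts"]) // window_s * window_s  # tumbling window
        counts[(window_start, rec["service"], rec["level"])] += 1
    return [
        {"window_start": w, "service": s, "level": lvl, "count": n}
        for (w, s, lvl), n in sorted(counts.items())
    ]


records = [
    {"ts": 0, "service": "api", "level": "INFO"},
    {"ts": 10, "service": "api", "level": "INFO"},
    {"ts": 30, "service": "api", "level": "DEBUG"},
    {"ts": 65, "service": "api", "level": "ERROR"},
]
rows = pre_aggregate(records)
```

Four raw records become two summary rows here; actual savings depend entirely on the data and on what detail downstream tools still need.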
        
       | RaouleDuke wrote:
       | This post speaks to a larger issue: cloud vendors are driven to
       | extract as much money as possible from you, the customer. They
       | are not evil or malicious; they are commercial enterprises, and
       | cloud is built on consumption economics. There is no incentive
       | to make it easy for you to move to another cloud. Replace
       | 'observability space' with SIEM/SOAR, first-party databases
       | (Spanner/CosmosDB), or many PaaS offerings, and the themes still
       | apply. Pushing proprietary solutions to you is an effective
       | means of making customers stick. I am not passing judgement
       | here; there is some value in turnkey solutions, but it depends
       | on your business. Datadog in particular is a bit insidious, as
       | they have a multi-cloud proprietary service that can follow your
       | workloads across clouds (even so far as to be essentially first
       | party on Azure via Azure Native ISV Services).
        
       ___________________________________________________________________
       (page generated 2023-10-31 23:01 UTC)