[HN Gopher] Show HN: You don't need to adopt new tools for LLM o...
       ___________________________________________________________________
        
       Show HN: You don't need to adopt new tools for LLM observability
        
       If you've built any web-based app in the last 15 years, you've
       probably used something like Datadog, New Relic, or Sentry to
       monitor and trace it. Why should it be different when the app
       you're building happens to use LLMs?

       So today we're open-sourcing OpenLLMetry-JS: an open protocol
       and SDK, based on OpenTelemetry, that provides traces and
       metrics for LLM JS/TS applications and can be connected to any
       of the 15+ tools that already support OpenTelemetry. Here's the
       repo: https://github.com/traceloop/openllmetry-js

       A few months ago we launched the Python flavor
       (https://news.ycombinator.com/item?id=37843907), and we've now
       built a compatible one for Node.js. Would love to hear your
       thoughts and opinions!

       Check it out:
       Docs: https://www.traceloop.com/docs/openllmetry/getting-started-t...
       GitHub (JS): https://github.com/traceloop/openllmetry-js
       GitHub (Python): https://github.com/traceloop/openllmetry
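
       To give a sense of the setup, here's a minimal sketch based on
       the getting-started docs linked above. The package name and the
       exact initialize() options are best-effort assumptions, so treat
       the docs as authoritative:

         // Sketch only: assumes @traceloop/node-server-sdk exposes an
         // initialize() entry point, as described in the linked docs.
         import * as traceloop from "@traceloop/node-server-sdk";
         import OpenAI from "openai";

         // One-time setup; LLM calls are then traced automatically
         // through OpenTelemetry.
         traceloop.initialize({
           appName: "my-llm-app", // hypothetical app name
           disableBatch: true,    // flush spans immediately (dev only)
         });

         async function main() {
           const openai = new OpenAI();
           // This call should surface as a span carrying the prompt,
           // completion, and token usage.
           const reply = await openai.chat.completions.create({
             model: "gpt-3.5-turbo",
             messages: [{ role: "user", content: "Hello!" }],
           });
           console.log(reply.choices[0].message.content);
         }

         main();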
        
       Author : tomerf2
       Score  : 61 points
       Date   : 2024-02-14 15:52 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tomgs wrote:
       | Cool! Two questions:
       | 
        | 1. Where do you see this observability-for-LLMs thing going?
        | What's the end game? Is it like traditional observability,
        | where all formats eventually converge on one (which
        | OpenTelemetry is trying to be)? I feel it might be a little
        | bit early to tell, though.
       | 
        | 2. I noticed you auto-detect the framework used, like
        | LlamaIndex et al. Beyond annotations, is there a deeper
        | connection to the LLM framework used? This is auto-
        | instrumentation, so I assume you do most of the heavy lifting,
        | but should users of these frameworks expect some cool Easter
        | eggs when they look at their telemetry?
        
         | tomerf2 wrote:
         | Thanks!
         | 
          | 1. Huh, good question. Hopefully there will be convergence.
          | We've started discussions with other companies in this
          | domain about supporting, or even switching to,
          | OpenTelemetry.
          | 
          | 2. Nothing specific, except for - as you mentioned - being
          | able to see the trace of a RAG pipeline automatically.
        
           | tomgs wrote:
           | A quick one :)
           | 
            | While we're on the topic - how does Traceloop factor into
            | all of this? What's the connection between the two? I
            | assume Traceloop is the LLM observability platform
            | (Datadog for LLMs?) and OpenLLMetry is your own auto-
            | instrumentation thingie to feed it?
        
             | nirga wrote:
             | Yes, Traceloop is kind of a Sentry for LLMs
        
               | ssijak wrote:
                | It's priced way high: $500 for 50k LLM calls? 50k is
                | not much at all.
        
               | nirga wrote:
                | The open source is free for all, of course. Our
                | platform adds capabilities for monitoring and
                | detecting hallucinations, hence the higher cost.
        
       | lmeyerov wrote:
        | Re: Python - if we're already doing OTel, how would this
        | interop? E.g., if we don't want to break our current imports,
        | and want to control where the new instrumentation goes?
        | 
        | (FWIW, this is a great direction!)
        
         | nirga wrote:
          | Super easy - you can just use the standalone
          | instrumentations directly:
          | https://www.traceloop.com/docs/openllmetry/tracing/without-s...
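          | 
          | The question was about Python, but the JS SDK announced here
          | follows the same shape: keep your existing provider, and
          | register only the LLM instrumentation you want. A hedged
          | sketch - the @traceloop package and class names are
          | assumptions modeled on the linked docs:
          | 
          |     import { registerInstrumentations } from "@opentelemetry/instrumentation";
          |     import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
          |     // Assumed package/class, mirroring the standalone
          |     // instrumentations described in the docs above:
          |     import { OpenAIInstrumentation } from "@traceloop/instrumentation-openai";
          | 
          |     // Your existing OTel setup stays exactly as it is...
          |     const provider = new NodeTracerProvider();
          |     provider.register();
          | 
          |     // ...and the LLM instrumentation slots in alongside
          |     // whatever you already register.
          |     registerInstrumentations({
          |       instrumentations: [new OpenAIInstrumentation()],
          |     });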
        
       | a_wild_dandan wrote:
       | What problem(s) does this solve? I have a ticket in my backlog.
       | Your SDK unlocks the solution. What is that ticket's title? (I'm
       | a bit thick, and need concrete examples for things to click.)
        
         | nirga wrote:
          | The same ticket that gets you to install something like
          | Sentry - you want to see what's happening in production and
          | get alerted when things go wrong.
        
         | hooverd wrote:
          | It's LLM-specific OpenTelemetry tracing. What's going on inside
         | your model isn't the focus. It's everything surrounding your
         | model. How many prompts are people submitting? How long does
         | each prompt take? Did certain prompts time out or return an
         | error? What's the P95/P99 latency for your LLM? And so on.
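          | 
          | Questions like the latency one map onto plain OTel metrics.
          | A sketch with the standard JS API (the metric name and
          | attributes are made up for illustration, and a configured
          | MeterProvider is assumed):
          | 
          |     import { metrics } from "@opentelemetry/api";
          | 
          |     const meter = metrics.getMeter("llm-app");
          |     // Hypothetical metric: one measurement per completion call.
          |     const latency = meter.createHistogram("llm.completion.duration", {
          |       unit: "ms",
          |       description: "End-to-end latency of each LLM call",
          |     });
          | 
          |     const start = Date.now();
          |     // ... await your LLM call here ...
          |     latency.record(Date.now() - start, { model: "gpt-3.5-turbo" });
          | 
          | P95/P99 then fall out of whichever backend aggregates the
          | histogram, same as for any other service.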
        
       | epistasis wrote:
       | I was looking to see what the actual metrics would be for a
       | completion, to see if this is something of interest to me. So I
       | tried to run the example here:
       | 
       | https://www.traceloop.com/openllmetry
       | 
       | Problem 1 (very minor): it's missing an `import os`
       | 
       | Problem 2: I need an API key.
       | 
       | Problem 3: The link that it tells me to go to for an API key is
       | malformed: https://https//app.traceloop.com/settings/api-keys
       | 
       | Is there a way to see what the output is like without getting an
       | account, and presumably also connecting to an observability
       | platform like Grafana? I already made a venv and installed the
       | package, so I'm not sure if I'm ready for even more steps just to
       | see if this is something that might be useful to me.
        
         | nirga wrote:
          | Thanks for the issues - I'll fix them! :sweat_smile:
          | 
          | Re: Grafana and others - it's simple, just set the env vars:
          | https://www.traceloop.com/docs/openllmetry/integrations/intr...
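          | 
          | For example, pointing the SDK at a self-hosted OTLP endpoint
          | might look like this - variable names are from the linked
          | docs, while the endpoint and token are placeholders:
          | 
          |     TRACELOOP_BASE_URL=http://my-collector:4318
          |     TRACELOOP_HEADERS="Authorization=Bearer <token>"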
        
       | Aqueous wrote:
        | I thought observability in this context meant the ability to
        | introspectively make sense of why the LLM output what it did -
        | a difficult problem, because the model parameters are
        | effectively an unintelligible morass of numbers. Does this
        | help with that, and if so, how?
        
         | tracerbulletx wrote:
          | Pretty sure this just structures logs for requests to common
          | third-party LLM providers - which I guess is useful, but
          | it's not solving some problem unique to LLMs.
        
           | Aqueous wrote:
            | Correct - the summary is misleading marketing. This is
            | just normal system/service observability. What people mean
            | by "observability" in the LLM context is more specific.
        
       | marcklingen wrote:
       | Fully agree - even as a founder of an 'LLM observability
       | company'. Observability does not need to be reinvented to get
       | detailed traces/metrics/logs of the LLM part of an application.
       | 
        | LLM observability usually means: prompts and completions,
        | which model was used, errors and exceptions (rate limits,
        | network errors), as well as metrics (latency, output speed,
        | time to first token when streaming, USD/token and cost
        | breakdowns). All of this is well suited to being captured by
        | the existing observability stack. OpenLLMetry makes this
        | really easy and interoperable - chapeau.
       | 
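        | Concretely, all of that fits on ordinary span attributes. A
        | sketch against the vanilla OTel JS API - the attribute names
        | here are illustrative, not OpenLLMetry's official semantic
        | conventions:
        | 
        |     import { trace } from "@opentelemetry/api";
        | 
        |     const tracer = trace.getTracer("llm-app");
        | 
        |     // Illustrative attribute names for the fields listed above.
        |     tracer.startActiveSpan("chat.completion", (span) => {
        |       span.setAttribute("llm.request.model", "gpt-3.5-turbo");
        |       span.setAttribute("llm.usage.total_tokens", 421);
        |       span.setAttribute("llm.latency.time_to_first_token_ms", 180);
        |       span.end();
        |     });
        | 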
       | In my view, observability is not the core value that solutions
       | like Baserun, Athina, LangSmith, Parea, Arize, Langfuse (my
       | project) and many others solve for. Developing a useful LLM
       | application requires iterative workflows and tinkering. That's
       | what these solutions help with and augment.
       | 
        | There are problems specific to building an LLM application,
        | such as managing/versioning prompts, running evaluations,
        | blending multiple evaluation sources, collecting datasets to
        | test/benchmark an application, fine-tuning models on high-
        | quality production completions, debugging root causes of
        | quality/latency/cost issues, ...
       | 
        | Most solutions replicate either logs (LLM I/O) or traces at
        | first, as these are a necessary starting point for building
        | solutions to the other workflow problems. As the observability
        | piece gets more standardized over time, I can see how
        | integrating with the standard makes a ton of sense. Always
        | happy to chat about this.
        
         | nirga wrote:
         | Hey Marc :wave:
         | 
         | Would love to see you integrate and adopt this as soon as it
         | makes sense to you. OpenTelemetry is a great and mature piece
         | of technology and we should all be aligning around it now,
         | while it's still easy to do so.
        
       ___________________________________________________________________
       (page generated 2024-02-14 23:01 UTC)