[HN Gopher] Launch HN: Langfuse (YC W23) - OSS Tracing and Workf...
___________________________________________________________________
Launch HN: Langfuse (YC W23) - OSS Tracing and Workflows to Improve
LLM Apps
Hey HN, we are Marc, Clemens, and Max - the founders of Langfuse.
Langfuse uses traces, evaluations, prompt management, and metrics to
help developers debug and improve LLM applications. Here is a full
walkthrough: https://www.youtube.com/watch?v=2E8iTvGo9Hs

With Langfuse, you can instrument your app and start ingesting
traces, tracking LLM calls and other relevant logic in your app such
as retrieval, embedding, or agent actions. Langfuse then helps you
analyze those traces and use features such as evaluations and prompt
management to improve your app. You can sign up to try Langfuse
Cloud (https://cloud.langfuse.com/ - we have a generous free tier)
or self-host Langfuse (https://langfuse.com/self-hosting) within a
couple of minutes.

In the 15 months since our "Show HN"
(https://news.ycombinator.com/item?id=37310070), thousands of teams
have adopted the project (including teams like KhanAcademy, Twilio,
and Samsara), and we hit all of the scaling limits we anticipated in
the original Show HN thread. On our v1/v2 setup, we frequently
exhausted IOPS on Postgres and had our Node.js container grind to a
halt during tokenizations. Since then, we migrated our Cloud
infrastructure from Vercel/Supabase to Porter and then to AWS &
ClickHouse. Last week, we put the finishing touches on the Langfuse
v3.0.0 release
(https://github.com/langfuse/langfuse/releases/tag/v3.0.0), which
unlocks the major scalability improvements we have made over the
past half year, and we are happy to share them with the OSS
ecosystem today.
Langfuse v3 addresses three challenges we encountered as an LLM
observability platform: a) handling high ingestion throughput with
large events (long strings, multimodal images/audio/video), b)
providing fast analytical, table, and single-item reads across the
product, and c) serving prompts quickly and reliably in the critical
path of users' applications. Langfuse is used by thousands of active
self-hosting deployments, so at every point we needed to prioritize
stability, fully automated migrations/upgrades, and infrastructure
components that self-hosters can deploy freely on any cloud vendor.

The v3 release adds a ClickHouse database next to Postgres and blob
storage for events, and introduces a worker container as well as
Redis-backed queues and caches for data ingestion.

The Langfuse SDKs were originally written to send updates to a
single trace to our backend, which then upserted the tracing data in
Postgres. Handling these updates while guaranteeing backwards
compatibility with older SDK versions was a challenge. Our new
ingestion pipeline writes all events to S3 and sends a reference to
the file via Redis to our worker container. From there, we read all
events with the same ID (including all previously ingested ones) and
merge them into a final event. We then insert the new row into
ClickHouse, which automatically replaces the existing data for the
same ID. Re-merging all event updates lets us keep a high-throughput
pipeline by converting updates into new insert-only records.
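
For illustration, the update-to-insert conversion works roughly like
the following sketch (field names and the storage callbacks are
placeholders, not the actual worker code):

    from datetime import datetime, timezone

    def merge_events(event_rows):
        # merge all partial updates for one event ID into a final row;
        # apply them in arrival order so later non-null values win
        merged = {}
        for row in sorted(event_rows, key=lambda r: r["received_at"]):
            merged.update({k: v for k, v in row.items() if v is not None})
        merged["event_ts"] = datetime.now(timezone.utc)
        return merged

    def handle_ingestion_batch(object_refs, fetch_events, read_existing,
                               insert_row):
        # fetch_events: loads the new events referenced in blob storage
        # read_existing: loads previously ingested events for the same ID
        # insert_row: append-only write; a ReplacingMergeTree-style table
        #             keeps only the newest row per ID
        by_id = {}
        for ref in object_refs:
            for event in fetch_events(ref):
                by_id.setdefault(event["id"], []).append(event)
        for event_id, events in by_id.items():
            all_events = read_existing(event_id) + events
            insert_row(merge_events(all_events))  # inserts only, no UPDATEs
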
We went through many iterations to optimize our sorting keys in
ClickHouse and use skip indexes efficiently, and we rewrote almost
all of our queries and API endpoints to make optimal use of the
schema. Using a specialized analytical database required a more
database-centric application design than we needed with a
swiss-army-knife database like Postgres. The new infrastructure
delivers dramatic performance gains: dashboards now respond within
400ms (95th percentile) instead of timing out on large projects and
lookback windows, and tables load up to 90% faster - displaying data
within 800ms even for the largest projects.

Finally, to serve prompts from prompt management with low latency
and high availability, we rely heavily on caches and have decoupled
our infrastructure. For sensitive paths, we use dedicated
deployments to avoid "noisy neighbors" on the same server. We also
improved client-side caching in our SDKs: they now prefetch prompts
and revalidate them in the background, resulting in zero latency
when retrieving a prompt at runtime.
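
Conceptually, the SDK-side behavior is a stale-while-revalidate
cache. A minimal sketch of the pattern (not the actual SDK
implementation; the TTL value is made up):

    import threading, time

    class PromptCache:
        # return the cached prompt immediately and refresh it in a
        # background thread once it is older than the TTL
        def __init__(self, fetch_fn, ttl_seconds=60):
            self._fetch = fetch_fn      # e.g. a call to the prompt API
            self._ttl = ttl_seconds
            self._entries = {}          # name -> (prompt, fetched_at)
            self._lock = threading.Lock()

        def get(self, name):
            with self._lock:
                entry = self._entries.get(name)
            if entry is None:
                prompt = self._fetch(name)   # first request: fetch sync
                with self._lock:
                    self._entries[name] = (prompt, time.monotonic())
                return prompt
            prompt, fetched_at = entry
            if time.monotonic() - fetched_at > self._ttl:
                # stale: serve the cached prompt, refresh in the background
                threading.Thread(target=self._refresh, args=(name,),
                                 daemon=True).start()
            return prompt

        def _refresh(self, name):
            try:
                prompt = self._fetch(name)
                with self._lock:
                    self._entries[name] = (prompt, time.monotonic())
            except Exception:
                pass  # keep serving the stale prompt if the refresh fails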

If you have any questions or feedback, please join us in this HN
thread or, in the future, on our Discord and GitHub Discussions.
While Langfuse v3 is scalable, we tried hard to keep it easy to get
started with Langfuse and to self-host it in your own infrastructure
(https://langfuse.com/self-hosting).

PS: Here
(https://langfuse.com/blog/2024-12-langfuse-v3-infrastructure...)
is a more in-depth blog post on how we built Langfuse v3. PPS: If
you find these problems exciting, we are hiring
(https://langfuse.com/join-us) in Berlin!
Author : mdeichmann
Score : 148 points
Date : 2024-12-17 13:43 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mritchie712 wrote:
| In this example:
|
|     from langfuse.openai import openai  # OpenAI integration
|
| Why do I need to import openai from langfuse?
| marcklingen wrote:
| This is an optional instrumentation of the OpenAI SDK that
| simplifies getting started and tracks token counts, model
| parameters, and streaming latencies.
|
| Langfuse is not in the critical path, this just helps with
| instrumentation.
|
| You can use the Langfuse Python SDK / Decorator to track any
| LLM (with some instrumentation code) or use one of the
| framework integrations.
|
| Here is a fully-featured example using the Amazon Bedrock SDK:
| https://langfuse.com/docs/integrations/amazon-bedrock
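|
| For reference, the drop-in usage looks roughly like this (the model
| is just an example; OPENAI_API_KEY and the LANGFUSE_* keys come
| from the environment, and the trace is sent to Langfuse in the
| background, off the critical path):
|
|     from langfuse.openai import openai  # drop-in replacement import
|
|     completion = openai.chat.completions.create(
|         model="gpt-4o-mini",  # example model
|         messages=[{"role": "user", "content": "Hello!"}],
|     )
|     print(completion.choices[0].message.content)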
| priompteng wrote:
| Nice work, but sorry, I don't feel comfortable either proxying
| my LLM calls through a 3rd party (unless the 3rd party is an LLM
| gateway like LiteLLM or Arch) or storing my prompts in a SaaS.
| For tracing, I use OTel libraries, which are more than sufficient
| for my use case.
| marcklingen wrote:
| If you use an OSS Gateway already, some (e.g. LiteLLM) can
| natively forward logs to Langfuse:
| https://docs.litellm.ai/docs/proxy/logging#langfuse
|
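| With the LiteLLM Python SDK this looks roughly like the following
| (the proxy has an equivalent success_callback setting in its config
| file; treat this as a sketch and check the docs above):
|
|     import litellm
|
|     # forward successful LLM calls to Langfuse; reads
|     # LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST
|     # from the environment
|     litellm.success_callback = ["langfuse"]
|
|     response = litellm.completion(
|         model="gpt-4o-mini",  # example model
|         messages=[{"role": "user", "content": "Hello!"}],
|     )
|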
| We are looking into adding an OTel collector as OTel semantics
| around LLMs are maturing. For now, many features that are key to
| LLMOps are difficult to make work with OTel instrumentation, as
| the space is moving quickly. The main thread on this is here:
| https://github.com/orgs/langfuse/discussions/2509
| mfdupuis wrote:
| This is actually one of the more interesting LLM observability
| platforms I've seen. Beyond addressing scaling issues, where do
| you see yourself going next?
| mathiasn wrote:
| What are other potential platforms?
| marcklingen wrote:
| This is a good long list of projects, although it is not
| narrowly scoped to tracing/evals/prompt-management:
| https://github.com/tensorchord/Awesome-LLMOps?tab=readme-
| ov-...
| calebkaiser wrote:
| I'm a maintainer of Opik, an open source LLM evaluation and
| observability platform. We only launched a few months ago,
| but we're growing rapidly: https://github.com/comet-ml/opik
| suninsight wrote:
| A bunch of them: LangSmith, Lunary, Arize Phoenix, Portkey,
| Datadog, and Helicone.
|
| We also picked Langfuse - more details here:
| https://www.nonbios.ai/post/the-nonbios-llm-observability-
| pi...
| unnikrishnan_r wrote:
| Thanks, this post was insightful. I laughed at the reason
| why you rejected Arize Phoenix; I had similar thoughts
| while going through their site! =)
|
| > "Another notable feature of Langfuse is the use of a
| model as a judge ... this is not enabled in the free
| version/self-hosted version"
|
| I think you can add LLM-as-judge to the self-hosted version
| of Langfuse by defining your own evaluation pipeline:
| https://langfuse.com/docs/scores/external-evaluation-
| pipelin...
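|
| Something along these lines works (method names are from the Python
| SDK as I remember them, and the judge is just a placeholder, so
| double-check against the docs):
|
|     from langfuse import Langfuse
|
|     def my_llm_judge(question, answer):
|         # placeholder: call whichever model you like and map the
|         # verdict to a number, e.g. 1.0 for "good", 0.0 for "bad"
|         return 1.0
|
|     langfuse = Langfuse()  # reads keys/host from the environment
|
|     # pull recent production traces, score them with the judge, and
|     # write the scores back so they show up next to each trace
|     for trace in langfuse.fetch_traces(limit=50).data:
|         langfuse.score(
|             trace_id=trace.id,
|             name="llm-as-judge-quality",
|             value=my_llm_judge(trace.input, trace.output),
|         )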
| barefeg wrote:
| Thanks for sharing your blog post. We had a similar journey.
| I installed and tried both Langfuse and Phoenix and ended
| up choosing Langfuse due to some versioning conflicts on
| the Python dependency. I'm curious whether your thoughts change
| after v3? I also liked that it only depended on Postgres, but
| the scalable version now requires other dependencies.
|
| The thing I liked about Phoenix is that it uses
| OpenTelemetry. In the end we're building our Agents SDK in
| a way that the observability platform can be swapped
| (https://github.com/zetaalphavector/platform/tree/master/agen...)
| and the abstraction is OpenTelemetry-inspired.
| marcklingen wrote:
| As you mentioned, this was a significant trade-off. We
| faced two choices:
|
| (1) Stick with a single Docker container and Postgres.
| This option is simple to self-host, operate, and iterate
| on, but it suffers from poor performance at scale,
| especially for analytical queries that become crucial as
| the project grows. Additionally, as more features
| emerged, we needed a queue and benefited from caching and
| asynchronous processing, which required splitting into a
| second container and adding Redis. These features would
| have been blocked had we stuck with this setup.
|
| (2) Switch to a scalable setup with a robust
| infrastructure that enables us to develop features that
| interest the majority of our community. We have chosen
| this path and prioritized templates and Helm charts to
| simplify self-hosting. Please let us know if you have any
| questions or feedback as we transition to v3. We aim to
| make this process as easy as possible.
|
| Regarding OTel, we are considering adding a collector to
| Langfuse as the OTel semantics are currently developing
| well. The needs of the Langfuse community are evolving
| rapidly, and starting with our own instrumentation has
| allowed us to move quickly while the semantic conventions
| were not yet mature. We are tracking this here and would
| greatly appreciate your feedback, upvotes, or any
| comments you have on this thread:
| https://github.com/orgs/langfuse/discussions/2509
| skull8888888 wrote:
| We launched Laminar a couple of months ago:
| https://www.lmnr.ai. Extremely fast, great DX, and written
| in Rust. Definitely worth a look.
| marcklingen wrote:
| Congrats on the Launch!
| skull8888888 wrote:
| thanks Marc :)
| skull8888888 wrote:
| apologies for hijacking your launch (congrats btw!)
| resiros wrote:
| One missing in the list below is Agenta
| (https://github.com/agenta-ai/agenta).
|
| We're OSS, OTel compliant, with a stronger focus on evals and
| on enabling collaboration between subject matter experts and
| devs.
| marcklingen wrote:
| Positioning/roadmap differ between the different projects in
| the space.
|
| We summarized what we strongly believe in here:
| https://langfuse.com/why - TL;DR: open APIs, self-hostable,
| LLM/cloud/model/framework-agnostic, API first, unopinionated
| building blocks for sophisticated teams, simple yet scalable
| instrumentation that is incrementally adoptable
|
| Regarding roadmap, this is the near-term view:
| https://langfuse.com/roadmap
|
| We work closely with the community, and the roadmap can change
| frequently based on feedback. GitHub Discussions is very
| active, so feel free to join the conversation if you want to
| suggest or contribute a feature: https://langfuse.com/ideas
| matthewolfe wrote:
| Great work, guys!
| extr wrote:
| Very timely post/update, was just checking out your product. IMO
| it is one of the best solutions I've looked at. Appreciate your
| dedication to self hosting, for us it's not really practical to
| have traces with potentially sensitive customer data sitting
| around on some external company's server somewhere (no offense).
| marcklingen wrote:
| Thank you for the kind words! Let us know if you have any
| questions or feedback regarding the self-hosting documentation
| and experience. We collaborate with many teams that have
| diverse security needs, including HIPAA, PCI, and on-premises
| deployments on bare metal without internet access.
| tucnak wrote:
| > YC > OSS
|
| Nice try
| jondwillis wrote:
| I promise this isn't astroturfing ;)
|
| I happened to have been triaging LLM observability, dataset, and
| eval solutions yesterday at the day job, and congratulations,
| Langfuse was the second solution that I tried, and simple enough
| to get set up locally with my existing stack for me to stop
| looking (ye olde time constraints, and I know good-enough when I
| see it!)
|
| Thanks for your and your team's work.
| clemo_ra wrote:
| thank you, that is genuinely nice to hear and motivating for
| our team.
|
| we're available if you ever run into any issues (github, email
| etc.)
| ddtaylor wrote:
| You guys just saved me a lot of trouble. Amazing work everyone
| wow.
| lvkleist wrote:
| Have been a very happy Langfuse user since March - dead simple to
| use and has helped us a lot with LLM observability and debugging
| - great work guys :))
| marcklingen wrote:
| thank you! if you have any ideas for improvements after having
| used Langfuse for a while, please contribute them via github
| discussions: https://langfuse.com/ideas
| kappamax wrote:
| Congrats Marc! We've been using Langfuse for about 6 months for
| our LLMOps tooling. While its SDKs are limited to Python and
| TypeScript, the OpenAPI specification is pretty easy to implement
| in any language.
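|
| For example, listing traces is just an authenticated GET against
| the public API (basic auth with the project's public/secret key
| pair); endpoint and fields roughly as in the OpenAPI spec:
|
|     import requests
|
|     resp = requests.get(
|         "https://cloud.langfuse.com/api/public/traces",
|         auth=("pk-lf-...", "sk-lf-..."),  # public key, secret key
|         params={"limit": 10},
|     )
|     for trace in resp.json()["data"]:
|         print(trace["id"], trace.get("name"))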
|
| The team behind it is amazing, and their product being OSS is one
| of the reasons we chose it. But it just keeps getting better.
|
| We're incidentally only using part of the product because we've
| implemented most of these new features (prompt caching, execution,
| etc.) in our app. But with the API you can decide what parts are
| core to your business logic and outsource the parts you don't
| want to deal with to Langfuse.
|
| I appreciate that it's not an opinionated product.
| marcklingen wrote:
| Thanks for the feedback.
|
| Being unopinionated and API-first has been a core design
| decision. We want to build the building blocks that everyone
| needs while acknowledging that most Langfuse users are very
| sophisticated teams that have a clear idea of what they want to
| achieve. Over time we will build more abstractions for common
| workflows to make it easier to get started, but new features
| will always start API-first.
|
| More on this here: https://langfuse.com/why
| tmshapland wrote:
| Seems like Langfuse is becoming the standard. Whenever I talk to
| other builders, they're using Langfuse.
| mdeichmann wrote:
| Thank you! If these builders have some feedback to share, ask
| them to reach out to us :)
| sebselassie wrote:
| great product, so easy to use. love it.
| aantti wrote:
| great product & great team, kudos & congrats! :)
| david1542 wrote:
| Looks awesome! Been using it for over a year now and it's a great
| product :) The new improvements seem exciting.
| TripleChecker wrote:
| Looks cool! I'd love to see a simple embedding/sharing tool for
| an LLM playground to share with the non-tech team so they can try
| it. Is that something Langfuse could do?
|
| Also, some typos you want to review on the site:
| https://triplechecker.com/s/655511/langfuse.com
| arjunram77 wrote:
| Congratulations @Marc. Been using this product for 5ish months,
| love the iteration and how the team reacts to feedback. The
| prompt versioning has been immensely valuable!
| marcklingen wrote:
| Thanks AJ, feedback on GitHub/Discord (like yours) has been
| very helpful to evolve prompt management from a quick addition
| to the core platform to one of the most-used features -- for
| which we then actually needed to change a lot of infrastructure
| to make it reliable and fast (see blog post linked in the
| original post)
| swyx wrote:
| (unsolicited review) we've been happy adopters of LangFuse at
| AINews (https://smol.ai/news). ive been tracking the llm ops
| landscape (https://www.latent.space/p/braintrust) for a while and
| its very nice to have an open source solution that is so
| comprehensive and intuitive!
|
| reflections/thoughts on where this field goes next:
|
| 1. i wonder if there are new ops solutions for the realtime apis
| popping up
|
| 2. retries for instructor like structured outputs mess up the
| traces, i wonder if they can be tracked and collapsible
|
| 3. chatgpt canvas like "drafting" workflows are on the rise
| (https://www.latent.space/p/inference-fast-and-slow) and again
| its noisy to see in a chat flow
|
| 4. how often do people actually use the feedback tagging and then
| subsequently finetuning? i always feel guilty that i dont do it
| yet and wonder when and where i should.
| marcklingen wrote:
| appreciate your constructive feedback!
|
| > i wonder if there are new ops solutions for the realtime apis
| popping up
|
| This is something we have spent quite some time on already,
| both on designs internally and talking to teams using Langfuse
| with realtime applications. IMO the usage patterns are still
| developing and the data capturing/visualization needs across
| teams are not aligned. What matters: (1) capture streams, (2)
| for non-text provide timestamped transcript/labels, (3) capture
| the difference between user-time and api-level-time (e.g. when
| catching up on a stream after having categorized the input
| first).
|
| We are excited to build support for this; if you or others have
| ideas or a wishlist, please add them to this thread:
| https://github.com/orgs/langfuse/discussions/4757
|
| > retries for instructor like structured outputs mess up the
| traces, i wonder if they can be tracked and collapsible
|
| Great feedback. Being able to retroactively downrank LLM calls
| to be `debug` level in order to collapse/hide them by default
| would be interesting. Added thread for this here:
| https://github.com/orgs/langfuse/discussions/4758
|
| > chatgpt canvas like "drafting" workflows are on the rise
| (https://www.latent.space/p/inference-fast-and-slow) and again
| its noisy to see in a chat flow
|
| Can you share an example trace for this or open a thread on
| github? Would love to understand this in more detail as I have
| seen different trace-representations of it -- the best yet was
| a _git diff_ on a wrapper span for every iteration.
|
| > how often do people actually use the feedback tagging and
| then subsequently finetuning? i always feel guilty that i dont
| do it yet and wonder when and where i should.
|
| I have not seen fine-tuning based on user feedback a lot, as the
| feedback can be noisy and low in frequency (unless there is a
| very clear feedback loop built into the product). A more common
| workflow that I have seen: identify new problems via user
| feedback -> review them manually -> create llm-as-a-judge or
| other automated evals for this problem -> select "good"
| examples for fine-tuning based on a mix of different evals that
| currently run on production data -> sanitize the dataset (e.g.
| remove PII).
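|
| As a rough sketch, the selection step often ends up looking
| something like this (purely illustrative - the score name,
| threshold, and PII helper are placeholders):
|
|     def remove_pii(text):
|         return text  # placeholder: plug in your PII scrubbing here
|
|     def build_finetune_dataset(traces, scores, threshold=0.8):
|         # keep only traces whose automated evals passed the bar,
|         # then sanitize them before exporting a fine-tuning file
|         dataset = []
|         for trace in traces:
|             trace_scores = scores.get(trace["id"], {})
|             if trace_scores.get("llm-as-judge-quality", 0) >= threshold:
|                 dataset.append({
|                     "input": remove_pii(trace["input"]),
|                     "output": remove_pii(trace["output"]),
|                 })
|         return dataset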
|
| Fine-tuning has been more popular for structured output and SQL
| generation (clear feedback loop / retries at run-time if the
| output does not work). More teams fine-tune on all the output
| that has passed this initial run-time gate for model
| distillation without further quality controls on the training
| dataset. They usually then run evals on a test dataset in order
| to verify whether the fine-tuned model hits their quality bar.
| robrenaud wrote:
| I've been using self-hosted Langfuse via LiteLLM in a Jupyter
| notebook for a few weeks for some synthetic data experiments.
| It's been a nice/useful tool.
|
| I've liked having the traces and scores in a unified browser
| based UI; it made sanity-checking experiments way easier than
| doing the same thing inside the notebook.
|
| The trace/generation retrieval API was brutally slow for bulk
| scanning operations, so I bypassed it and just queried the db
| directly. But that is the beauty of open source/self-hosted code.
| marcklingen wrote:
| Thanks for the feedback, glad that you find Langfuse useful!
|
| Can you create an issue with more details on the API
| performance problems? We monitor strict SLOs on the public API
| for Langfuse Cloud and are not aware of any ongoing issues,
| would love to learn more.
| bewestphal wrote:
| Congrats on the launch :) happy users @ Samsara.
|
| Key to our LLM customer feedback flywheel and dataset building.
| marcklingen wrote:
| Thank you! Working with your team has been great. I love seeing
| you ship LLM-powered features and appreciate the feedback you
| have shared along the way.
| krb0 wrote:
| Great work! Easy to integrate :)
| lunarcave wrote:
| A happy Langfuse customer here!
|
| We've been building an agent platform and some of our customers
| wanted some way to exfil OTel traces to their own setup. Initially
| we tried building our own, but then realised Langfuse does exactly
| what we needed. So we offered it as a first-class
| integration (and started using it internally).
|
| Great product, and hope you guys continue to improve it!
| marcklingen wrote:
| Thanks! Really enjoyed working with maintainers of other projects
| (like you) to help them offer more native LLM observability and
| evaluation to their users/communities. There is a lot that goes
| into making the observability/eval part scalable/useful and
| requirements change on a weekly basis with new advancements.
| Same applies to other projects and it makes a lot of sense to
| integrate.
|
| Overview of community integrations:
| https://langfuse.com/docs/integrations/overview
|
| Packages that depend on Langfuse:
| https://langfuse.com/faq/all/packages-depending-on-langfuse
| punkpeye wrote:
| Been using it. Happy customer. It brought sanity to our otherwise
| very complex LLM infrastructure. We spend 60k+ every month on LLM
| calls, so having the backbone to debug when things go haywire has
| helped a lot.
___________________________________________________________________
(page generated 2024-12-17 23:00 UTC)