[HN Gopher] Show HN: Burr - A framework for building and debuggi...
___________________________________________________________________
Show HN: Burr - A framework for building and debugging GenAI apps
faster
Hey HN, we're developing Burr (github.com/dagworks-inc/burr), an
open-source Python framework that makes it easier to build and
debug GenAI applications. Burr is a lightweight library that can
integrate with your favorite tools and comes with a debugging UI.
If you prefer a video introduction, you can watch me build a
chatbot here: https://www.youtube.com/watch?v=rEZ4oDN0GdU.

Common friction points we've seen with GenAI applications include
logically modeling application flow, debugging and recreating error
cases, and curating data for testing/evaluation (see
https://hamel.dev/blog/posts/evals/). Burr aims to make these
easier. You can run Burr locally - see instructions in the repo.
We talked to many companies about the pains they felt in building
applications on top of LLMs and were surprised how many built
bespoke state management layers and used printlines to debug. We
found that everyone wanted the ability to pull up the state of an
application at a given point, poke at it to debug/tweak code, and
use it for later testing/evaluation. People integrating with LLMOps
tools fared slightly better, but these tend to focus solely on API
calls to test & evaluate prompts, leaving the problem of logically
modeling/checkpointing unsolved. Having platform tooling
backgrounds, we felt that a good abstraction would help improve the
experience.

These problems all got easier to think about when we modeled
applications as state machines composed of "actions" designed for
introspection (for more, read
https://blog.dagworks.io/p/burr-develop-stateful-ai-applicat...).
We don't want to limit what people can write, but we do want to
constrain it just enough that the framework provides value and
doesn't get in the way. This led us to design Burr with the
following core functionalities:

1. BYOF. Burr allows you to bring your own frameworks/delegate to
any Python code, like LangChain, LlamaIndex, Hamilton, etc., inside
of "actions". This provides you with the flexibility to mix and
match so you're not limited.

2. Pluggability. Burr comes with APIs that allow you to save/load
(i.e. checkpoint) application state, run custom code before/after
action execution, and add in your own telemetry provider (e.g.
langfuse, datadog, DAGWorks, etc.).

3. UI. Burr comes with its own UI (following the Python
batteries-included ethos) that you can run locally, with the intent
to connect with your development/debugging workflow. You can see
your application as it progresses and inspect its state at any
given point.
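
To make this concrete, here's a rough sketch of what a minimal Burr
application can look like -- a toy two-action chat loop. The LLM call
is a placeholder, and the exact builder/run calls may differ slightly
from the latest API; see the repo/docs for runnable examples:

    from typing import Tuple

    from burr.core import ApplicationBuilder, State, action


    @action(reads=["prompt"], writes=["chat_history"])
    def human_input(state: State) -> Tuple[dict, State]:
        # record the user's prompt in the conversation history
        chat_item = {"role": "user", "content": state["prompt"]}
        return {"chat_item": chat_item}, state.append(chat_history=chat_item)


    @action(reads=["chat_history"], writes=["response", "chat_history"])
    def ai_response(state: State) -> Tuple[dict, State]:
        # BYOF: delegate to whatever client/framework you like here
        response = call_your_llm(state["chat_history"])  # placeholder, not a real function
        chat_item = {"role": "assistant", "content": response}
        return {"response": response}, state.update(response=response).append(chat_history=chat_item)


    app = (
        ApplicationBuilder()
        .with_actions(human_input=human_input, ai_response=ai_response)
        .with_transitions(("human_input", "ai_response"), ("ai_response", "human_input"))
        .with_state(prompt="Hello!", chat_history=[])
        .with_entrypoint("human_input")
        .build()
    )

    # run until the AI response completes; inspect the result/state afterwards
    last_action, result, state = app.run(halt_after=["ai_response"])

Each action declares what it reads from and writes to state, and the
transitions define the state machine that the UI visualizes.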
The above functionalities lend themselves well to building many
types of applications quickly and flexibly, using the tools you
want: conversational RAG bots, text-based games, human-in-the-loop
workflows, text-to-SQL bots, etc. Start with LangChain and then
easily transition to your custom code or another framework without
having to rewrite much of your application.

Side note: we also see Burr as useful outside of interactive
GenAI/LLM applications, e.g. building hyper-parameter optimization
routines for chunking and embeddings & orchestrating simulations.

We have a swath of improvements planned and would love feedback,
contributions, & help prioritizing: TypeScript support, more
ergonomic UX + APIs for annotation and test/eval curation, as well
as integrations with common telemetry frameworks and capture of
finer-grained information from frameworks like LangChain,
LlamaIndex, Hamilton, etc.

Re: the name Burr, you may recognize us as the authors of Hamilton
(github.com/dagworks-inc/hamilton), named after Alexander Hamilton
(the first US Secretary of the Treasury). While Aaron Burr killed
him in a duel, we see Burr as a complement to Hamilton rather than
a killer!

That's all for now. Please don't hesitate to open GitHub
issues/discussions or join our Discord
(https://discord.gg/6Zy2DwP4f3) to chat with us. We're still very
early and would love to get your feedback!
Author : elijahbenizzy
Score : 61 points
Date : 2024-04-03 13:42 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jondwillis wrote:
| Any plans for a JS/TS companion, like LangGraph has?
| elijahbenizzy wrote:
| Yep! That's the plan. We're actively looking for partners who
| can test it out while we're building/bring us use-cases to
| noodle on, so feel free to reach out if you're interested.
| exe34 wrote:
| Had a quick look and couldn't find anything obvious - is it easy
| to point to a llama.cpp server on a different machine pretending
| to be openai?
| elijahbenizzy wrote:
| So yeah, shouldn't be too hard, but Burr does not replace LLM
| API calls -- you can bring whatever frameworks you want. You
| could do this by writing a simple function:
|             @action(reads=["prompt"], writes=["response"])
|             def my_action(state: State) -> Tuple[dict, State]:
|                 prompt = state["prompt"]
|                 # you stick in whatever library you want for the call
|                 llm_response = whatever_library(prompt)
|                 state = state.update(response=llm_response)
|                 return {"response": llm_response}, state
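| 
| For the llama.cpp part specifically -- the openai client can be
| pointed at any OpenAI-compatible endpoint, so inside that action you
| could do something like this (the host and model name are
| placeholders for your server):
|             import openai
|             # llama.cpp's server exposes an OpenAI-compatible API
|             client = openai.OpenAI(
|                 base_url="http://your-llama-box:8080/v1",  # placeholder address
|                 api_key="not-needed",  # llama.cpp ignores this unless --api-key is set
|             )
|             llm_response = client.chat.completions.create(
|                 model="whatever-model-your-server-loaded",
|                 messages=[{"role": "user", "content": prompt}],
|             ).choices[0].message.content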
|
| Does this help?
| exe34 wrote:
| Makes sense, thanks!
| zainhoda wrote:
| Looks really interesting! I saw something in the code about
| streaming. Could you explain that a bit more?
| elijahbenizzy wrote:
| Yep! There's a streaming API. It's a little more technically
| involved, but you can stream back at the point at which an
| application "halts", meaning you pause and give control back to
| the user. It only updates state _after_ the stream completes.
|
| Technical details follow:
|
| You can define an action like this:
|             @streaming_action(reads=["prompt"], writes=["response"])
|             def streaming_chat_call(state: State, **run_kwargs) -> Generator[dict, None, Tuple[dict, State]]:
|                 client = openai.Client()
|                 response = client.chat.completions.create(
|                     ...,
|                     stream=True,
|                 )
|                 buffer = []
|                 for chunk in response:
|                     delta = chunk.choices[0].delta.content
|                     buffer.append(delta)
|                     yield {'response': delta}
|                 full_response = ''.join(buffer)
|                 return {'response': full_response}, state.append(response=full_response)
|
| Then you would call the `application.stream_result()` function,
| which gives you back a container object that you can stream to the
| user:
|             streaming_result_container = application.stream_result(...)
|             action_we_just_ran = streaming_result_container.get()
|             print(f"getting streaming results for action={action_we_just_ran.name}")
|             for result_component in streaming_result_container:
|                 print(result_component['response'])  # this assumes you have a response key in your result
|             # get the final result
|             final_state, final_result = streaming_result_container.get()
|
| It's nice in a web server or a Streamlit app where you can use
| streaming responses to connect to the frontend. Here's how we use
| it in a Streamlit app -- we plan for a streaming web server soon:
| https://github.com/DAGWorks-Inc/burr/blob/main/examples/stre....
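| 
| If you wanted to wire it up to a web framework yourself today, a
| rough sketch with FastAPI could look like this (the endpoint and the
| `application` object are assumed to exist elsewhere -- illustrative
| only, not an official Burr server):
|             from fastapi import FastAPI
|             from fastapi.responses import StreamingResponse
| 
|             server = FastAPI()
| 
|             @server.get("/chat")
|             def chat(prompt: str):
|                 # `application` is the Burr app built with the streaming action above
|                 container = application.stream_result(...)  # same call as above
| 
|                 def generate():
|                     for chunk in container:
|                         yield chunk["response"]  # forward each delta as it arrives
| 
|                 return StreamingResponse(generate(), media_type="text/plain")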
| juliantcchu wrote:
| Have been using it the past few weeks and loving it so far.
| Highly recommend
| krawczstef wrote:
| thanks Julian - your feedback has been super helpful!
| There are a LOT of LLM frameworks out there. The "bring your own
| framework" approach adds flexibility instead of endless
| integrations (like in ML libraries and MLOps tooling).
| 
| If I'm getting started with LLMs or have 1-2 POCs deployed at my
| company, what's the benefit of adding Burr to my stack?
| elijahbenizzy wrote:
| So I'm a big fan of limiting complexity -- you generally don't
| want to add something to your stack unless you get value from
| it.
|
| If you're getting started with LLMs, the first step is an API
| call to OpenAI (or some other foundational model) to get a feel
| for it, then think about how you want to integrate with your
| application. Burr can help standardize the structure, allowing
| you to think less about it at any given point (as logic is
| encapsulated into actions). Furthermore, the UI can help you
| debug.
|
| With a few POCs at your company (assuming you want to iterate
| and get into production), Burr can help abstract some complex
| parts away (state management and telemetry), and make your code
| cleaner and more extensible, which can help you iterate quickly
| and explain what you're doing (you can always draw a picture of
| your app).
|
| We think that it buys companies something they really want --
| the ability to swap out implementations/vendors as it decouples
| the logic from application concerns.
|
| So, if you have logic that's starting to get complicated (and
| might get more so), I think Burr is good. If all you're doing is
| a single wrapper over a GPT call, it might be overkill (or not!
| things tend to grow in complexity).
| talos_ wrote:
| That makes sense. I'll look into it more!
|
| One project was to create a layer between external APIs
| (OpenAI, Gemini, Claude) and our team. This way we can manage
| the cost of API calls, try different providers, and log API
| usage to find out what works / what doesn't.
|
| How does burr scale with an application and support multiple
| users?
| elijahbenizzy wrote:
| Cool! Yeah I think that would pair nicely. You can have the
| abstraction at the action level, or push it down into the
| stack (as it seems you're doing). Have also seen some
| startups building proxy layers for OpenAI/GPT.
|
| It's just a library, so it depends on what the bottlenecks could
| be. Standard ones are:
| 1. Compute -- just scale horizontally
| 2. DB -- scale horizontally (and vertically), use intelligent
| partitioning
|
| (1) is generally easy with autoscaling, (2) is where your
| queries/data modeling matter, and this is why we made
| persistence entirely pluggable, figuring we wouldn't know
| the best usage patterns yet. More reading:
| https://burr.dagworks.io/concepts/state-persistence/.
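| 
| To give a flavor of the pluggable persistence piece, the shape of
| it is roughly this (a sketch -- check the docs above for the exact
| class and parameter names):
|             from burr.core import ApplicationBuilder
|             from burr.core.persistence import SQLLitePersister
| 
|             # SQLite here, but you can plug in your own persister implementation
|             persister = SQLLitePersister(db_path="./burr_state.db", table_name="burr_state")
| 
|             app = (
|                 ApplicationBuilder()
|                 .with_actions(...)  # your actions
|                 .with_transitions(...)  # your transitions
|                 .with_identifiers(app_id="conversation-123")  # e.g. one id per user/session
|                 .initialize_from(
|                     persister,
|                     resume_at_next_action=True,
|                     default_state={"chat_history": []},
|                     default_entrypoint="human_input",
|                 )
|                 .with_state_persister(persister)  # checkpoint state after every step
|                 .build()
|             )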
|
| One more thing to think about -- Burr isn't opinionated about how
| you build your app -- e.g. it's up to you to think about what
| happens if you have your webapp open in, say, two tabs. You'll
| need to manage situations like this, but
| Burr's plugin capabilities should make that easier to
| iterate on. We're looking to add more examples on this --
| also reach out if you have any specific questions/find
| anything missing/have suggestions.
| pama wrote:
| Thanks for these efforts. I am surprised that you "... were
| surprised how many built bespoke state management layers and used
| printlines to debug." Print is the common debug tool since the
| start of CS, and bespoke state management layers, though not
| ideal in a hypothetical world, involve less transition effort
| than a commitment to a new framework not invented here that
| eventually has to change. Maybe you meant something different?
| Sure, yes, not going to lie, I use printlines all the time.
| There's nothing inherently wrong with it, and I meant no offense!
| In fact, you can even add a hook with Burr that adds as many
| printlines as you like :)
| https://burr.dagworks.io/concepts/hooks/.
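| 
| Something in that spirit (a rough sketch -- see the hooks docs
| above for the exact base classes and signatures):
|             from burr.core import Action, State
|             from burr.lifecycle import PostRunStepHook
| 
|             class PrintLnHook(PostRunStepHook):
|                 def post_run_step(self, *, state: State, action: Action, result: dict, **future_kwargs):
|                     # called after every action runs -- print whatever you'd normally printline
|                     print(f"finished {action.name}, result keys: {list(result or {})}")
| 
|             # register it when building the application, e.g.
|             # app = ApplicationBuilder()...with_hooks(PrintLnHook()).build()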
|
| That said, logging stuff and using regexes (which is the common
| way to productionize printlines for live/post-hoc debugging) comes
| with a lot of challenges, and this is where we've seen a framework
| become useful.
|
| There's an obvious trade-off between adopting a framework and
| rolling your own -- we've seen both work in various contexts
| (we've been in MLOps for quite a while). Our hope is that this
| will make people more productive by constraining the space they
| need to think about and allow more mental room for application
| logic.
| Right? People forget the roots. Nowadays logs are super
| complicated things, but in Unix and early Linux days, everything
| was a file descriptor, and it happened that 1 (stdout) was the
| default one to write to; by convention it could be routed by pipes
| and streams into whatever, making it a super convenient way to
| emit information not to users, but to anything.
|
| Which is the right thing anyway. If your target is interactive
| users, a log is a terrible thing to do, as they won't know where
| it ends up. And if the target is not interactive users, you should
| have error codes, not log messages.
|
| I think that is partly because today it is hard to just attach a
| debugger to things and see the state of a program, so people try
| to serialise the program state to English strings with a timestamp
| and work backwards from that to understand what the program was
| doing. But then what we need is a differentiable single stack core
| dump, not an English approximation of it.
|
| And especially not one that gets stolen by systemd and hidden
| away.
|
| Sorry for the rant, but just printing info and core-dumping errors
| was the best thing ever, and it was taken away from me.
___________________________________________________________________
(page generated 2024-04-03 23:01 UTC)