[HN Gopher] Show HN: Burr - A framework for building and debuggi...
       ___________________________________________________________________
        
       Show HN: Burr - A framework for building and debugging GenAI apps
       faster
        
        Hey HN, we're developing Burr (github.com/dagworks-inc/burr), an
        open-source Python framework that makes it easier to build and
        debug GenAI applications. Burr is a lightweight library that
        integrates with your favorite tools and comes with a debugging
        UI. If you prefer a video introduction, you can watch me build a
        chatbot here: https://www.youtube.com/watch?v=rEZ4oDN0GdU.

        Common friction points we've seen with GenAI applications include
        logically modeling application flow, debugging and recreating
        error cases, and curating data for testing/evaluation (see
        https://hamel.dev/blog/posts/evals/). Burr aims to make each of
        these easier. You can run Burr locally - see instructions in the
        repo.

        We talked to many companies about the pains they felt building
        applications on top of LLMs and were surprised how many had built
        bespoke state management layers and used printlines to debug. We
        found that everyone wanted the ability to pull up the state of an
        application at a given point, poke at it to debug/tweak code, and
        use it for later testing/evaluation. People integrating with
        LLMOps tools fared slightly better, but those tools tend to focus
        solely on API calls to test & evaluate prompts, leaving the
        problem of logically modeling/checkpointing unsolved.

        Coming from platform tooling backgrounds, we felt that a good
        abstraction would improve the experience. These problems all got
        easier to think about when we modeled applications as state
        machines composed of "actions" designed for introspection (for
        more, read
        https://blog.dagworks.io/p/burr-develop-stateful-ai-applicat...).

        We don't want to limit what people can write, but we do want to
        constrain it just enough that the framework provides value and
        doesn't get in the way. This led us to design Burr with the
        following core functionalities (a small sketch of how these fit
        together follows the list):

        1. BYOF. Burr lets you bring your own frameworks/delegate to any
        Python code - LangChain, LlamaIndex, Hamilton, etc. - inside of
        "actions". This gives you the flexibility to mix and match so
        you're not locked in.

        2. Pluggability. Burr comes with APIs that let you save/load
        (i.e. checkpoint) application state, run custom code before/after
        action execution, and add your own telemetry provider (e.g.
        langfuse, datadog, DAGWorks, etc.).

        3. UI. Burr ships with its own UI (following the Python
        batteries-included ethos) that you can run locally, with the
        intent of connecting it to your development/debugging workflow.
        You can watch your application as it progresses and inspect its
        state at any given point.
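
        To make that concrete, here's a rough sketch of what a tiny
        two-action application can look like with the builder API. The
        action bodies and names are illustrative placeholders; see the
        repo/docs for complete, runnable examples:

          from typing import Tuple

          from burr.core import ApplicationBuilder, State, action

          @action(reads=[], writes=["prompt"])
          def human_input(state: State, user_input: str) -> Tuple[dict, State]:
              # runtime inputs (user_input) are passed in via .run(inputs=...)
              return {"prompt": user_input}, state.update(prompt=user_input)

          @action(reads=["prompt"], writes=["response"])
          def generate_response(state: State) -> Tuple[dict, State]:
              # placeholder -- call into whatever LLM/library you want here
              response = "echo: " + state["prompt"]
              return {"response": response}, state.update(response=response)

          app = (
              ApplicationBuilder()
              .with_actions(human_input=human_input, generate_response=generate_response)
              .with_state(prompt="", response="")
              .with_transitions(
                  ("human_input", "generate_response"),
                  ("generate_response", "human_input"),
              )
              .with_entrypoint("human_input")
              .build()
          )

          # run one turn: halt after the response action and hand control back
          action_ran, result, state = app.run(
              halt_after=["generate_response"],
              inputs={"user_input": "hello!"},
          )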

        The above functionalities lend themselves well to building many
        types of applications quickly and flexibly, using the tools you
        want: conversational RAG bots, text-based games,
        human-in-the-loop workflows, text-to-SQL bots, etc. Start with
        LangChain and then easily transition to your custom code or
        another framework without having to rewrite much of your
        application. Side note: we also see Burr as useful outside of
        interactive GenAI/LLM applications, e.g. for building
        hyper-parameter optimization routines for chunking and
        embeddings, or for orchestrating simulations.

        We have a swath of improvements planned and would love feedback,
        contributions, and help prioritizing: TypeScript support, more
        ergonomic UX + APIs for annotation and test/eval curation,
        integrations with common telemetry frameworks, and capture of
        finer-grained information from frameworks like LangChain,
        LlamaIndex, Hamilton, etc.

        Re: the name Burr - you may recognize us as the authors of
        Hamilton (github.com/dagworks-inc/hamilton), named after
        Alexander Hamilton (the first U.S. Secretary of the Treasury).
        While Aaron Burr killed him in a duel, we see Burr as a
        complement to Hamilton rather than its killer!

        That's all for now. Please don't hesitate to open GitHub
        issues/discussions or join our Discord
        (https://discord.gg/6Zy2DwP4f3) to chat with us. We're still very
        early and would love to get your feedback!
        
       Author : elijahbenizzy
       Score  : 61 points
       Date   : 2024-04-03 13:42 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jondwillis wrote:
       | Any plans for a JS/TS companion, like LangGraph has?
        
         | elijahbenizzy wrote:
         | Yep! That's the plan. We're actively looking for partners who
         | can test it out while we're building/bring us use-cases to
         | noodle on, so feel free to reach out if you're interested.
        
       | exe34 wrote:
       | Had a quick look and couldn't find anything obvious - is it easy
       | to point to a llama.cpp server on a different machine pretending
       | to be openai?
        
         | elijahbenizzy wrote:
         | So yeah, shouldn't be too hard, but Burr does not replace LLM
         | API calls -- you can bring whatever frameworks you want. You
         | could do this by writing a simple function:
          | from typing import Tuple
          | from burr.core import State, action
          | 
          | @action(reads=["prompt"], writes=["response"])
          | def my_action(state: State) -> Tuple[dict, State]:
          |     prompt = state["prompt"]
          |     # you stick in whatever library you want for the call
          |     llm_response = whatever_library(prompt)
          |     return {"response": llm_response}, state.update(response=llm_response)
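          | 
          | For the llama.cpp case specifically, one option is to point
          | the regular OpenAI client at the server's OpenAI-compatible
          | endpoint inside that function (host/port/model below are
          | placeholders, not anything Burr-specific -- adjust to your
          | setup):
          | 
          | import openai
          | 
          | # llama.cpp's server exposes an OpenAI-compatible API under /v1
          | client = openai.OpenAI(
          |     base_url="http://my-llama-box:8080/v1",  # hypothetical host/port
          |     api_key="not-needed",  # llama.cpp doesn't require a real key by default
          | )
          | response = client.chat.completions.create(
          |     model="local-model",  # placeholder model name
          |     messages=[{"role": "user", "content": prompt}],
          | )
          | llm_response = response.choices[0].message.content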
         | 
         | Does this help?
        
           | exe34 wrote:
           | Makes sense, thanks!
        
       | zainhoda wrote:
       | Looks really interesting! I saw something in the code about
       | streaming. Could you explain that a bit more?
        
         | elijahbenizzy wrote:
         | Yep! There's a streaming API. It's a little more technically
         | involved, but you can stream back at the point at which an
         | application "halts", meaning you pause and give control back to
         | the user. It only updates state _after_ the stream completes.
         | 
         | Technical details follow:
         | 
         | You can define an action like this:
          | @streaming_action(reads=["prompt"], writes=["response"])
          | def streaming_chat_call(
          |     state: State, **run_kwargs
          | ) -> Generator[dict, None, Tuple[dict, State]]:
          |     client = openai.Client()
          |     response = client.chat.completions.create(
          |         ...,
          |         stream=True,
          |     )
          |     buffer = []
          |     for chunk in response:
          |         delta = chunk.choices[0].delta.content
          |         buffer.append(delta)
          |         yield {'response': delta}
          |     full_response = ''.join(buffer)
          |     return {'response': full_response}, state.append(response=full_response)
         | 
          | Then you would call the `application.stream_result()`
          | function, which gives you back the action that ran and a
          | container object you can stream to the user:
          | 
          | action_we_just_ran, streaming_result_container = application.stream_result(...)
          | print(f"getting streaming results for action={action_we_just_ran.name}")
          | 
          | for result_component in streaming_result_container:
          |     print(result_component['response'])  # this assumes you have a response key in your result
          | 
          | # get the final result once the stream is exhausted
          | final_state, final_result = streaming_result_container.get()
         | 
          | It's nice in a web server or a Streamlit app where you can
          | use streaming responses to connect to the frontend. Here's
          | how we use it in a Streamlit app -- we plan to add a
          | streaming web server example soon:
          | https://github.com/DAGWorks-
          | Inc/burr/blob/main/examples/stre....
        
       | juliantcchu wrote:
       | Have been using it the past few weeks and loving it so far.
       | Highly recommend
        
         | krawczstef wrote:
         | thanks Julian - your feedback has been super helpful!
        
       | talos_ wrote:
        | There are a LOT of LLM frameworks out there. The "bring your
        | own framework" approach adds flexibility instead of requiring
        | endless integrations (like we've seen in ML libraries and
        | MLOps tooling).
        | 
        | If I'm getting started with LLMs or have 1-2 POCs deployed at
        | my company, what's the benefit of adding Burr to my stack?
        
         | elijahbenizzy wrote:
         | So I'm a big fan of limiting complexity -- you generally don't
         | want to add something to your stack unless you get value from
         | it.
         | 
         | If you're getting started with LLMs, the first step is an API
         | call to OpenAI (or some other foundational model) to get a feel
         | for it, then think about how you want to integrate with your
         | application. Burr can help standardize the structure, allowing
         | you to think less about it at any given point (as logic is
         | encapsulated into actions). Furthermore, the UI can help you
         | debug.
         | 
         | With a few POCs at your company (assuming you want to iterate
         | and get into production), Burr can help abstract some complex
         | parts away (state management and telemetry), and make your code
         | cleaner and more extensible, which can help you iterate quickly
         | and explain what you're doing (you can always draw a picture of
         | your app).
         | 
         | We think that it buys companies something they really want --
         | the ability to swap out implementations/vendors as it decouples
         | the logic from application concerns.
         | 
          | So, if you have logic that's starting to get complicated (and
          | might get more so), I think Burr is a good fit. If all you're
          | doing is a single wrapper over a GPT call, it might be
          | overkill (or not! things tend to grow in complexity).
        
           | talos_ wrote:
           | That makes sense. I'll look into it more!
           | 
           | One project was to create a layer between external APIs
           | (OpenAI, Gemini, Claude) and our team. This way we can manage
           | the cost of API calls, try different providers, and log API
           | usage to find out what works / what doesn't.
           | 
           | How does burr scale with an application and support multiple
           | users?
        
             | elijahbenizzy wrote:
             | Cool! Yeah I think that would pair nicely. You can have the
             | abstraction at the action level, or push it down into the
             | stack (as it seems you're doing). Have also seen some
             | startups building proxy layers for OpenAI/GPT.
             | 
              | It's just a library, so it depends on where the
              | bottlenecks are. The standard ones are:
              | 
              | 1. Compute -- just scale horizontally.
              | 
              | 2. DB -- scale horizontally (and vertically), and use
              | intelligent partitioning.
              | 
              | (1) is generally easy with autoscaling; (2) is where your
              | queries/data modeling matter, and this is why we made
              | persistence entirely pluggable, figuring we wouldn't know
              | the best usage patterns yet. More reading:
              | https://burr.dagworks.io/concepts/state-persistence/.
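              | 
              | As a rough sketch of what "pluggable" persistence looks
              | like (the SQLite persister and builder methods below are
              | written from memory, so treat the exact names as
              | illustrative and check the docs link above):
              | 
              | from burr.core import ApplicationBuilder, State, action
              | from burr.core.persistence import SQLLitePersister
              | 
              | @action(reads=["count"], writes=["count"])
              | def increment(state: State):
              |     return {"count": state["count"] + 1}, state.update(count=state["count"] + 1)
              | 
              | # persists/loads state keyed by app_id + partition_key (e.g. a user ID)
              | persister = SQLLitePersister(db_path="./burr.db", table_name="burr_state")
              | persister.initialize()
              | 
              | app = (
              |     ApplicationBuilder()
              |     .with_actions(increment=increment)
              |     .with_transitions(("increment", "increment"))
              |     .initialize_from(
              |         persister,  # resume from the last checkpoint if one exists
              |         resume_at_next_action=True,
              |         default_state={"count": 0},
              |         default_entrypoint="increment",
              |     )
              |     .with_state_persister(persister)  # checkpoint state after each step
              |     .with_identifiers(app_id="demo-app", partition_key="user-42")
              |     .build()
              | )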
             | 
              | One more thing to think about -- Burr isn't opinionated
              | about how you build your app, e.g. it's up to you to
              | decide what happens if you have your webapp open in, say,
              | two tabs. You'll need to manage situations like this, but
              | Burr's plugin capabilities should make that easier to
              | iterate on. We're looking to add more examples on this --
              | also reach out if you have any specific questions, find
              | anything missing, or have suggestions.
        
       | pama wrote:
        | Thanks for these efforts. I am surprised that you "... were
        | surprised how many built bespoke state management layers and
        | used printlines to debug." Print has been the common debug tool
        | since the start of CS, and bespoke state management layers,
        | though not ideal in a hypothetical world, involve less
        | transition effort than committing to a new not-invented-here
        | framework that will eventually have to change. Maybe you meant
        | something different?
        
         | elijahbenizzy wrote:
          | Sure, yes, not going to lie, I use printlines all the time.
          | There's nothing inherently wrong with it, and I meant no
          | offense! In fact, you can even add a hook with Burr that adds
          | as many printlines as you like :)
          | https://burr.dagworks.io/concepts/hooks/.
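          | 
          | For instance, a print-everything hook might look roughly like
          | this (a sketch based on the lifecycle hooks linked above --
          | worth verifying the exact class/method names against the
          | docs):
          | 
          | from burr.core import State
          | from burr.lifecycle import PostRunStepHook
          | 
          | class PrintEverythingHook(PostRunStepHook):
          |     # runs after every action -- printlines, but structured
          |     def post_run_step(self, *, state: State, action, result, **kwargs):
          |         print(f"action={action.name}")
          |         print(f"result={result}")
          |         print(f"state={state}")
          | 
          | # then pass it to the builder: ApplicationBuilder()...with_hooks(PrintEverythingHook())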
         | 
         | That said, logging stuff and using regexes (which is the common
         | way to productionize printlines for live/post-hoc debugging)
         | has a lot of challenges, and this is where we've seen a
         | framework become useful.
         | 
         | There's an obvious trade-off between adopting a framework and
         | rolling your own -- we've seen both work in various contexts
         | (we've been in MLOps for quite a while). Our hope is that this
         | will make people more productive by constraining the space they
         | need to think about and allow more mental room for application
         | logic.
        
         | avereveard wrote:
         | Right? People forget the roots. Nowasays logs are super
         | complicated things, but in unix and early Linux days,
         | everything was a file descriptor and it happened to be that 0
         | was the default one to write to and by convention could be
         | routed by pipes and streams into whatever making it super
         | convenient way to emit informations not to users, but to
         | anything
         | 
         | Which is the right thing anyway. If your target are interactive
         | users a log is a terrible thing to do as they won't know where
         | it ends up. And if the target is not interactive users you
         | should have error codes not log messages.
         | 
         | I think that is partly because today is hard to just attach a
         | debugger to things and see the state of a program so people try
         | to serialise the program state to english strings with a
         | timestamp and work backwards from that to understand what the
         | program was doing, but then what we need is a differentiable
         | single stack core dump not an english approximation of it.
         | 
         | And especially not one that gets stolen by systemd and hidden
         | away.
         | 
         | Sorry for the rant but just printing infos and core dumping
         | errors was the best thing ever and it was taken away from me.
        
       ___________________________________________________________________
       (page generated 2024-04-03 23:01 UTC)