[HN Gopher] Building Boba AI: Lessons learnt in building an LLM-...
       ___________________________________________________________________
        
       Building Boba AI: Lessons learnt in building an LLM-powered
       application
        
       Author : nalgeon
       Score  : 75 points
       Date   : 2023-06-29 17:19 UTC (5 hours ago)
        
 (HTM) web link (martinfowler.com)
 (TXT) w3m dump (martinfowler.com)
        
       | selalipop wrote:
       | I worked on something very much in this vein (notionsmith.ai) and
       | feel like I should do a write up after reading this!
       | 
        | I think a lot of people are learning these lessons in isolation.
        | I do wish there were a centralized place where people working on
        | UX-focused, LLM-based apps could exchange lessons.
        
         | tinco wrote:
         | I think a lot of us are working heads down in isolation because
         | we don't have a shareworthy project yet. In a week or two I
         | think my system will be fancy enough to write a blog post about
         | and maybe make open source.
         | 
         | HN has been a pretty good source of exchanging knowledge so
         | far, every couple days or so there's a write up like this that
         | has some new tidbits or confirmations of ideas. If everyone
         | keeps doing that we're doing great in my opinion. Looking
         | forward to seeing your write up on here!
        
         | ignoramous wrote:
          | Things on the LLM front for utility apps are fairly nascent,
          | and by OpenAI's own admission the current limitations are
          | fleeting: as a developer, you will soon not need the
          | workarounds used today.
         | 
         | Multi-modal models are going to change things even further.
        
       | daviding wrote:
        | This is an interesting article, though a bit of a mishmash of UI
        | conventions, application ideas for GPT, and actual patterns for
        | LLMs. I really do miss Martin Fowler's actual take on these
        | things, but using his name as some sort of gestalt brain for
        | Thoughtworks works too.
       | 
        | It still feels like a bit of a Wild West for patterns in this
        | area: a lot of people are trying lots of things, and it might be
        | too soon to be defining terms. A useful resource is still things
        | like the OpenAI Cookbook, which is a decent collection of a lot
        | of the things in this article but with a more implementation
        | bent. [1]
       | 
        | The area that currently sees a lot of idea duplication is in
        | providing either a 'session' or a longer-term context for GPT,
        | be it with embeddings or rolling prompts for these apps. Vector
        | search over embedded chunks is something that seems to be
        | missing so far from vendors like OpenAI, and you can't help but
        | wonder that they'll move it behind their API eventually with a
        | 'session id' in the end. I think that was mentioned as being on
        | their roadmap for this year too. The lack of GPT-4 fine-tuning
        | options just pushes people toward stores like Pinecone and
        | Weaviate, chaining up their own sequences to achieve some sort
        | of memory.
       | 
        | I've implemented features with GPT-4 and functions. So far it
        | feels useful for 'data model'-like use (where you bring JSON
        | about a domain noun, e.g. 'Tasks', into the prompt), but it is
        | pretty hairy for pure functions: the tuning they've done to get
        | the model to pick the right function and parameters is still
        | hard to get right, so there isn't much trust that it will be
        | usable. It feels like there needs to be a set of patterns or
        | categories for 'business apps' that are heavily siloed into a
        | small subset of available functions, making the model task-
        | specific rather than the general chat agent we see so often. The
        | difference in approach between LangChain's Chain of Thought
        | pattern and just using OpenAI functions is up in the air as
        | well. Like I said, it still feels like we're in wild-west times,
        | at least as an app developer.
       | 
       | [1] https://github.com/openai/openai-cookbook
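The "siloed subset of functions" idea can be sketched without any network calls: keep a small registry of domain functions and validate the model's chosen function and arguments before executing anything. All names below (`REGISTRY`, `create_task`, etc.) are hypothetical, not from the article; the `{"name": ..., "arguments": "<json>"}` shape mirrors what OpenAI's function-calling API returns.

```python
import json

# Hypothetical task-specific registry: the model may only call these two
# functions, keeping it siloed to one domain ("Tasks") rather than
# acting as a general chat agent.
REGISTRY = {
    "create_task": lambda title, due=None: {"id": 1, "title": title, "due": due},
    "list_tasks": lambda status="open": [{"id": 1, "status": status}],
}

def dispatch(function_call: dict):
    """Validate and execute a function_call object from the model.

    Rejects anything outside the registry instead of trusting the
    model's choice blindly, and surfaces malformed argument JSON.
    """
    name = function_call.get("name")
    if name not in REGISTRY:
        raise ValueError(f"model picked an unknown function: {name!r}")
    try:
        args = json.loads(function_call.get("arguments") or "{}")
    except json.JSONDecodeError as e:
        raise ValueError(f"model produced invalid argument JSON: {e}")
    return REGISTRY[name](**args)

# A response the model might return:
result = dispatch(
    {"name": "create_task", "arguments": '{"title": "Write blog post"}'}
)
```

The point is that the trust boundary sits in your code, not in the model's function selection.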
        
         | ignoramous wrote:
         | > _A useful resource is still things like the OpenAI Cookbook,
         | that is a decent collection of a lot of the things in this
         | article_
         | 
         | By far, the best resource I've found is the _Prompt Engineering
         | Guide_ : https://www.promptingguide.ai/
         | 
          | > _you can't help but wonder that they'll move it behind their
          | API eventually with a 'session id' in the end_
         | 
          | For in-context learning, I think it is fair to expect _100k_ to
          | _500k_ context windows soon. OpenAI is already at _32k_.
        
           | daviding wrote:
           | > By far, the best resource I've found is the Prompt
           | Engineering Guide: https://www.promptingguide.ai/
           | 
           | Agreed, that is a good resource for sure. For tooling I like
           | https://promptmetheus.com/ but any pun name gets bonus points
           | from me.
           | 
            | > For in-context learning, I think it is fair to expect 100k
            | to 500k context windows soon. OpenAI is already at 32k.
           | 
            | It has been interesting to see that window increase so
            | quickly. For LLM context, the biggest constraint is pay-per-
            | token pricing if you don't run your own model, so you have
            | to wonder what will still be around in the future given how
            | this is trending. Even just in terms of idempotent calls,
            | throwing the whole context up every time makes it likely
            | that OpenAI will encroach on the store side as well and
            | offer sessions.
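A back-of-the-envelope sketch (assumed numbers, not OpenAI pricing) of why resending the full rolling history on every stateless call is costly: the tokens uploaded grow quadratically with the number of turns.

```python
def total_tokens_sent(turns: int, tokens_per_turn: int) -> int:
    """Tokens uploaded over a conversation when every call resends the
    full rolling history (a stateless API), assuming each turn adds a
    fixed number of tokens. The sum 1 + 2 + ... + n grows quadratically.
    """
    return sum(i * tokens_per_turn for i in range(1, turns + 1))

# A 20-turn chat at ~200 tokens/turn uploads 42,000 tokens in total,
# versus 4,000 if the server kept the session state.
print(total_tokens_sent(20, 200))
```

Which is exactly the economic pressure toward server-side sessions the comment speculates about.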
        
       | akiselev wrote:
        | _> Along the way, we've learned some useful lessons on how to
        | build these kinds of applications, which we've formulated in
        | terms of patterns._
        | 
        |   * Use a text template to enrich a prompt with context and
        |     structure
        |   * Tell the LLM to respond in a structured data format
        |   * Stream the response to the UI so users can monitor progress
        |   * Capture and add relevant context information to subsequent
        |     actions
        |   * Allow direct conversation with the LLM within a context
        |   * Tell the LLM to generate intermediate results while answering
        |   * Provide affordances for the user to have a back-and-forth
        |     interaction with the co-pilot
        |   * Combine the LLM with other information sources to access
        |     data beyond the LLM's training set
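The first two patterns (template-enriched prompt, structured response format) can be sketched in a few lines; the template text and field names below are illustrative, not taken from the article.

```python
from string import Template

# Pattern 1: a text template that enriches the user's request with
# context and structure. Pattern 2: the final instruction pins the
# model to a structured (JSON) response format.
PROMPT = Template(
    "You are a strategy ideation assistant.\n"
    "Context: $context\n"
    "Task: $task\n"
    "Respond as a JSON array of objects with keys 'title' and 'summary'."
)

def build_prompt(context: str, task: str) -> str:
    return PROMPT.substitute(context=context, task=task)

prompt = build_prompt("retail banking", "suggest three growth scenarios")
```

The structured-format instruction is what makes the response machine-parseable downstream instead of free prose.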
        
         | manojlds wrote:
         | The short courses from dl.ai are better at driving these points
         | - https://www.deeplearning.ai/short-courses/
        
         | frankgrecojr wrote:
         | > Stream the response to the UI so users can monitor progress
         | 
         | This is a game changer to the UX
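A minimal sketch of what "stream the response to the UI" amounts to on the client side, with a plain iterable standing in for the token deltas a streaming API sends (e.g. over server-sent events):

```python
from typing import Iterable, Iterator

def render_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the accumulated text after each chunk arrives, the way a
    UI would repaint a partially received completion."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
        yield "".join(buffer)

# Each yielded snapshot is what the user would see at that moment.
frames = list(render_stream(["Bob", "a ", "is ", "typing"]))
```

The user starts reading after the first frame instead of waiting for the whole completion.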
        
           | jamifsud wrote:
           | Anyone know of any good "tolerant" JSON parsers? I'd love to
           | be able to stream a JSON response down to the client and have
           | it be able to parse the JSON as it goes and handle the
           | formatting errors that we sometimes see.
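One common approach (a sketch, not an endorsement of any particular library) is to repair a truncated JSON prefix by closing whatever strings and brackets are still open before parsing. It handles truncation inside strings and containers, though not a cut mid-literal like `tru`:

```python
import json

def parse_partial_json(text: str):
    """Best-effort parse of a truncated JSON prefix by appending the
    closers that are still open. Assumes the prefix is otherwise
    well-formed; malformed input may still raise."""
    closers = []        # stack of pending closing brackets
    in_string = False
    escaped = False
    for ch in text:
        if escaped:
            escaped = False
            continue
        if ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "{":
                closers.append("}")
            elif ch == "[":
                closers.append("]")
            elif ch in "}]":
                closers.pop()
    suffix = ('"' if in_string else "") + "".join(reversed(closers))
    return json.loads(text + suffix)
```

For example, the stream prefix `{"items": ["a", "b` repairs to `{"items": ["a", "b"]}` and parses cleanly.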
        
           | senko wrote:
           | It's a crutch to minimize the user annoyance at having to
           | wait up to a minute for the response. It sure beats the
           | spinner but it's still a crutch.
        
           | behnamoh wrote:
           | Actually, it's annoying because as you start reading the
           | first lines, the content keeps scrolling (often with jagged
           | movements). I always have to scroll up immediately after the
           | stream begins to disable this behavior.
        
             | tobr wrote:
             | That's totally fixable, though. ReadyRunner handles it
             | simply by scrolling all the way from the start, leaving
             | space for the message to grow.
        
               | trafnar wrote:
               | Hey, that's my app! https://www.readyrunner.ai
        
           | huydotnet wrote:
            | Still not reasonable if you're expecting structured data in
            | the response, like JSON or anything you have to parse before
            | showing it to the user.
        
       | sgt101 wrote:
        | I find the whole idea of adding text into text to drive an
        | outcome pretty worrying if I have to rely on the output.
        | 
        | If the probability of the model spitting out something bad is
        | 0.01%, will my testing find it? Probably not... but my users
        | certainly will.
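The arithmetic behind this worry: assuming independent calls, the chance of at least one bad output is 1 - (1 - p)^n, so a failure rate far below what a test suite samples still surfaces at production volume.

```python
def p_any_failure(p_per_call: float, calls: int) -> float:
    """Probability of at least one bad output across independent calls:
    1 - (1 - p)^n. Even a 0.01% per-call failure rate compounds."""
    return 1 - (1 - p_per_call) ** calls

# At p = 0.0001, a test suite of 1,000 calls surfaces a failure only
# ~9.5% of the time, while a product doing a million calls will see
# one almost surely.
print(p_any_failure(0.0001, 1_000))
print(p_any_failure(0.0001, 1_000_000))
```

Which is exactly the "my users will find it" asymmetry.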
        
         | phillipcarter wrote:
         | Well, it's a tool for ideation, not a strategy emitter. You
         | don't rely on the output, you rely on the people who finalize
         | and commit to a strategy.
        
           | sgt101 wrote:
           | Yeah - for an application like this I get it. But no one is
           | getting rich or shifting the dial on scientific progress with
           | this sort of thing.
        
       | m3kw9 wrote:
        | LLM latency is a huge no-go for most apps except chat apps. I've
        | tried to build apps on OpenAI, and the latency alone creates a
        | bad experience no matter how much elevator music/mirrors/
        | spinners you add. Then you need proper error correction when
        | dealing with structured responses and occasional hallucinations.
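The error correction for structured responses can be as simple as parse-and-retry with a corrective instruction; `generate` below stands in for any prompt-to-text LLM call and is hypothetical, as is the retry wording.

```python
import json

def call_with_validation(generate, prompt: str, retries: int = 2):
    """Wrap an LLM call with parse-and-retry: if the response isn't
    valid JSON, re-ask with a corrective instruction appended. A
    minimal sketch; real apps would also validate against a schema."""
    attempt_prompt = prompt
    for _ in range(retries + 1):
        raw = generate(attempt_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            attempt_prompt = prompt + "\nRespond with valid JSON only."
    raise ValueError("model never produced valid JSON")
```

Each retry costs another round of the latency complained about above, which is why the two problems compound.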
        
       | mvdtnz wrote:
       | I am so despondent at the lack of creativity in most of the
       | (many, many) LLM powered projects that are popping up. I have
       | seen hardly a single thing that goes beyond "it's a chat bot, but
       | with a special prompt". Like, is this the best we can expect from
       | this supposedly ground-breaking technology?
        
         | bugglebeetle wrote:
          | Most of the stuff it's actually good at (like NLP tasks) is
          | both super boring and requires a secondary layer of processing
          | to catch hallucinations. Not as cool a sales pitch to everyone
          | on the "it's alive!" hype train.
        
         | pertymcpert wrote:
         | Same. You just know most of the paid apps are going to be
         | abandoned in a few months.
        
       ___________________________________________________________________
       (page generated 2023-06-29 23:00 UTC)