[HN Gopher] TypeChat
___________________________________________________________________
TypeChat
Author : DanRosenwasser
Score : 228 points
Date : 2023-07-20 16:41 UTC (6 hours ago)
(HTM) web link (microsoft.github.io)
(TXT) w3m dump (microsoft.github.io)
| paxys wrote:
| I swear I think of something and Anders Hejlsberg builds it.
|
| Structured requests and responses are 100% the next evolution of
| LLMs. People are already getting tired of chatbots. Being able to
| plug in any backend without worrying about text parsing and
| prompts will be amazing.
| sidnb13 wrote:
| Maybe worth looking into:
| https://news.ycombinator.com/item?id=36750083
| _the_inflator wrote:
| This as a dynamic mapper in a backend layer can be huge.
|
| For example, try to keep up with (frequent) API payload changes
| around a consumer in Java. We implemented a NodeJS layer just
| to stay sane. (Banking, huge JSON payloads, backends in Java)
|
| Mapping is really something where LLMs could shine.
| tylerrobinson wrote:
| It could shine, or it could be an absolute disaster.
|
| Code/functionality archeology is already insanely hard in
| orgs with old codebases. Imagine the facepalming that Future
| You will have when you see that the way the system works is
| some sort of nondeterministic translation layer that
| magically connects two APIs where versions are allowed to
| fluctuate.
| unshavedyak wrote:
| > Structured requests and responses are 100% the next evolution
| of LLMs. People are already getting tired of chatbots. Being
| able to plug in any backend without worrying about text parsing
| and prompts will be amazing.
|
| Yup, a general desire of mine is to locally run an LLM which
| has actionable interfaces that i provide. Things like "check
| time", "check calendar", "send message to user" and etc.
|
| TypeChat seems to be in the right area. I can imagine an extra
| layer of "fit this JSON input to a possible action, if any" and
| etc.
|
| I see a neat hybrid future where a bot (LLM/etc) works to glue
| layers of real code together. Sometimes part of ingestion,
| tagging, etc - sometimes part of responding to input, etc.
|
| All around this is a super interesting area to me but frankly,
| everything is moving so fast i haven't concerned myself with
| diving too deep in it yet. Lots of smart people are working on
| it so i feel the need to let the dust settle a bit. But i think
| we're already there to have my "dream home interface" working.
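The "fit this JSON input to a possible action" layer described above is often modeled as a discriminated union of action types, which is essentially what TypeChat's schemas enable. A minimal sketch (the action names and fields here are invented for illustration, not part of TypeChat):

```typescript
// Hypothetical action schema: a discriminated union lets the type checker
// verify that the LLM's JSON output matches exactly one known action.
type Action =
  | { kind: "checkTime" }
  | { kind: "checkCalendar"; date: string }
  | { kind: "sendMessage"; recipient: string; body: string }
  | { kind: "unknown" }; // fallback when no action matches the input

// Dispatch a parsed action to real code.
function dispatch(action: Action): string {
  switch (action.kind) {
    case "checkTime":
      return new Date().toISOString();
    case "checkCalendar":
      return `events for ${action.date}`;
    case "sendMessage":
      return `sent to ${action.recipient}`;
    case "unknown":
      return "no matching action";
  }
}
```

The `unknown` variant gives the model an explicit escape hatch, which tends to reduce hallucinated actions.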
| sdwr wrote:
| I was thinking about this yesterday. ChatGPT really is good
| enough to act as a proper virtual assistant / home manager,
| with enough toggles exposed.
| 9dev wrote:
| ChatGPT isn't the limiting factor here, a good way to
| expose the toggles is. I recently tried to expose our
| company CRM to employees by means of a Teams bot they could
| ask for stuff in natural language (like "send an invite
| link to newlead@example.org" or "how many MAUs did
| customer Foo have in June"), but while I almost got there,
| communicating an ever-growing set of actionable commands
| (with an arbitrary number of arguments) to the model was
| more complex than I thought.
| unshavedyak wrote:
| Care to share what made it complex? My comment above was
| most likely ignorant, but my general thought was to write
| some header prompt about available actions that the LLM
| could map to, and then ask it if a given input text
| matches to a pre-defined action. Much like what TypeChat
| does.
|
| Does this sound similar enough to what you were doing?
| Was there something difficult in this that you could
| explain?
|
| Aside from being completely hand-wavey in my hypothetical
| guess-timated implementation, i had figured the most
| difficult part would be piping complex actions together.
| "Remind me tomorrow about any events i have on my
| calendar" would be a conditional action based on lookups,
| etc - so order of operations would also have to be parsed
| somehow. I suspect a looping "thinking" mechanism would
| be necessary, and while i know that's not a novel idea i
| am unsure if i would nonetheless have to reinvent it in
| my own tech for the way i wanted to deploy.
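The "header prompt about available actions" idea described here can be sketched roughly as follows (the action list and prompt wording are hypothetical, not what TypeChat or the parent's bot actually uses):

```typescript
// Sketch: build a header prompt enumerating available actions so the
// model can map free text onto one of them.
interface ActionSpec {
  name: string;
  description: string;
  args: string[];
}

const actions: ActionSpec[] = [
  { name: "check_time", description: "Get the current time", args: [] },
  { name: "check_calendar", description: "List events for a date", args: ["date"] },
  { name: "remind", description: "Schedule a reminder", args: ["when", "message"] },
];

function buildHeaderPrompt(specs: ActionSpec[]): string {
  const lines = specs.map(
    (s) => `- ${s.name}(${s.args.join(", ")}): ${s.description}`
  );
  return [
    "You may respond only with one of the following actions, as JSON:",
    ...lines,
    'If no action fits, respond with {"name": "none"}.',
  ].join("\n");
}
```

The hard part, as the parent notes, is that this header grows with every new command, so the action descriptions end up competing for context window space.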
| J_Shelby_J wrote:
| https://github.com/ShelbyJenkins/LLM-OpenAPI-minifier
|
| I have a working solution to exposing the toggles.
|
| I'm integrating it into the bot I have in the other repo.
|
| Goal is you point to an OpenAPI spec and then GPT can choose
| and run functions. Basically Siri but with access
| to any API.
| ianzakalwe wrote:
| I am not sure why this exists (maybe I am missing something),
| and it does not seem like there is much value past "hey, check
| this out, this is possible".
| phillipcarter wrote:
| I'd love to see a robust study on the effectiveness of this and
| several other ways to coax a structured response out:
|
| - Lots of examples / prompt engineering techniques
|
| - MS Guidance
|
| - TypeChat
|
| - OpenAI functions (the model itself is tuned to do this, a key
| differentiator)
|
| - ...others?
| 33a wrote:
| Looks like it just runs the LLM in a loop until it spits out
| something that type checks, prompting with the error message.
|
| This is a cute idea and it looks like it should work, but I could
| see this getting expensive with larger models and input prompts.
| Probably not a fix for all scenarios.
| osaariki wrote:
| I'm not familiar with how TypeChat works, but Guidance [1] is
| another similar project that can actually integrate into the
| token sampling to enforce formats.
|
| [1]: https://github.com/microsoft/guidance
| behnamoh wrote:
| except that guidance is defunct and is not maintained
| anymore.
| J_Shelby_J wrote:
| It's logit bias. You don't even need another library to do
| this. You can do it with three lines of python.
|
| Here's an example of one of my implementations of logit bias.
|
| https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...
| babyshake wrote:
| At least with OpenAI, wouldn't it be better if under the hood
| it was using the new function call feature?
| akavi wrote:
| Typescript's type system is much more expressive than the one
| the function call feature makes available.
|
| I imagine closing the loop (using the TS compiler to restrict
| token output weights) is in the works, though it's probably
| not totally trivial. You'd need:
|
| * An incremental TS compiler that could report "valid" or
| "valid prefix" (ie, valid as long as the next token is not
| EOF)
|
| * The ability to backtrack the model
|
| Idk how hard either piece is.
| rezonant wrote:
| For the TS compiler: If you took each generation step,
| closed any partial JSON objects (ie close any open `{`),
| checked that it was valid JSON and then validated it using
| a deep version of Partial<T>, that should do the trick.
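A simplified version of that brace-closing step might look like this (deliberately naive: it ignores braces inside string literals and strings left open mid-token):

```typescript
// Close any open objects/arrays in a partial JSON fragment so that each
// generation step can be checked as a valid prefix.
function closePartialJson(fragment: string): string {
  const stack: string[] = [];
  for (const ch of fragment) {
    if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  // Trim a dangling comma before appending the closers.
  let closed = fragment.replace(/,\s*$/, "");
  while (stack.length > 0) closed += stack.pop();
  return closed;
}
```

The closed fragment could then be run through `JSON.parse` and a deep-`Partial<T>` validation, as suggested above.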
| SkyPuncher wrote:
| I suspect most products are more concerned about product-market
| fit; they can wrangle costs down later.
|
| There's also a reasonable assumption that models will improve
| at structured output, since the market is demanding it.
| dvt wrote:
| This is my hot take: we're slowly entering the "tooling" phase of
| AI, where people realize there's no real value generation here,
| but people are so heavily invested in AI, that money is still
| being pumped into building stuff (and of course, it's one of the
| best ways to guarantee your academic paper gets published). I
| mean, LangChain is kind of a joke and they raised $10M seed lol.
|
| DeFi/crypto went through this phase 2 years ago. Mark my words,
| it's going to end up being this weird limbo for a few years where
| people will slowly realize that AI is a feature, not a product.
| And that its applicability is limited and that it won't save the
| world. It won't be able to self-drive cars due to all the edge
| cases, it won't be able to perform surgeries because it might
| kill people, etc.
|
| I keep mentioning that even the most useful AI tools (Copilot,
| etc.) are marginally useful at best. At the very best it saves me
| a few clicks on Google, but the agents are not "intelligent" in
| the least. We went through a similar bubble a few years ago with
| chatbots[1]. These days, no one cares about them. "The metaverse"
| was much more short-lived, but the same herd mentality applies.
| "It's the next big thing" until it isn't.
|
| [1] https://venturebeat.com/business/facebook-opens-its-messenge...
| [deleted]
| rvz wrote:
| Someone should just get this working on Llama 2 instead of
| OpenAI.com [0]
|
| All this is just talking to an AI model sitting on someone
| else's server.
|
| [0]
| https://github.com/microsoft/TypeChat/blob/main/src/model.ts...
| joelmgallant wrote:
| The most recent gpt4all (https://github.com/nomic-ai/gpt4all)
| includes a local server compatible with the OpenAI API -- this
| be a useful start!
| DanRosenwasser wrote:
| Hi there! I'm one of the people working on TypeChat and I just
| want to say that we definitely welcome experimentation on
| things like this. We've actually been experimenting with
| running Llama 2 ourselves. Like you said, to get a model
| working with TypeChat all you really need is to provide a
| completion function. So give it a shot!
| jensneuse wrote:
| This looks quite similar to how we're using OpenAI functions and
| zod (JSON Schema) to have OpenAI answer with JSON and interact
| with our custom functions to answer a prompt:
| https://wundergraph.com/blog/return_json_from_openai
| joefreeman wrote:
| > It's unfortunately easy to get a response that includes
| { "name": "grande latte" }
|
| type Item = {
|     name: string;
|     ...
|     size?: string;
| }
|
| I'm not really following how this would avoid `name: "grande
| latte"`?
|
| But then the example response: "size": 16
|
| > This is pretty great!
|
| Is it? It's not even returning the type being asked for?
|
| I'm guessing this is more of a typo in the example, because
| otherwise this seems cool.
| mynameisvlad wrote:
| I feel like that's just a documentation bug. I'm guessing they
| changed from number of ounces to canonical size late in the
| drafting of the announcement and forgot to change the output
| value to match.
|
| There would be no way for a system to map "grande" to 16 based
| on the code provided, and 16 does not seem to be used anywhere
| else.
| DanRosenwasser wrote:
| Whoops - thanks for catching this. Earlier iterations of this
| blog post used a different schema where `size` had been
| accidentally specified as a `number`. While we changed the
| schema, we hadn't re-run the prompt. It should be fixed now!
| graypegg wrote:
| Their example here is really weak overall IMO. Like more than
| just that typo. You also probably wouldn't want a "name" string
| field anyway. Like there's nothing stopping you from receiving
| { name: "the brown one", size: "the espresso cup", ... }
|
| Like that's just as bad as parsing the original string. You
| probably want big string union types for each one of those
| representing whatever known values you want, so the LLM can try
| and match them.
|
| But now why would you want that to be locked into the type
| syntax? You probably want something more like Zod where you can
| use some runtime data to build up those union types.
|
| You also want restrictions on the types too, like quantity
| should be a positive, non-fractional integer. Of course you can
| just validate the JSON values afterwards, but now the user gets
| two kinds of errors. One from the LLM which is fluent and human
| sounding, and the other which is a weird technical "oops! You
| provided a value that is too large for quantity" error.
|
| The type syntax seems like the wrong place to describe this
| stuff.
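The literal-union-plus-runtime-constraints approach suggested here can be sketched without Zod (the field names and size values are illustrative):

```typescript
// Pin the size field to known literal values, and enforce runtime
// constraints (positive integer quantity) that the type syntax alone
// cannot express.
const SIZES = ["short", "tall", "grande", "venti"] as const;
type Size = (typeof SIZES)[number];

interface LineItem {
  name: string;
  size: Size;
  quantity: number; // must be a positive integer
}

function validateLineItem(raw: any): raw is LineItem {
  return (
    typeof raw?.name === "string" &&
    SIZES.includes(raw?.size) &&
    Number.isInteger(raw?.quantity) &&
    raw.quantity > 0
  );
}
```

With a union like `Size` in the schema, an answer such as "the espresso cup" fails validation instead of silently passing through as a free-form string.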
| hirsin wrote:
| The rest of the paragraph discusses "what happens when it
| ignores type?", so I think that's where they were going with
| that?
| verdverm wrote:
| I don't see the value add here.
|
| Here's the core of the message sent to the LLM:
| https://github.com/microsoft/TypeChat/blob/main/src/typechat...
|
| You are basically getting a fixed prompt to return structured
| data with a small amount of automation and vendor lockin. All
| these LLM libraries are just crappy APIs to the underlying API.
| It is trivial to write a script that does the same and will be
| much more flexible as models and user needs evolve.
|
| As an example, think about how you could change the prompt or use
| python classes instead. How much work would this be using a
| library like this versus something that lifts the API calls and
| text templating to the user like:
| https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...
| ofslidingfeet wrote:
| Getting these models to reliably return a consistent structure
| without frequent human intervention and/or having to account
| for the personal moral opinions of big tech CEOs is not
| trivial, no.
| whimsicalism wrote:
| Yes, as the abstractions get better it becomes easier to code
| useful things.
| politelemon wrote:
| Pretty much all the LLM libraries I'm seeing are like this.
| They boil down to a request to the LLM to do something in a
| certain way. I've noticed under complex conditions, they stop
| listening and start reverting to their 'default' behavior.
|
| But that said it still feels like using a library is the right
| thing to do... so I'm still watching this space to see what
| matures and emerges as a good-enough approach.
| bwestergard wrote:
| The value is in:
|
| 1. Running the typescript type checker against what is returned
| by the LLM.
|
| 2. If there are type errors, combining those into a "repair
| prompt" that will (it is assumed) have a higher likelihood of
| eliciting an LLM output that type checks.
|
| 3. Gracefully handling the cases where the heuristic in #2
| fails.
|
| https://github.com/microsoft/TypeChat/blob/main/src/typechat...
|
| In my experience experimenting with the same basic idea, the
| heuristic in #2 works surprisingly well for relatively simple
| types (i.e. records and arrays not nested too deeply, limited
| use of type variables). It turns out that prompting LLMs to
| return values inhabiting relatively simple types can be used to
| create useful applications. Since that is valuable, this
| library is valuable inasmuch as it eliminates the need to hand
| roll this request pattern, and provides a standardized
| integration with the typescript codebase.
| verdverm wrote:
| these are trivial steps you can add in any script, as your
| link demonstrates.
|
| Why would I want to add all this extra stuff just for that?
| The opaque retry until it returns valid JSON? That sounds
| like it will make for many pleasant support cases or issues
|
| Personally, I have found investing more effort in the actual
| prompt engineering improves success rates and reduces the
| need to retry with an appended error message. Especially
| helpful are input/output pairs (i.e. few-shot) and while we
| haven't tried it yet, I imagine fine-tuning and distillation
| would improve the situation even more
| bwestergard wrote:
| There are many subtleties to invoking the typescript type
| checker from node. It's nice to have support for that from
| the team that maintains the type checker.
| BoorishBears wrote:
| Here's a project that does that better imo:
|
| https://github.com/dzhng/zod-gpt
|
| And by better I mean doesn't tie you to OpenAI for no good
| reason
| LordDragonfang wrote:
| I don't know where all you people work that your employer
| would prefer a random git repo (that has no support and no
| guarantee of updates) over a solution from _Microsoft_.
| (Alternatively: that you have so much free time that you'd
| prefer to fiddle with your own validation code instead of
| writing your actual app)
|
| Open source solutions are great, but having a first-party
| solution is _also a good thing_.
| BoorishBears wrote:
| I don't know which employer is hiring the people who make
| logical leaps like this but I thank them for their
| sacrifice.
|
| At the end of the day the repo I linked is grokkable with
| about 10 minutes of effort, and has simple demonstrable
| usefulness by letting you swap out the LLM you're
| calling.
|
| Both are experimental open source libraries in an
| experimental space.
| TechBro8615 wrote:
| Where's the vendor lock-in? This is an open source library and
| the file you linked to even includes configs for two vendors:
| ChatGPT and Bard.
| nfw2 wrote:
| It's essentially prompt engineering as a service with some
| basic quality-control features thrown in.
|
| Sure, your engineers could implement it themselves, but don't
| they have better things to do?
| arc9693 wrote:
| TL;DR: It's asking ChatGPT to format response according to a
| schema.
| bottlepalm wrote:
| How does no voice assistant (Apple, Google, Amazon, Microsoft)
| integrate LLMs into their service yet, and how has OpenAI not
| released their own voice assistant?
|
| Also, like RSS, if there were some standard URL websites
| exposed for AI interaction, using TypeChat to expose the
| interfaces, we'd be well on our way here.
| 9dev wrote:
| Seriously, it feels like there's some collusion going on behind
| the scenes. This is the most obvious use case for the
| technology, but none of the big vendors have explored it.
| jomohke wrote:
| It takes a while to develop a product, and the world only
| woke up to them mere months ago
| zitterbewegung wrote:
| Microsoft is doing that to replace Cortana in windows 11
| COGlory wrote:
| Willow, and the Willow Interference Server have the option to
| use Vicuna with speech input and TTS
| dbish wrote:
| OpenAI is pretty likely working on their own (see Karpathy's
| "Building a kind of JARVIS @ OpenAI"), and Microsoft of
| course is doing an integration or reinterpretation of Cortana
| with OpenAI's LLMs (since they are incapable of building their
| own models nowadays it seems - "Why do we have Microsoft
| Research at all?"-S.N.), but there's a lot less value in voice
| driven LLMs than there is in actually being able to perform
| actions. Take Alexa for example, you need a system that can
| handle smart home control in a predictable, debuggable, way
| otherwise people would get annoyed. I definitely think you can
| do this, but the current system as built (and others like Siri
| and to a lesser use Cortana) all have a bunch of hooks and APIs
| being used by years and years of rules and software built atop
| less powerful models. They need to both maintain the current
| quality and improve on it while swapping out major parts of
| their system in order to make this work, which takes time.
|
| Not to mention that none of these assistants actually make any
| money, they all lose money really, and are only worthwhile to
| big companies with other ways to make cash or drive other parts
| of their business (phones, shopping, whatever), so there's less
| incentive for a startup to do it.
|
| I worked on both Cortana and Alexa in the past, thought a lot
| about trying to build a new version of them ground up with the
| LLM advancements, and while the tech was all straightforward
| and even had some new ideas for use cases that are enabled now,
| could not figure out a business model that would work (and
| hence, working on something completely different now).
| sandkoan wrote:
| Relevant: Built this which generalizes to arbitrary regex
| patterns / context free grammars with 100% adherence and is
| model-agnostic -- https://news.ycombinator.com/item?id=36750083
| davrous wrote:
| This is a fantastic concept! It's going to be super useful to map
| users' intent to API / code in a super reliable way.
| Zaheer wrote:
| It's not super clear how this differs from another recently
| released library from Microsoft: Guidance
| (https://github.com/microsoft/guidance).
|
| They both seem to aim to solve the problem of getting typed,
| valid responses back from LLMs
| DanRosenwasser wrote:
| One of the key things that we've focused on with TypeChat is
| not just that it acts as a specification for retrieving
| structured data (i.e. JSON), but that the structure is actually
| valid - that it's well-typed based on your type definitions.
|
| The thing to keep in mind with these different libraries is
| that they are not necessarily perfect substitutes for each
| other. They often serve different use-cases, or can be combined
| in various ways -- possibly using the techniques directly and
| independent of the libraries themselves.
| trafnar wrote:
| It's not clear to me how they ensure the responses will be valid
| JSON, are they just asking for it, then parsing the result with
| error checking?
| davnicwil wrote:
| seems like they run the generated response through the
| typescript type checker, and if it fails, retry using the error
| message as a further hint to the LLM, until it succeeds.
| anonzzzies wrote:
| I would expect that; if it doesn't even do that, why
| bother... though that is also trivial to do anyway.
| [deleted]
| verdverm wrote:
| also some very basic prompt engineering
| esafak wrote:
| Yes.
| https://github.com/microsoft/TypeChat/blob/main/src/typechat...
| mahalex wrote:
| So, it's a thing that appends "please format your response as
| the following JSON" to the prompt, then validates the actual
| response against the schema, all in a "while (true)" loop
| (literally) until it succeeds. This unbelievable achievement is a
| work of seven people (authors of the blog post).
|
| Honestly, this is getting beyond embarrassing. How is this the
| world we live in?
| jlnho wrote:
| It's because not everyone can be as gifted as you.
|
| I think the (arguably very prototypical) implementation is not
| what's interesting here. It's the concept itself. Natural
| language may soon become the default interface for most of the
| computing people do on a day to day basis, and tools like these
| will make it easier to create new applications in this space.
| Edes wrote:
| I'm gonna love trying to figure out what query gets the
| support chatbot to pair me with an actual human so that I can
| solve something that's off script
| lsh123 wrote:
| Hm... so how do we know that the actual values in the produced
| json are correct???
| siva7 wrote:
| One of the authors is Anders Hejlsberg, the guy behind c# and
| delphi
| mahalex wrote:
| That's what makes it even more embarrassing.
| katamaster818 wrote:
| Hang on, so this is doing runtime validation of an object against
| a typescript type definition? Can this be shipped as a standalone
| library/feature? This would be absolutely game changing for
| validating api response payloads, etc. in typescript codebases.
| tehsauce wrote:
| maybe this function?
|
| https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...
| katamaster818 wrote:
| yup, just found that, super neat, I am 100% interested in
| using this for other runtime validation...
|
| It's interesting because I've always been under the
| impression the TS team was against the use of types at
| runtime (that's why projects like
| https://github.com/nonara/ts-patch exist), but now they're
| doing it themselves with this project...
|
| I wonder what the performance overhead of starting up an
| instance of tsc in memory is? Is this suitable for low
| latency situations? Lots of testing to do...
| robbie-c wrote:
| This is funny, I have something pretty similar in my code, except
| it's using Zod for runtime typechecking, and I convert Zod
| schemas to json schemas and send that to gpt-3.5 as a function
| call. I would expect that using TypeScript's output is better for
| recovering from errors than with Zod's output, so I can
| definitely see the advantage of this.
| bestcoder69 wrote:
| Why this instead of GPT Functions?
| verdverm wrote:
| it's basically the same thing, but uses a more concise spec for
| writing the schema (typescript vs jsonschema)
|
| In the end, both methods try to coax the model into returning a
| JSON object, one method can be used with any model, the other
| is tied to a specific, ever changing vendor API
|
| Why would one choose to only support "OpenAI" and nothing else?
| yanis_t wrote:
| TL;DR: This is ChatGPT + TypeScript.
|
| I'm totally happy to be able to receive structured queries, but
| I'm also not 100% sure TypeScript is the right tool; it seems
| to be overkill. I mean, obviously you don't need the power of
| TS with all its enums, generics, etc.
|
| Plus, given that it will run multiple queries in a loop, it
| might end up very expensive for it to abide by your custom-made
| complex type.
| garrett_makes wrote:
| I built and released something really similar to this (but
| smaller scope) for Laravel PHP this week:
| https://github.com/adrenallen/ai-agents-laravel
|
| My take on this is, it should be easy for an engineer to spin up
| a new "bot" with a given LLM. There's a lot of boring work around
| translating your functions into something ChatGPT understands,
| then dealing with the response and parsing it back again.
|
| With systems like these you can just focus on writing the actual
| PHP code, adding a few clear comments, and then the bot can
| immediately use your code like a tool in whatever task you give
| it.
|
| Another benefit to things like this, is that it makes it much
| easier for code to be shared. If someone writes a function, you
| could pull it into a new bot and immediately use it. It
| eliminates the layer of "converting this for the LLM to use and
| understand", which I think is pretty cool and makes building so
| much quicker!
|
| None of this is perfect yet, but I think this is the direction
| everything will go so that we can start to leverage each others
| code better. Think about how we use package managers in coding
| today, I want a package manager for AI specific tooling. Just
| install the "get the weather" library, add it to my bot, and now
| it can get the weather.
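The "translating your functions into something ChatGPT understands" step described above boils down to emitting an OpenAI-style function spec from function metadata. A rough TypeScript sketch (the descriptor shape is an assumption, not this library's actual format):

```typescript
// Describe one parameter of a callable function.
interface ParamSpec {
  name: string;
  type: "string" | "number" | "boolean";
  description: string;
}

// Turn a function's metadata into the JSON-schema-style spec that the
// OpenAI function-calling API expects.
function toFunctionSpec(name: string, description: string, params: ParamSpec[]) {
  return {
    name,
    description,
    parameters: {
      type: "object",
      properties: Object.fromEntries(
        params.map((p) => [p.name, { type: p.type, description: p.description }])
      ),
      required: params.map((p) => p.name),
    },
  };
}
```

A code generator (or, in the Laravel project's case, reflection over PHP doc comments) can produce these specs automatically, which is the boring work the parent refers to.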
| ameyab wrote:
| Here's a relevant paper that folks may find interesting:
| <snip>Semantic Interpreter leverages an Analysis-Retrieval prompt
| construction method with LLMs for program synthesis, translating
| natural language user utterances to ODSL programs that can be
| transpiled to application APIs and then executed.</snip>
|
| https://arxiv.org/abs/2306.03460
___________________________________________________________________
(page generated 2023-07-20 23:00 UTC)