[HN Gopher] Native JSON Output from GPT-4
___________________________________________________________________
Native JSON Output from GPT-4
Author : yonom
Score : 578 points
Date : 2023-06-14 19:07 UTC (1 day ago)
(HTM) web link (yonom.substack.com)
(TXT) w3m dump (yonom.substack.com)
| aecorredor wrote:
| Newbie in machine learning here. It's crazy that this is the top
| post just today. I've been doing the intro to deep learning
| course from MIT this week, mainly because I have a ton of JSON
| files that are already classified, and want to train a model that
| can generate new JSON data by taking classification tags as
| input.
|
| So naturally this post is exciting. My main unknown right now is
| figuring out which model to train my data on. An RNN, a GAN, a
| diffusion model?
| ilaksh wrote:
| Did you read the article? To do it with OpenAI you would just
| put a few output examples in the prompt and then give it a
| function whose parameters correspond to the JSON format you
| want, or just a single string parameter containing JSON.
|
| You could also fine-tune an LLM like Falcon-7B, but that's
| probably not necessary and has nothing to do with OpenAI.
|
| You might also look into the OpenAI Embedding API as a third
| option.
|
| I would try the first option though.
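| A minimal sketch of that first option, using the June 2023
| Python SDK (the function name and fields here are made up):
|
|   import json
|   import openai  # 0.27.x-era SDK
|
|   # Hypothetical function whose parameters mirror the JSON you want.
|   functions = [{
|       "name": "emit_record",
|       "description": "Emit a generated record for the given tag",
|       "parameters": {
|           "type": "object",
|           "properties": {
|               "tag": {"type": "string"},
|               "fields": {"type": "object"},
|           },
|           "required": ["tag", "fields"],
|       },
|   }]
|
|   resp = openai.ChatCompletion.create(
|       model="gpt-3.5-turbo-0613",
|       messages=[{"role": "user", "content": "Generate a record for tag X"}],
|       functions=functions,
|       function_call={"name": "emit_record"},  # force this function
|   )
|   args = json.loads(
|       resp["choices"][0]["message"]["function_call"]["arguments"])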
| chaxor wrote:
| Is there a decent way of converting to a structure with a very
| constrained vocabulary? For example, given some input text,
| converting it to something like {"OID-189": "QQID-378",
| "OID-478":"QQID-678"}. Where OID and QQID dictionaries can be
| e.g. millions of different items defined by a description. The
| rules for mapping could be essentially what looks closest in
| semantic space to the descriptions given in a dictionary.
|
| I know this should be solvable with local LLMs and BERT cosine
| similarity (it isn't exactly that, but it's a start on the
| idea), but is there a way to do this with decoder models rather
| than encoder models plus other logic?
| jiggawatts wrote:
| You can train custom GPT 3 models, and Azure now has vector
| database integration for GPT-based models in the cloud. You can
| feed it the data, and ask it for the embedding lookup, etc...
|
| You can also host a vector database yourself and fill it up
| with the embeddings from the OpenAI GPT 3 API.
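| A rough sketch of the embedding route (the model name is the
| mid-2023 default; the dictionary and lookup logic are
| illustrative):
|
|   import numpy as np
|   import openai
|
|   def embed(texts):
|       # One call can embed a whole batch of strings.
|       resp = openai.Embedding.create(
|           model="text-embedding-ada-002", input=texts)
|       return np.array([d["embedding"] for d in resp["data"]])
|
|   # Embed the dictionary descriptions once, then match inputs
|   # to the closest description by cosine similarity.
|   descriptions = ["description of OID-189", "description of OID-478"]
|   dict_vecs = embed(descriptions)
|
|   def closest(query):
|       q = embed([query])[0]
|       sims = dict_vecs @ q / (
|           np.linalg.norm(dict_vecs, axis=1) * np.linalg.norm(q))
|       return descriptions[int(np.argmax(sims))]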
| chaxor wrote:
| Unfortunately this doesn't really work, as the model is not
| limited in its decoding vocabulary.
|
| Does anyone have other suggestions that may work in this
| space?
| srameshc wrote:
| I've used GCP Vertex AI for a specific task where the prompt was
| to generate a JSON response with specified keys, and it does
| generate the result as JSON with said keys.
| twelfthnight wrote:
| Issue is that's it's not guaranteed, unlike this new openai
| feature. Personally, Ive found Vertex AI's json output to be
| not so great, it often uses single quotes in my experience. But
| maybe you have figured out the right prompts? I'd be interested
| what you use if so.
| khazhoux wrote:
| I'm trying to experiment with the API but the response time is
| always in the 15-25 second range. How are people getting any
| interesting work done with it?
|
| I see others on the OpenAI dev forum complaining about this too,
| but no resolution.
| 037 wrote:
| I'm wondering if introducing a system message like "convert the
| resulting json to yaml and return the yaml only" would adversely
| affect the optimization done for these models. The reason is that
| yaml uses significantly fewer tokens compared to json. For the
| output, where data type specification or adding comments may not
| be necessary, this could be beneficial. From my understanding,
| specifying functions in json now uses fewer tokens, but I believe
| the response still consumes the usual amount of tokens.
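| A quick way to check the token-count claim yourself with
| tiktoken (counts vary with the data, so treat this as a sketch):
|
|   import json
|   import tiktoken
|   import yaml  # pip install pyyaml
|
|   enc = tiktoken.encoding_for_model("gpt-4")
|   data = {"name": "Ada", "tags": ["a", "b"], "age": 36}
|
|   print(len(enc.encode(json.dumps(data))))       # JSON token count
|   print(len(enc.encode(yaml.safe_dump(data))))   # YAML token count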
| lbeurerkellner wrote:
| I think one should not underestimate the impact on downstream
| performance the output format can have. From a modelling
| perspective it is unclear whether asking/fine-tuning the model
| to generate JSON (or YAML) output is really lossless with
| respect to the raw reasoning powers of the model (e.g. it may
| perform worse on tasks when asked/trained to always respond in
| JSON).
|
| I am sure they ran tests on this internally, but I wonder what
| the concrete effects are, especially comparing different output
| formats like JSON, YAML, different function calling conventions
| and/or forms of tool discovery.
| gregw134 wrote:
| That's what I'm doing. I ask ChatGPT to return inline yaml (no
| wasting tokens on line breaks), then I parse the yaml output
| into JSON once I receive it. A bit awkward but it cuts costs in
| half.
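| The parsing step is trivial with PyYAML; inline (flow-style)
| YAML loads the same way as block YAML:
|
|   import json
|   import yaml  # pip install pyyaml
|
|   raw = "{name: Ada, tags: [a, b]}"  # inline YAML from the model
|   print(json.dumps(yaml.safe_load(raw)))
|   # {"name": "Ada", "tags": ["a", "b"]}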
| bel423 wrote:
| Did people really struggle with getting JSON outputs from GPT-4?
| You can literally do it zero-shot just by saying "match this
| TypeScript type".
|
| GPT3.5 would output perfect JSON with a single example.
|
| I have no idea why people are talking about this like it's a new
| development.
| brolumir wrote:
| Unfortunately, in practice that works only _most of the time_.
| At least in our experience (and the article says something
| similar), sometimes ChatGPT would return something completely
| different when a JSON-formatted response was expected.
| tornato7 wrote:
| In my experience if you set the temperature to zero it works
| 99.9% of the time, and then you can just add retry logic for
| the remaining 0.1%
| blamy wrote:
| I've been using the same prompts for months and have never
| seen this happen on 3.5-turbo let alone 4.
|
| https://gist.github.com/BLamy/244eec016beb9ad8ed48cf61fd2054.
| ..
| imranq wrote:
| Wouldn't this be possible with a solution like Guidance, where
| you have a pre-structured JSON format ready to go and all you
| need is text: https://github.com/microsoft/guidance
| swyx wrote:
| i think people are underestimating the potential here for agents
| building - it is now a lot easier for GPT4 to call other models,
| or itself. while i was taking notes for our emergency pod
| yesterday (https://www.latent.space/p/function-agents) we had
| this interesting debate with Simon Willison on just how many
| functions will be supplied to this API. Simon thinks it will be
| "deep" rather than "wide" - eg a few functions that do many
| things, rather than many functions that do few things. I think i
| agree.
|
| you can now trivially make GPT4 decide whether to call itself
| again, or to proceed to the next stage. it feels like the first
| XOR circuit from which we can compose a "transistor", from which
| we can compose a new kind of CPU.
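| a minimal sketch of that loop with the new API (the two function
| names here are made up):
|
|   import json
|   import openai
|
|   functions = [
|       {"name": "call_again",
|        "description": "Recurse on a refined subtask",
|        "parameters": {"type": "object", "properties": {
|            "subtask": {"type": "string"}}, "required": ["subtask"]}},
|       {"name": "finish",
|        "description": "Return the final answer",
|        "parameters": {"type": "object", "properties": {
|            "answer": {"type": "string"}}, "required": ["answer"]}},
|   ]
|
|   messages = [{"role": "user", "content": "Plan and solve: ..."}]
|   while True:
|       msg = openai.ChatCompletion.create(
|           model="gpt-4-0613", messages=messages,
|           functions=functions, function_call="auto",
|       )["choices"][0]["message"]
|       call = msg.get("function_call")
|       if call is None:
|           break
|       args = json.loads(call["arguments"])
|       if call["name"] == "finish":
|           print(args["answer"])
|           break
|       # "call_again": feed the refined subtask back in
|       messages = [{"role": "user", "content": args["subtask"]}]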
| quickthrower2 wrote:
| The first transistors were slow, and it seems this "GPT3/4
| calling itself" stuff is quite slow too. GPT3/4 as a direct chat
| is about as slow as I can take. Once this gets sped up, though...
|
| I am sure it will, as you can scale out, scale up and build
| more efficient code and build more efficient architectures and
| "tool for the job" different parts of the process.
|
| The problem now (using auto gpt, for example) is accuracy is
| bad, so you need human feedback and intervention AND it is
| slow. Take away the slow, or the needing human intervention and
| this can be very powerful.
|
| I dream of the breakthrough "shitty old laptop is all you need"
| paper where they figure out how to do amazing stuff with 1 GB
| of space on a spinning disk, 1 GB of RAM, and a CPU.
| jillesvangurp wrote:
| Exactly, we humans can use specialized models and traditional
| tool APIs and models and orchestrate the use of all these
| without understanding how these things work in detail.
|
| To do accounting, GPT-4 (or future models) doesn't have to know
| how to calculate. All it needs to know is how to interface with
| tools like calculators, spreadsheets, etc. and parse their
| outputs. Every script, program, etc. becomes a thing that has
| such an API. A lot of what we humans do to solve problems is
| breaking down big problems into problems where we know the
| solution already.
|
| Real life tool interfaces are messy and optimized for humans
| with their limited language and cognitive skills. Ironically,
| that means they are relatively easy to figure out for AI
| language models. Relative to human language the grammar of
| these tool "languages" is more regular and the syntax less
| ambiguous and complicated. Which is why gpt 3 and 4 are
| reasonably proficient with even some more obscure programming
| languages and in the use of various frameworks; including some
| very obscure ones.
|
| Given a lot of these tools with machine accessible APIs with
| some sort of description or documentation, figuring out how to
| call these things is relatively straightforward for a language
| model. The rest is just coming up with a high level plan and
| then executing it. Which amounts to generating some sort of
| script that does this. As soon as you have that, that in itself
| becomes a tool that may be used later. So, it can get better
| over time. Especially once it starts incorporating feedback
| about the quality of its results. It would be able to run mini
| experiments and run its own QA on its own output as well.
| jarulraj wrote:
| Interesting observation, @swyx. There seems to be a connection
| to transitive closure in SQL queries, where the output of the
| query is fed as the input to the query in the next iteration
| [1]. We are thinking about how to best support such recursive
| functions in EvaDB [2].
|
| [1] http://dwhoman.com/blog/sql-transitive-closure.html [2]
| https://evadb.readthedocs.io/en/stable/source/tutorials/11-s...
| boringuser2 wrote:
| Do you people always have to overhype this shit?
| throwuwu wrote:
| What's your problem? There's nothing overhyped about that
| comment. People, including me, _are_ building complex agents
| that can execute multi stage prompts and perform complex
| tasks. Comparing these first models to a basic unit of logic
| is more than fair given how much more capable they are. Do
| you just have an axe to grind?
| boringuser2 wrote:
| [flagged]
| pyinstallwoes wrote:
| How is it inappropriate? How is it not building?
| [deleted]
| delhanty wrote:
| Do _you_ have to be nasty?
|
| That's a person you're replying to with feelings, so why not
| default to being kind in comments as per HN guidelines?
|
| As it happens, swyx has built notable AI related things, for
| example smol-developer
|
| https://twitter.com/swyx/status/1657892220492738560
|
| and it would be nice to be able to read his and other
| perspectives without having to read shallow, mean, dismissive
| replies such as yours.
| boringuser2 wrote:
| [flagged]
| dang wrote:
| Hey, I understand the frustration (both the frustration
| of endless links on an over-hyped topic, and the
| frustration of getting scolded by another user when
| expressing yourself) - but it really would be good if
| you'd post more in the intended spirit of this site
| (https://news.ycombinator.com/newsguidelines.html).
|
| People sometimes misunderstand this, so I'd like to
| explain a bit. (It probably won't help, but it might, and
| I don't like flagging or banning accounts without trying
| to persuade people first if possible.)
|
| We don't ask people to be kind, post thoughtfully, not
| call names, not flame, etc., out of nannyism or some
| moral thing we're trying to impose. That wouldn't feel
| right and I wouldn't want to be under Mary Poppins's
| umbrella either.
|
| The reason is more like an engineering problem: we're
| trying to optimize for one specific thing (https://hn.alg
| olia.com/?dateRange=all&page=0&prefix=true&sor...) and we
| can't do that if people don't respect certain
| constraints. The constraints are to prevent the forum
| from burning itself to a crisp, which is where the arrow
| of internet entropy will take us if we don't expend
| energy to stave it off.
|
| It probably doesn't feel like you're doing anything
| particularly wrong, but there's a cognitive bias where
| everyone underestimates the damage they're causing (by
| say 10x) and overestimates the damage others are causing
| (by say 10x) and that compounds into a major bias where
| everyone feels like everyone else is the problem. We need
| a way out of that dynamic if we're to have any hope of
| keeping this place interesting. As you probably realize,
| HN is forever on the threshold of caving into a pit. We
| need you to help nudge it back from that, not push it
| over.
|
| Of course you're free to say "what do I care if HN burns
| itself to a crisp, fuck you all" but I'd argue you
| shouldn't take that nihilistic position because it isn't
| in your own interests. HN may be annoying at times, but
| it's interesting enough for you to spend time here--
| otherwise you wouldn't be reading the site and posting to
| it. Why not contribute to making it _more_ interesting
| rather than destroying it for yourself and everyone else?
| (I don't mean that you're intentionally destroying it--
| but the way you've been posting is unintentionally
| contributing to that outcome.)
|
| I'm sure you wouldn't drop lit matches in a dry forest,
| or dump motor oil in a mountain lake, trample flower
| gardens, or litter in a city park, for much the same
| reason. It's in your own interest to practice the same
| care for the commons here. Thanks for listening.
| boringuser2 wrote:
| Thanks for the effort of explanation.
|
| If someone were egregiously out of line, typically, I
| feel community sentiment reflects this.
|
| Personally, I feel your assessment of cognitive bias at
| play is way off base. I don't think it's a valid
| comparison to claim that someone is causing "damage" by
| merely expressing distaste. That's a common tool that
| humans use for social feedback. Is cutting off the
| ability for genuine social feedback or adjustment and
| forcing people to be saccharine out of fear of reprisal
| from the top really an _optimal_ solution to an
| engineering problem? It seems more like a simulacrum of
| an HR department where the guillotine is more real: your
| job and life rather than merely your ability to share
| your thoughts on a corner of the Internet.
|
| Think about the engineering problem you find yourself in
| with this state of affairs: something very similar to the
| kind of content you might find on LinkedIn, a sort of
| circular back-patting engine devoid of real challenge and
| grit because of the aforementioned guillotine under which
| all participants hang.
|
| And, quite frankly, you _do_ see the effects of this in
| precisely the post in this initial exchange: hyperbole
| and lack of deep critical assessment are artificially
| inflated. This isn't a coincidence: this has been
| cultured very specifically by the available growing
| conditions and the starter used -- saccharine hall
| monitors that fold like cheap suits (e.g. very poorly,
| lots of creases) when the lowest level of social
| challenge is raised to their ideas.
|
| You know what it really feels like? A Silicon Valley
| reconstruction of all the bad things about a workplace,
| not a genuine forum for debate and intellectual
| exploration. If you want to find a place to model such
| behavior, the Greeks already have you figured out - how
| do you think Diogenes would feel about human resources?
|
| That being said, I appreciate the empathy.
|
| Obviously, I feel a bit like a guy Tony Soprano beat up
| and being forced to apologize afterwards to him for
| bruising his knuckles.
| dang wrote:
| Not to belabor the point but from my perspective you've
| illustrated the point about cognitive bias: it always
| feels like the other person started it and did worse ("I
| feel a bit like a guy Tony Soprano beat up and being
| forced to apologize afterwards to him for bruising his
| knuckles") and it always feels like one was merely
| defending oneself reasonably ("merely expressing
| distaste"). This is the asymmetry I'm talking about.
|
| As you can imagine, mods get this kind of feedback all
| the time from all angles. The basic learning it adds up
| to is that everybody always feels this way. Therefore
| those feelings are not a reliable compass to navigate by.
|
| This is not a criticism--I appreciate your reply!
|
| Edit:
|
| > _forcing people to be saccharine [...] like a
| simulacrum of an HR department_
|
| We definitely don't want that and the site guidelines
| don't call for that. There is tons of room to make your
| substantive points thoughtfully without being saccharine.
| It can take a little bit of reflective work to find that
| room, though, just because we (humans in general) tend to
| get locked into binary oppositions.
|
| The best principle to go by is just to ask yourself: is
| what I'm posting part of a _curious_ conversation? That's
| the intended spirit of the site. It's possible to tell
| if you (I don't mean you personally, I mean all of us)
| are functioning in the range of curiosity and to refrain
| from posting if you aren't.
|
| It _is_ true that the HN guidelines bring a bit of
| blandness to discourse because they eliminate the rough-
| and-tumble debate that can work well in much smaller
| groups of close peers. But that's because that kind of
| debate is impossible in a large public forum like HN--it
| just degenerates immediately into dumb brawls. I've
| written about this quite a bit if you or anyone wants to
| read about that:
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
| que... (I like the rugby analogy)
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
| que...
| boringuser2 wrote:
| I think your argument is reasonable from a logical
| perspective, and I would generally make a similar
| argument as I would find the template quite persuasive.
|
| However, I, again, feel you're improperly pushing shapes
| into the shape-board again. Of course, understanding
| cognitive bias is a fantastic tool to improve human
| behavior from an engineering perspective, and your
| argumentum ad numerum is sound.
|
| That being said, you're focusing too much on what my
| emotional motivation might be rather than looking at the
| system - do you really think there _isn't_ an element of
| that dynamic I outlined in an interaction like this? Of
| course there is.
|
| Anyhow, you know, I don't have the terminology in my
| back-pocket, but there's definitely a large blind-spot
| when someone is ignoring the spirit of intellectual
| curiosity in a positive light rather than a negative one.
|
| In this case, don't you think a tool like mild negative
| social feedback might be a useful mechanism? Of course,
| there's a limit, and if such a person were incapable of
| further insight, they'd probably not be very useful
| conversants. That's obviously not happening here.
|
| One final thing is relevant here - you just hit on a
| pretty important point. There is a grit to a certain type
| of discourse that is actually superior to this discourse,
| I'd happily accept that point. Why not just transfer the
| burden of moderation to that point, rather than what you
| perceive to be the outset? Surely, you'll greatly reduce
| your number of false positives.
|
| I provide negative social feedback sometimes because I
| feel it's appropriate. In the future, I probably won't.
| That being said, it's obvious that I've never sparked a
| thoughtless brawl, so the tolerance is at least
| inappropriately adjusted sufficiently to that extent.
| killingtime74 wrote:
| Who is Simon Willison? Is he big in AI?
| swyx wrote:
| formerly cocreator of Django, now Datasette, but pretty much
| the top writer/hacker on HN making AI topics accessible to
| engineers https://hn.algolia.com/?dateRange=pastYear&page=0&p
| refix=tru...
| killingtime74 wrote:
| Oh wow, nice! Big fan of his work
| ilaksh wrote:
| The thing is the relevant context often depends on what it's
| trying to do. You can give it a lot of context in 16k but if
| there are too many different types of things then I think it
| will be confused or at least have less capacity for the actual
| selected task.
|
| So what I am thinking is that some functions might just be like
| gateways into a second menu level. So instead of just edit_file
| with the filename and new source, maybe only
| select_files_for_edit is available at the top level. In that
| case I can ensure it doesn't try to overwrite an existing file
| and lose important stuff that was already in there, by providing
| the requested files' existing contents along with the function
| allowing the file edit.
| naiv wrote:
| I think big context only makes sense for document analysis.
|
| For programming you want to keep it slim. Just like you
| should keep your controllers and classes slim.
|
| Also people with 32k access report very very long response
| times of up to multiple minutes which is not feasible if you
| only want a smaller change or analysis.
| throwuwu wrote:
| Not sure that's true. I haven't completely filled the context
| with examples but I do provide 8 or so exchanges between user
| and assistant along with a menu of available commands and it
| seems to be able to generalize from that very well. No
| hallucinations either. Good idea about sub menus though, I'll
| have to use that.
| jonplackett wrote:
| It was already quite easy to get GPT-4 to output json. You just
| append 'reply in json with this format' and it does a really
| good job.
|
| GPT-3.5 was very haphazard though and needs extensive
| babysitting and reminding, so if this makes GPT-3.5 better then
| it's useful - it does have an annoying disclaimer though that
| 'it may not reply with valid json', so we'll still have to do
| some sense checks on the output.
|
| I have been using this to make a few 'choose your own
| adventure' type games and I can see there's a TONNE of
| potential useful things.
| throwuwu wrote:
| Just end your request with
|
| ```json
|
| Or provide a few examples of user request and then agent
| response in json. Or both.
| clbrmbr wrote:
| Does the ```json trick work with the chat models? Or only
| the earlier completion models?
| throwuwu wrote:
| Works with chat. They're still text completion models
| under all that rlhf
| reallymental wrote:
| Is there any publicly available resource to replicate your work?
| I would love to just find the right kind of "incantation" for
| the gpt-3.5-t or gpt-4 to output a meaningful story arc etc.
|
| Any examples of your work would be greatly helpful as well!
| devbent wrote:
| I have an open source project doing exactly this at
| https://www.generativestorytelling.ai/ GitHub link is on
| the main page!
| SamPatt wrote:
| I'm not the person you're asking, but I built a site that
| allows you to generate fiction if you have an OpenAI API
| key. You can see the prompts sent in console, and it's all
| open source:
|
| https://havewords.ai/
| seizethecheese wrote:
| In a production system, you don't need "easy most of the time",
| you need "easy without fail".
| pnpnp wrote:
| Ok, just playing devil's advocate here. How many FAANG
| companies have you seen have an outage this year? What's
| their budget?
|
| I think a better way to reply to the author would have been
| "how often does it fail"?
|
| Every system will have outages, it's just a matter of how
| much money you can throw at the problem to reduce them.
| jrockway wrote:
| If 99.995% correct looks bad to users, wait until they
| see 37%.
| ignite wrote:
| > You just append 'reply in json with this format' and it
| does a really good job.
|
| It does an ok job. Except when it doesn't. Definitely misses
| a lot of the time, sometimes on prompts that succeeded on
| previous runs.
| bel423 wrote:
| It literally does it every time, perfectly. I remember I put
| together an entire system that would validate the JSON
| against a zod schema and use reflection to fix it and it
| literally never gets triggered because GPT3.5-turbo always
| does it right the first time.
| thomasfromcdnjs wrote:
| Are you saying that it returned only JSON before? I'm with the
| other commenters: it was wildly variable and always at least
| said "Here is your response", which doesn't parse well.
| travisjungroth wrote:
| If you want a parsable response, have it wrap that with
| ```. Include an example request/response in your history.
| Treat any message you can't parse as an error message.
|
| This works well because it has a place to put any "keep
| in mind" noise. You can actually include that in your
| example.
| worik wrote:
| > It literally does it everytime perfectly. I remember I
| put together an entire system that would validate the
| JSON against a zod schema and use reflection to fix it
| and it literally never gets triggered because
| GPT3.5-turbo always does it right the first time.
|
| Danger! There be assumptions!!
|
| gpt-? is a moving target and in rapid development. What
| it does Tuesday, which it did not do on Monday, it may
| well not do on Wednesday
|
| If there is a documented method to guarantee it, it will
| work that way (modulo OpenAI bugs - and now Microsoft is
| involved....)
|
| What we had before, what you are talking of, was observed
| behaviour. An assumption that what we observed in the
| past will continue in the future is not something to
| build a business on
| travisjungroth wrote:
| ChatGPT moves fast. The API version doesn't seem to
| change except with the model and documented API changes.
| lmeyerov wrote:
| Yeah no
| whateveracct wrote:
| No it doesn't lol. I've seen it just randomly not use a
| comma after one array element, for example.
| LanceJones wrote:
| Yep. Incorrect trailing commas ad nauseam for me.
| [deleted]
| sheepscreek wrote:
| The solution that worked great for me: do not use JSON for
| GPT-to-agent communication. Use comma-separated key=value pairs,
| or something to that effect.
|
| Then have another pure code layer to parse that into
| structured JSON.
|
| I think it's the JSON syntax (with curly braces) that does it
| in. So YAML or TOML might work just as well, but I haven't
| tried that.
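| E.g. a dumb parser for the key=value format (a sketch; assumes
| values contain no commas or equals signs):
|
|   import json
|
|   def kv_to_json(line):
|       # "name=Ada, city=London" -> {"name": "Ada", "city": "London"}
|       pairs = (p.split("=", 1) for p in line.split(","))
|       return json.dumps({k.strip(): v.strip() for k, v in pairs})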
| bombela wrote:
| It's harder to form a tree with key=value pairs. I also tried
| the relational route, but it would always mess up the
| cardinality (one person should have 0 to n friends, but a
| person has a single birth date).
| sheepscreek wrote:
| You could flatten it using namespaced keys. E.g.
|
|   { parent1: { child1: value } }
|
| Becomes one of the following:
|
|   parent1/child1=value
|   parent1_child1=value
|   parent1.child1=value
|
| ..you get the idea.
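| The code layer that rebuilds the tree is small (a sketch, using
| "." as the separator):
|
|   def unflatten(flat):
|       # {"parent1.child1": "value"} -> {"parent1": {"child1": "value"}}
|       tree = {}
|       for key, value in flat.items():
|           node = tree
|           *parents, leaf = key.split(".")
|           for part in parents:
|               node = node.setdefault(part, {})
|           node[leaf] = value
|       return tree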
| rubyskills wrote:
| It's also harder to stream JSON? Maybe I'm overthinking
| this.
| jacobsimon wrote:
| Coincidentally, I just published this JS library[1] over
| the weekend that helps prompt LLMs to return typed JSON
| data and validates it for you. Would love feedback on it if
| this is something people here are interested in. Haven't
| played around with the new API yet but I think this is
| super exciting stuff!
|
| [1] https://github.com/jacobsimon/prompting
| golergka wrote:
| Looks promising! Do you do retries when returned json is
| invalid? Personally, I used io-ts for parsing, and GPT
| seems to be able to correct itself easily when confronted
| with a well-formed error message.
| jacobsimon wrote:
| Great idea, I was going to add basic retries but didn't
| think to include the error.
|
| Any other features you'd expect in a prompt builder like
| this? I'm tempted to add lots of other utility methods
| like classify(), summarize(), language(), etc
| bradly wrote:
| I could not get GPT-4 to reliably not give some sort of text
| response, even if it was just a simple "Sure" followed by the
| JSON.
| avereveard wrote:
| Pass in an agent message with "Sure here is the answer in
| json format:" after the user message. Gpt will think it has
| already done the preamble and the rest of the message will
| start right with the json.
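| I.e. something like this with the chat API (a sketch):
|
|   import openai
|
|   resp = openai.ChatCompletion.create(
|       model="gpt-4",
|       messages=[
|           {"role": "user", "content": "Summarize this as JSON: ..."},
|           # Pre-filled assistant turn: the model tends to continue
|           # from here, so the reply starts directly with the JSON.
|           {"role": "assistant",
|            "content": "Sure, here is the answer in JSON format:"},
|       ],
|   )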
| rytill wrote:
| Did you try using the API and providing a very clear system
| message followed by several examples that were pure JSON?
| bradly wrote:
| Yep. I even gave it a JSON schema file to use. It just
| wouldn't stop adding extra verbiage.
| taylorfinley wrote:
| I just use a regex to select everything between the first and
| last curly bracket, which reliably fixes the "sure, here's
| your object" problem.
| NicoJuicy wrote:
| Say it's a json API and may only reply with valid json
| without explanation.
| bradly wrote:
| Lol yes of course I tried that.
| dror wrote:
| I've had good luck with both:
|
| https://github.com/drorm/gish/blob/main/tasks/coding.txt
|
| and
|
| https://github.com/drorm/gish/blob/main/tasks/webapp.txt
|
| With the second one, I reliably generated half a dozen
| apps with one command.
|
| Not to say that it won't fail sometimes.
| NicoJuicy wrote:
| Combine both ? :)
| cwxm wrote:
| even with gpt 4, it hallucinates enough that it's not
| reliable, forgetting to open/close brackets and quotes. This
| sounds like it'd be a big improvement.
| ztratar wrote:
| Nah, this was solved by most teams a while ago.
| bel423 wrote:
| I feel like I'm taking crazy pills with the amount of
| people saying this is game changing.
|
| Did they not even try asking gpt to format the output as
| json?
| worik wrote:
| > I feel like I'm taking crazy pills....try asking gpt to
| format the output as json
|
| You are taking crazy pills. Stop.
|
| gpt-? is unreliable! That is not a bug in it, it is the
| nature of the beast.
|
| It is not an expert at anything except natural language,
| and even then it is an idiot savant
| jonplackett wrote:
| Not that it matters now but just doing something like this
| works 99% of the time or more with 4 and 90% with 3.5.
|
| It is VERY IMPORTANT that you respond in valid JSON ONLY.
| Nothing before or after. Make sure to escape all strings.
| Use this format:
|
| {"some_variable": [describe the variable purpose]}
| 8organicbits wrote:
| Wouldn't you use traditional software to validate the
| JSON, then ask chatgpt to try again if it wasn't right?
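| Yes - a validate-and-retry loop like this is the usual pattern
| (a sketch):
|
|   import json
|   import openai
|
|   def ask_for_json(messages, retries=3):
|       for _ in range(retries):
|           reply = openai.ChatCompletion.create(
|               model="gpt-4", messages=messages,
|           )["choices"][0]["message"]["content"]
|           try:
|               return json.loads(reply)
|           except json.JSONDecodeError as err:
|               # Feed the parse error back and ask for a fix.
|               messages = messages + [
|                   {"role": "assistant", "content": reply},
|                   {"role": "user", "content":
|                    f"Invalid JSON ({err}). Reply with corrected JSON only."},
|               ]
|       raise ValueError("no valid JSON after retries")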
| girvo wrote:
| In my experience, telling it "no thats wrong, try again"
| just gets it to be wrong in a new different way, or
| restate the same wrong answer slightly differently. I've
| had to explicitly guide it to correct answers or formats
| at times.
| lpfnrobin wrote:
| [flagged]
| cjbprime wrote:
| Try different phrasing, like "Did your answer follow all
| of the criteria?".
| SamPatt wrote:
| 99% of the time is still super frustrating when it fails,
| if you're using it in a consumer facing app. You have to
| clean up the output to avoid getting an error. If it goes
| from 99% to 100% JSON that is a big deal for me, much
| simpler.
| jonplackett wrote:
| Except it says in the small print to expect invalid JSON
| occasionally, so you have to write your error handling
| code either way
| davepeck wrote:
| Yup. Is there a good/forgiving "drunken JSON parser"
| library that people like to use? Feels like it would be a
| useful (and separable) piece?
| golol wrote:
| Honestly, I suspect asking GPT-4 to fix your JSON (in a
| new chat) is a good drunken JSON parser. We are only
| scraping the surface of what's possible with LLMs. If
| token generation were free and instant, we could come up
| with a giant schema of interacting model calls that
| generates 10 suggestions, iterates over them, ranks them
| and picks the best one, as silly as it sounds.
| andai wrote:
| That's hilarious... if parsing GPT's JSON fails, keep
| asking GPT to fix it until it parses!
| golol wrote:
| It shouldn't be surprising though. If a human makes an
| error parsing JSON, what do you do? You make them look
| over it again. Unless their intelligence is the
| bottleneck they might just be able to fix it.
| golergka wrote:
| It works. Just be sure to build a good error message.
| hhh wrote:
| I already do this today to create domain-specific
| knowledge focused prompts and then have them iterate back
| and forth and a 'moderator' that chooses what goes in and
| what doesn't.
| golergka wrote:
| If you're building an app based on LLMs that expects
| higher than 99% correctness from it, you are bound to
| fail. Workarounds for negative scenarios and retries are
| mandatory.
| whateveracct wrote:
| It forgets commas too
| muzani wrote:
| It's fine, but the article makes some good points as to why -
| less cognitive load for GPT and fewer tokens. I think the
| transistor to logic gate analogy makes sense. You can build
| the thing perfectly with transistors, but just use the logic
| gate lol.
| sethd wrote:
| I like to define a JSON schema (https://json-schema.org/) and
| prompt GPT-4 to output JSON based on that schema.
|
| This lets me specify general requirements (not just JSON
| structure) inline with the schema, in a very detailed and
| structured manner.
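| A sketch of that setup (the schema and requirements here are
| illustrative), validating the reply with the jsonschema package:
|
|   import json
|   import jsonschema  # pip install jsonschema
|   import openai
|
|   schema = {
|       "type": "object",
|       "properties": {
|           "title": {"type": "string", "description": "Short, title case"},
|           "tags": {"type": "array", "items": {"type": "string"},
|                    "maxItems": 5},
|       },
|       "required": ["title", "tags"],
|   }
|
|   reply = openai.ChatCompletion.create(
|       model="gpt-4",
|       messages=[
|           {"role": "system", "content":
|            "Reply with JSON matching this schema only:\n"
|            + json.dumps(schema)},
|           {"role": "user", "content": "Describe this article: ..."},
|       ],
|   )["choices"][0]["message"]["content"]
|
|   data = json.loads(reply)
|   jsonschema.validate(data, schema)  # raises ValidationError on mismatch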
| SimianLogic wrote:
| I agree with this. We've already gotten pretty good at json
| coercion, but this seems like it goes one step further by
| bundling decision-making into the model instead of junking up
| your prompt or requiring some kind of eval on a single json
| response.
|
| It should also be much easier to cache these functions. If you
| send the same set of functions on every API hit, OpenAI should
| be able to cache that more intelligently than if everything was
| one big text prompt.
| minimaxir wrote:
| "Trivial" is misleading. From OpenAI's docs and demos, the full
| ReAct workflow is an order of magnitude more difficult than
| typical ChatGPT API usage, with a new set of constraints (e.g.
| schema definitions).
|
| Even OpenAI's notebook demo has error-handling workflows, which
| were actually necessary since ChatGPT returned incorrectly
| formatted output.
| cjonas wrote:
| Maybe trivial isn't the right word, but it's still very
| straight-forward to get something basic, yet really
| powerful...
|
| ReAct Setup Prompt (goal + available actions) -> Agent
| "ReAction" -> Parse & Execute Action -> Send Action Response
| (success or error) -> Agent "ReAction" -> repeat
|
| As long as each action has proper validation and returns
| meaningful error messages, you don't need to even change the
| control flow. The agent will typically understand what went
| wrong, and attempt to correct it in the next "ReAction".
|
| I've been refactoring some agents to use "functions" and so
| far it seems to be a HUGE improvement in reliability vs the
| "Return JSON matching this format" approach. Most impactful
| is that fact that "3.5-turbo" will now reliability return
| JSON (before you'd be forced to use GPT-4 for an ReAct style
| agent of modest complexity).
|
| My agents also seem to be better at following other
| instructions now that the noise of the response format is
| gone (of course it's still there, but in a way it has been
| specifically trained on). This could also just be a result of
| the improvements to the system prompt though.
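| The skeleton of that loop with the functions API looks roughly
| like this (a sketch; execute_action is whatever dispatch you
| already have):
|
|   import json
|   import openai
|
|   def run_agent(goal, functions, execute_action, max_steps=10):
|       messages = [{"role": "user", "content": goal}]
|       for _ in range(max_steps):
|           msg = openai.ChatCompletion.create(
|               model="gpt-3.5-turbo-0613", messages=messages,
|               functions=functions, function_call="auto",
|           )["choices"][0]["message"]
|           messages.append(msg)
|           call = msg.get("function_call")
|           if call is None:
|               return msg["content"]  # agent is done
|           # Run the action; on failure, return the error text so
|           # the agent can correct itself on the next "ReAction".
|           try:
|               result = execute_action(call["name"],
|                                       json.loads(call["arguments"]))
|           except Exception as err:
|               result = f"error: {err}"
|           messages.append({"role": "function", "name": call["name"],
|                            "content": str(result)})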
| [deleted]
| devbent wrote:
| For 3.5, I found it easiest to specify a simple, but
| parsable, format for responses and then convert that to
| JSON myself.
|
| I'll have to see if the new JSON schema support is easier
| than what I already have in place.
| lbeurerkellner wrote:
| It's interesting to think about this form of computation (LLM +
| function call) in terms of circuitry. It is still unclear to me
| however, if the sequential form of reasoning imposed by a
| sequence of chat messages is the right model here. LLM decoding
| and also more high-level "reasoning algorithms" like tree of
| thought are not that linear.
|
| Ever since we started working on LMQL, the overarching vision
| all along was to get to a form of language model programming,
| where LLM calls are just the smallest primitive of the "text
| computer" you are running on. It will be interesting to see
| what kind of patterns emerge, now that the smallest primitive
| becomes more robust and reliable, at least in terms of the
| interface.
| freezed88 wrote:
| 100%, if the API itself can choose to call a function or an
| LLM, then it's way easier to build any agent loop without
| extensive prompt engineering + worrying about errors.
|
| Tweeted about it here as well:
| https://twitter.com/jerryjliu0/status/1668994580396621827?s=...
| bel423 wrote:
| You still have to worry about errors. You will probably have
| to add an error handler function that it can call out to.
| Otherwise the LLM will hallucinate a valid output regardless
| of the input. You want it to be able to throw an error and
| say it couldn't produce the output in the given format.
| moneywoes wrote:
| Wow your brand is huge. Crazy growth. i wonder how much these
| subtle mentions on forums help
| swyx wrote:
| i mean hopefully its relevant content to the discussion, i
| hope enough pple know me here by now that i fully participate
| in The Discourse rather than just being here to cynically
| plug my stuff. i had a 1.5 hr convo with simon willison and
| other well known AI tinkerers on this exact thing, and so I
| shared it, making the most out of their time that they chose
| to share with me.
| TeMPOraL wrote:
| They're the only commenter on HN I've noticed who keeps writing
| "smol" instead of "small", and is associated with projects
| with "smol" in their name. Surely I'm not the only one who
| missed it being a meme around 2015 or sth., and finds this
| word/use jarring - and therefore very attention-grabbing?
| Wonder how much that helps with marketing.
|
| This is meant with no negative intentions. It's just that
| 'swyx was, in my mind, "that HN-er that does AI and keeps
| saying 'smol'" for far longer than I was aware of
| latent.space articles/podcasts.
| swyx wrote:
| and fun fact i used to work at Temporal too heheh.
| memefrog wrote:
| Personally, I associate "smol" with "doggo" and "chonker"
| and other childish redditspeak.
| ftxbro wrote:
| > "you can now trivially make GPT4 decide whether to call
| itself again, or to proceed to the next stage."
|
| Does this mean the GPT-4 API is now publicly available, or is
| there still a waitlist? If there's a waitlist and you literally
| are not allowed to use it no matter how much you are willing to
| pay then it seems like it's hard to call that trivial.
| bayesianbot wrote:
| "With these updates, we'll be inviting many more people from
| the waitlist to try GPT-4 over the coming weeks, with the
| intent to remove the waitlist entirely with this model. Thank
| you to everyone who has been patiently waiting, we are
| excited to see what you build with GPT-4!"
|
| https://openai.com/blog/function-calling-and-other-api-
| updat...
| Tostino wrote:
| Not GP, but it's still the latter...i've been (im)patiently
| waiting.
|
| From their blog post the other day: With these updates, we'll
| be inviting many more people from the waitlist to try GPT-4
| over the coming weeks, with the intent to remove the waitlist
| entirely with this model. Thank you to everyone who has been
| patiently waiting, we are excited to see what you build with
| GPT-4!
| londons_explore wrote:
| If you put contact info in your HN profile - especially an
| email address that matches one you use to login to openai,
| someone will probably give you access...
|
| Anyone with access can share it with any other user via the
| 'invite to organisation' feature. Obviously that allows the
| invited person to make requests billed to the inviter, but since
| most experiments are only a few cents that doesn't really
| matter much in practice.
| Tostino wrote:
| Good to know, but I've racked up a decent bill for just
| my GPT 3.5 use. I can get by with experiments using my
| ChatGPT Plus subscription, but I really need my own API
| access to start using it for anything serious.
| majormajor wrote:
| GPT-4 was already a massive improvement on 3.5 in terms of
| replying consistently in a certain JSON structure - I often
| don't even need to give examples, just a sentence describing
| the format.
|
| It's great to see they're making it even better, but where I'm
| currently hitting the limit still in GPT-4 for "shelling out"
| is about it being truly "creative" or "introspective" about "do
| I need to ask for clarifications" or "can I find a truly novel
| way around this task" type of things vs "here's a possible but
| half-baked sequence I'm going to follow".
| fumar wrote:
| It is "good enough". Where I struggle is maintaining its
| memory through a longer request where multiple iterations
| fail or succeed and then all of a sudden its memory is
| exceeded and starts fresh. I wish I could store "learnings"
| that it could revisit.
| ehsanu1 wrote:
| Sounds like you want something like tree of thoughts:
| https://arxiv.org/abs/2305.10601
| jimmySixDOF wrote:
| Interestingly the paper's repo starts off :
|
| Blah Blah "...is NOT the correct implementation to
| replicate paper results. In fact, people have reported
| that his code cannot properly run, and is probably
| automatically generated by ChatGPT, and kyegomez has done
| so for other popular ML methods, while intentionally
| refusing to link to official implementations for his own
| interests"
|
| Love a good GitHub Identity Theft Star farming ML story
|
| But this method could have potential for a chain of
| function calls.
| babyshake wrote:
| What would be an example where there needs to be an arbitrary
| level of recursive ability for GPT4 to call itself?
| swyx wrote:
| writing code of higher complexity (we know from CICERO that
| longer time spent on inference is worth orders of magnitude
| more than the equivalent in training when it comes to
| improving end performance), or doing real world tasks with
| unknown fractal depth (aka yak shave)
| iamflimflam1 wrote:
| It's pretty interesting how the work they've been doing on
| plugins has fed into this.
|
| I suspect that they've managed to get a lot of good training data
| by calling the APIs provided by plugins and detecting when it's
| gone wrong from bad request responses.
| coding123 wrote:
| We're not far from writing a bunch of stubs and querying GPT at
| startup
| to resolve the business logic. I guess we're going to need a new
| JAX-RS soon.
| runeb wrote:
| The way openai implemented this is really clever, beyond how neat
| the plugin architecture is, as it lets them peek one layer inside
| your internal API surface and infer what you intend to do
| with the LLM output. Collecting some good data here.
| swyx wrote:
| huh, i never thought of it that way. i thought openai pinky
| swears not to train on our data tho
| danShumway wrote:
| I'm concerned that OpenAI's example documentation suggests using
| this to A) construct SQL queries and B) summarize emails, but
| that their example code doesn't include clear hooks for human
| validation before actions are called.
|
| For a recipe builder it's not so big a deal, but I really worry
| how eager people are to remove human review from these steps. It
| gets rid of a very important mechanism for reducing the risks of
| prompt injection.
|
| The top comment here suggests wiring this up to allow GPT-4 to
| recursively call itself. Meanwhile, some of the best advice I've
| seen from security professionals on secure LLM app development is
| to whenever possible completely isolate queries from each other
| to reduce the potential damage that a compromised agent can do
| before its "memory" is wiped.
|
| There are definitely ways to use this safely, and there are
| definitely some pretty powerful apps you could build on top of
| this without much risk. LLMs as a transformation layer for
| trusted input is a good use-case. But are devs going to stick
| with that? Is it going to be used safely? Do devs understand any
| of the risks or how to mitigate them in the first place?
|
| 3rd-party plugins on ChatGPT have repeatedly been vulnerable in
| the real world, I'm worried about what mistakes developers are
| going to make now that they're actively encouraged to treat GPT
| as even more of a low-level data layer. Especially since OpenAI's
| documentation on how to build secure apps is mostly pretty bad,
| and they don't seem to be spending much time or effort educating
| developers/partners on how to approach LLM security.
| irthomasthomas wrote:
| I don't understand why they have done this? Like, how did the
| conversations go when it was pointed out to them what a pretty
| darn bad idea it was to recommend connecting chatgpt directly
| to a SQL database?
|
| I know we are supposed to assume incompetence over malice, but
| no one is that incompetent. They must have had the
| conversations, and chose to do it anyway.
| sebzim4500 wrote:
| Why is this unreasonable to you? I can imagine using this,
| just run it with read access and check the sql if the results
| are interesting.
| irthomasthomas wrote:
| Even read only. You are giving access to your data to a
| black box API.
| sebzim4500 wrote:
| If it's on Azure anyway I don't see the big deal,
| especially if you are an enterprise and so buying it via
| azure instead of directly.
| blitzar wrote:
| Perhaps they plan on having ChatGPT make a quick copy of your
| database, for your convenience of course.
| abhibeckert wrote:
| In my opinion the only way to use it safely is to ensure your
| AI only has access to data that the end user already has access
| to.
|
| At that point, prompt injection is no-longer an issue - because
| the AI doesn't need to hide anything.
|
| Giving GPT access to your entire database, but telling it not
| to reveal certain bits, is never going to work. There will
| always be side channel vulnerabilities in those systems.
| kristiandupont wrote:
| >At that point, prompt injection is no-longer an issue [...]
|
| As far as input goes, yes. But I am more worried about agents
| that can take actions that affect the outside world, like
| sending emails on your behalf.
| jacobr1 wrote:
| > your AI only has access to data that the end user already
| has access to.
|
| That doesn't work for the same reason you mention with a DB
| ... any data source is vulnerable to indirect injection
| attacks. If you open the door to ANY data source this is a
| factor, including ones under the sole "control" of the user.
| danShumway wrote:
| > e.g. define a function called extract_data(name: string,
| birthday: string), or sql_query(query: string)
|
| This section in OpenAI's product announcement really
| irritates me because it's so obvious that the model should
| have access to a subset of API calls that themselves fetch
| the data, as opposed to giving the model raw access to SQL.
| You could have the same capabilities while eliminating a huge
| amount of risk. And OpenAI just sticks this right in the
| announcement, they're encouraging it.
|
| When I'm building a completely isolated backend with just
| regular code, I still usually put a data access layer in
| front of the database in most cases. I still don't want my
| REST endpoints directly building SQL queries or directly
| accessing the database, and that's without an LLM in the loop
| at all. It's just safer.
|
| It's the same idea as using `innerHTML`; in general it's
| better when possible to have those kinds of calls extremely
| isolated and to go through functions that constrain what can
| go wrong. But no, OpenAI just straight up telling developers
| to do the wrong things and to give GPT unrestricted database
| access.
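| Concretely, the difference is between exposing the first
| function spec below versus the second (the names are made up):
|
|   # Risky: the model writes arbitrary SQL that your app executes.
|   sql_function = {
|       "name": "sql_query",
|       "parameters": {"type": "object", "properties": {
|           "query": {"type": "string"}}, "required": ["query"]},
|   }
|
|   # Safer: the model only fills in parameters; your data access
|   # layer owns the actual query.
|   safe_function = {
|       "name": "get_top_customers",
|       "parameters": {"type": "object", "properties": {
|           "month": {"type": "string"},
|           "limit": {"type": "integer", "maximum": 100},
|       }, "required": ["month"]},
|   }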
| jmull wrote:
| SQL doesn't necessarily have to mean full database access.
|
| I know it's pretty common to have apps connect to a
| database with a db user with full access to do anything,
| but that's definitely not the only way.
|
| If you're interested in being safer, it's worth learning
| the security features built in to your database.
| danShumway wrote:
| > If you're interested in being safer, it's worth
| learning the security features built in to your database.
|
| The problem isn't that there's no way to be safe, the
| problem is that OpenAI's documentation does not do
| anything to discourage developers from implementing this
| in the most dangerous way possible. Like you suggest, the
| most common way this will be implemented is via a db user
| with full access to do anything.
|
| Developers would be far more likely to implement this
| safely if they were discouraged from using direct SQL
| queries. Developers who know how to safely add SQL
| queries will still know how to do that -- but developers
| who are copying and pasting code or thinking naively
| "can't I just feed my schema into GPT" should be pushed
| towards an implementation that's harder to mess up.
| jmull wrote:
| It's hard for me to believe openai's documentation will
| have any effect on developers who write or copy-and-paste
| data access code without regard to security, no matter
| what it says.
|
| If you provide an API or other external access to app
| data and the app data contains anything not everyone
| should be able to access freely then your API has to
| implement some kind of access control. It really doesn't
| matter if your API is SQL-based, REST-based, or whatever.
|
| A SQL-based API isn't inherently less secure than a non-
| SQL-based one if you implement access control, and a non-
| SQL-based API isn't inherently more secure than a SQL-
| based one if you don't implement access control. The SQL-
| ness of an API doesn't change the security picture.
| danShumway wrote:
| > If you provide an API or other external access to app
| data and the app data contains anything not everyone
| should be able to access freely then your API has to
| implement some kind of access control. It really doesn't
| matter if your API is SQL-based, REST-based, or whatever.
|
| I don't think that's the way developers are going to
| interact with GPT at all, I don't think they're looking
| at this as if it's external access. OpenAI's
| documentation makes it feel like a system library or
| dependency, even though it's clearly not.
|
| I'll go out on a limb, I suspect a pretty sizable chunk
| (if not an outright majority) of the devs who try to
| build on this will not be thinking about the fact that
| they need access controls at all.
|
| > A SQL-based API isn't inherently less secure than a
| non-SQL-based one if you implement access control, and a
| non-SQL-based API isn't inherently more secure than a
| SQL-based one if you don't implement access control. The
| SQL-ness of an API doesn't change the security picture.
|
| I'm not sure I agree with this either. If I see a dev
| exposing direct query access to a database, my reaction
| is going to be very dependent on whether or not I think
| they're an experienced programmer already. If I know them
| enough to trust them, fine. Otherwise, my assumption is
| that they're probably doing something dangerous. I think
| the access controls that are built into SQL are a lot
| easier to foot-gun, I generally advise devs to build
| wrappers because I think it's generally harder to mess
| them up. Opinion me :shrug:
|
| Regardless, I do think the way OpenAI talks about this
| does matter, I do think their documentation will
| influence how developers use the product, so I think if
| they're going to talk about SQL they should in-code be
| showing examples of how to implement those access
| controls. "We're just providing the API, if developers
| mess it up its their fault" -- I don't know, good APIs
| and good documentation should try to when possible
| provide a "pit of success[0]" for naive developers. In
| particular I think that matters when talking about a
| market segment that is getting a lot of naive VC money
| thrown at it sometimes without a lot of diligence, and
| where those security risks may end up impacting regular
| people.
|
| [0]: https://blog.codinghorror.com/falling-into-the-pit-
| of-succes...
| BoorishBears wrote:
| You don't need to directly run the query it returns, you
| can use that query as a sub-query on a known safe set of
| data and let it fail if someone manages to prompt inject
| their way into looking at other tables/columns.
|
| That way you can support natural language to query without
| sending dozens of functions (which will eat up the context
| window)
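| Something like this (a sketch; SQLite shown, same idea
| elsewhere):
|
|   import sqlite3
|
|   def run_model_sql(model_query):
|       # Read-only connection: the model's SQL can't write anything.
|       conn = sqlite3.connect("file:app.db?mode=ro", uri=True)
|       # Wrap the model's query as a subquery to control the row
|       # limit; it simply errors out if it references anything the
|       # connection doesn't expose.
|       wrapped = f"SELECT * FROM ({model_query}) LIMIT 100"
|       return conn.execute(wrapped).fetchall()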
| danShumway wrote:
| You can do that (I wouldn't advise it, there are still
| problems that are better solved by building explicit
| functions; but you can use subqueries and it would be
| safer) -- but most developers won't. They'll run the
| query directly. Most developers also will not execute it
| as a readonly query, they'll give the LLM write access to
| the database.
|
| If OpenAI doesn't know that, then I don't know what to
| say, they haven't spent enough time writing documentation
| for general users.
| BoorishBears wrote:
| You can't advise for or against it without a well defined
| problem: for some cases explicit functions won't even be
| an option.
|
| Defining basic CRUD functions for a few basic entities
| will take a ton of tokens in schema definitions, and still
| suffers from injection if you want to support querying on
| data that wasn't well defined a-priori, which is a
| problem I've worked on.
|
| Overall if this was one of their example projects I'd be
| disappointed, but it was a snippet in a release note. So
| far their _actual_ example projects have done a fair job
| showing where guardrails in production systems are
| needed, I wouldn't over-index on this.
| danShumway wrote:
| > You can't advise for or against it without a well
| defined problem: for some cases explicit functions won't
| even be an option.
|
| On average I think I can. I mean, I can't know without
| the exact problem specifications whether or not a
| developer should use `innerHTML`/`eval`. But I can offer
| general advice against it, even though both can be used
| securely. I feel pretty safe saying that exposing SQL
| access directly in an API will _usually_ lead to more
| fragile infrastructure. There are plenty of exceptions of
| course, but there are exceptions to pretty much all
| programming advice. I don't think it's good for it to be
| one of the first examples they bring up for how to use
| the API.
|
| ----
|
| > Overall if this was one of their example projects I'd
| be disappointed
|
| I have similar complaints about their example code. They
| include the comment:
|
| > # Note: the JSON response from the model may not be
| valid JSON
|
| But they don't actually do schema validation here or
| check anything. Their example project isn't fit to
| deploy. My thought on this is that if every response for
| practically every project needs to have schema validation
| (and I would strongly advise doing schema validation on
| every response), then the sample code should have schema
| validation in it. Their example project should be
| something that could be almost copy-and-pasted.
|
| If that makes the code sample longer, well... that is the
| minimum complexity to build an app on this. The sample
| code should reflect that.
|
| > and still suffers from injection if you want to support
| querying on data that wasn't well defined a-priori
|
| This is a really good point. My response would be that
| they should be expanding on this as well. I'm really
| frustrated that OpenAI's documentation provides (imo)
| basically no really practical/great security advice other
| than "hey, this problem exists, make sure you deal with
| it." But it seems to me like they're already falling over
| on providing good documentation before they even get to
| the point where they can talk seriously about bigger
| security decisions.
| sillysaurusx wrote:
| I was going to say "I look forward to it and think it's
| hilarious," but then I remembered that most victims will be
| people learning to code, not companies. It would really suck to
| suddenly lose your recipe database when you just wanted to
| figure out how this programming stuff worked.
|
| Some kind of "heads up" tagline is probably a good idea, yeah.
| kristiandupont wrote:
| I think the victims will mostly be the users of the software.
| The personal assistant that can handle your calendar and
| emails and all would be able to do real damage.
| irthomasthomas wrote:
| It's a shame they couldn't use yaml, instead. I compared them and
| yaml uses about 20% fewer tokens. However, I can understand
| accuracy, derived from frequency, being more important than token
| budget.
| IshKebab wrote:
| I would imagine JSON is easier for a LLM to understand (and for
| humans!) because it doesn't rely on indentation and confusing
| syntax for lists, strings etc.
| nasir wrote:
| Its a lot more straightforward to use JSON programmatically
| than YAML.
| TeMPOraL wrote:
| It really shouldn't be, though. I.e. not unless you're
| parsing or emitting it ad-hoc, for example by assuming that
| an expression like:
|
|   "{" + $someKey + ":" + $someValue + "}"
|
| produces valid JSON. It does - sometimes - and then it's
| indeed easier to work with. It'll also blow up in your face.
| Using JSON the right way - via a proper parser and serializer
| - should be identical to using YAML or any other equivalent
| format.
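|
| For instance, in Python (a minimal sketch):
|
|       import json
|
|       key, value = "name", 'He said "hi"'
|
|       # Ad-hoc concatenation breaks on quotes, escapes, unicode:
|       broken = "{" + key + ":" + value + "}"
|
|       # A proper serializer handles all of that:
|       safe = json.dumps({key: value})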
| riwsky wrote:
| Even if the APIs for both were equally simple, modules for
| manipulating json are way more likely to be available in
| the stdlib of whatever language you're using.
| golergka wrote:
| If you are using any kind of type checking instead of blindly
| trusting generated json it's exactly the same amount of work.
| blamy wrote:
| JSON can be minified.
| AdrienBrault wrote:
| I think YAML actually uses more tokens than JSON without
| indents, especially with deep data. For example "," being a
| single token makes JSON quite compact.
|
| You can compare JSON and YAML on
| https://platform.openai.com/tokenizer
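|
| Or programmatically with tiktoken (a rough sketch; counts
| vary by model and data shape):
|
|       import json
|       import tiktoken
|       import yaml  # PyYAML
|
|       data = {"ingredients": [
|           {"name": "flour", "unit": "grams", "amount": 500},
|       ]}
|       enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
|
|       print(len(enc.encode(json.dumps(data))))            # compact JSON
|       print(len(enc.encode(json.dumps(data, indent=2))))  # indented JSON
|       print(len(enc.encode(yaml.dump(data))))             # YAML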
| ulrikrasmussen wrote:
| Has anyone tried throwing their backend Swagger at this and made
| ChatGPT perform user story tests?
| rank0 wrote:
| OpenAI integration is going to be a goldmine for criminals in the
| future.
|
| Everyone and their momma is gonna start passing poorly
| validated/sanitized client input to shared sessions of a non-
| deterministic function.
|
| I love the future!
| nextworddev wrote:
| In the "future"?
| zyang wrote:
| Is it possible to fine-tune with custom data to output JSON?
| edwin wrote:
| That's not the current OpenAI recipe. Their expectation is that
| your custom data will be retrieved via a function/plugin and
| then be subsequently processed by a chat model.
|
| Only the older completion models (davinci, curie, babbage, ada)
| are available for fine-tuning.
| jamesmcintyre wrote:
| In the openai blog post they mention "Convert "Who are my top ten
| customers this month?" to an internal API call" but I'm assuming
| they mean gpt will respond with structured json (we define via
| schema in function prompt) that we can use to more easily
| programmatically make that API call?
|
| I could be confused but I'm interpreting this function calling as
| "a way to define structured input and selection of function and
| then structured output" but not the actual ability to send it
| arbitrary code to execute.
|
| Still amazing, just wanting to see if I'm wrong on this.
| williamcotton wrote:
| This does not execute code!
| jamesmcintyre wrote:
| Ok, yea this makes sense. Also for others curious of the flow
| here's a video walkthrough I just skimmed through:
| https://www.youtube.com/watch?v=91VVM6MNVlk
| smallerfish wrote:
| I will experiment with this at the weekend. One thing I found
| useful with supplying a json schema in the prompt was that I
| could supply inline comments and tell it when to leave a field
| null, etc. I found that much more reliable than describing these
| nuances elsewhere in the prompt. Presumably I can't do this with
| functions, but maybe I'll be able to work around it in the prompt
| (particularly now that I have more room to play with.)
| loughnane wrote:
| Just this morning I wrote a JSON object. I told GPT to turn it
| into a schema. I tweaked that and then gave a list of terms for
| which I wanted GPT to populate the schema accordingly.
|
| It worked pretty well without any functions, but I did feel like
| I was missing something because I was ready to be explicit and
| there wasn't any way for me to tell that to GPT.
|
| I look forward to trying this out.
| mritchie712 wrote:
| Glad we didn't get too far into adopting something like
| Guardrails. This sort of kills its main value prop for OpenAI.
|
| https://shreyar.github.io/guardrails/
| Blahah wrote:
| Luckily it's for LLMs generally, not just OpenAI
| blamy wrote:
| Guardrails is an awesome project and will continue to be even
| after this.
| swyx wrote:
| i mean only at the most superficial level. she has a ton of
| other validators that aren't superseded (eg SQL is validated by
| branching the database - we discussed on our pod
| https://www.latent.space/p/guaranteed-quality-and-structure)
| mritchie712 wrote:
| yeah, listened to the pod (that's how I found out about
| guardrails!).
|
| fair point, I should have said: "value prop for our use
| case"... the thing I was most interested in was how well
| Guardrails structured output.
| swyx wrote:
| haha excellent. i was quite impressed by her and the vision
| for guardrails. thanks for listening!
| Kiro wrote:
| Can I use this to make it reliably output code (say JavaScript)?
| I haven't managed to do it with just prompt engineering as it
| will still add explanations, apologies and do other unwanted
| things like splitting the code into two files as markdown.
| minimaxir wrote:
| Here's a demo of some system prompt engineering which resulted
| in better results for the older ChatGPT:
| https://github.com/minimaxir/simpleaichat/blob/main/examples...
|
| Coincidentally, the new gpt-3.5-turbo-0613 model also has
| better system prompt guidance: for the demo above and some
| further prompt tweaking, it's possible to get ChatGPT to output
| code super reliably.
| williamcotton wrote:
| Here's an approach to return just JavaScript:
|
| https://github.com/williamcotton/transynthetical-engine
|
| The key is the addition of few-shot exemplars.
| sanxiyn wrote:
| Not this, but using the token selection restriction approach,
| you can let LLM produce output that conforms to arbitrary
| formal grammar completely reliably. JavaScript, Python,
| whatever.
| Xen9 wrote:
| Marvin Minsky was so damn far ahead of his time with Society of
| Mind.
|
| Engineering of cognitively advanced multiagent systems will
| become the area of research of this century / multiple decades.
|
| GPT-GPT > GPT-API in terms of power.
|
| The space of possible combinations of GPT multiagents goes beyond
| imagination, since even GPT-4 alone does.
|
| Multiagent systems are best modeled with signal theory, graph
| theory and cognitive science.
|
| Of course "programming" will also play a role, in the sense of
| abstractions and creation of systems of / for thought.
|
| Signal theory will be a significant approach for thinking about
| embedded agency.
|
| Complex multiagent systems approach us.
| SanderNL wrote:
| Makes me think of the Freud/Jungian notions of personas in us
| that are in various degrees semi-autonomously looking out for
| themselves. The "angry" agent, the "child" agent, so on.
| edwin wrote:
| For those who want to test out the LLM as API idea, we are
| building a turnkey prompt-to-API product. Here's Simon's recipe
| maker deployed in a minute:
| https://preview.promptjoy.com/apis/1AgCy9 . Public preview to
| make and test your own API: https://preview.promptjoy.com
| wonderfuly wrote:
| I own this domain: prompts.run. Do you want it?
| yonom wrote:
| This is cool! Are you using one-shot learning under the hood
| with a user-provided example?
| edwin wrote:
| BTW: Here's a more performant version (fewer tokens)
| https://preview.promptjoy.com/apis/jNqCA2 that uses a smaller
| example but will still generate pretty good results.
| sudb wrote:
| This is still pretty fast - impressive! Are there any
| tricks you're doing to speed things up?
| edwin wrote:
| Thanks. We find few-shot learning to be more effective
| overall. So we are generating additional examples from the
| provided example.
| abhpro wrote:
| This is really cool, I had a similar idea but didn't build it.
| I was also thinking a user could take these different prompts
| (I called them tasks) that anyone could create, and then
| connect them together like a node graph or visual programming
| interface, with some Chat-GPT middleware that resolves the
| outputs to inputs.
| edelans wrote:
| Congrats on the first-time user experience, I could experiment
| with your API in a few seconds, and the product is sleek!
| darepublic wrote:
| I have been using gpt4 to translate natural language to JSON
| already. And on v4 (not v3) it hasn't returned any malformed
| JSON iirc
| yonom wrote:
| - If the only reason you're using v4 over v3.5 is to generate
| JSON, you can now use this API and downgrade for faster and
| cheaper API calls.
|
| - Malicious user input may break your JSON (by asking GPT to
| include comments around the JSON, as another user suggested);
| this may or may not be an issue (e.g. if one user can
| influence other users' experience).
| nocsi wrote:
| What if you ask it to include comments in the JSON explaining
| its choices
| courseofaction wrote:
| Nice to have an endpoint which takes care of this. I've been
| doing this manually, it's a fairly simple process:
|
| * Add "Output your response in json format, with the fields 'x',
| which indicates 'x_explanation', 'z', which indicates
| 'z_explanation' (...)" etc. GPT-4 does this fairly reliably.
|
| * Validate the response, repeat if malformed (see the sketch
| after this list).
|
| * Bam, you've got a json.
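|
| A minimal sketch of that validate-and-retry loop (model and
| prompt are illustrative):
|
|       import json
|       import openai
|
|       def ask_for_json(prompt: str, retries: int = 3) -> dict:
|           for _ in range(retries):
|               resp = openai.ChatCompletion.create(
|                   model="gpt-4",
|                   messages=[{"role": "user", "content": prompt}],
|               )
|               try:
|                   return json.loads(
|                       resp["choices"][0]["message"]["content"])
|               except json.JSONDecodeError:
|                   continue  # malformed, ask again
|           raise RuntimeError("no valid JSON after retries")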
|
| I wonder if they've implemented this endpoint with validation and
| carefully crafted prompts on the base model, or if this is
| specifically fine-tuned.
| 037 wrote:
| It appears to be fine-tuning:
|
| "These models have been fine-tuned to both detect when a
| function needs to be called (depending on the user's input) and
| to respond with JSON that adheres to the function signature."
|
| https://openai.com/blog/function-calling-and-other-api-updat...
| wskish wrote:
| here is code (with several examples) that takes it a couple steps
| further by validating the output JSON against a pydantic model
| and providing feedback to the llm when it gets either of those
| wrong:
|
| https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...
| andsoitis wrote:
| having gpt-4 as a dependency for your product or business
| seems... shortsighted
| l5870uoo9y wrote:
| This was technically possible before. I think the approach used
| by many - myself included - is to simply embed results in a
| markdown code block and then match it with a regex pattern. Then
| you just need to phrase the prompt to generate the desired output.
|
| This is an example of that generating the arguments for the
| MongoDB's `db.runCommand()` function:
| https://aihelperbot.com/snippets/cliwx7sr80000jj0finjl46cp
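|
| The extraction step might look like this in Python (a sketch):
|
|       import re
|
|       def extract_json_block(text: str) -> str:
|           # Grab the contents of the first ``` fence, if any
|           m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
|           return m.group(1).strip() if m else text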
| sublinear wrote:
| > The process is simple enough that you can let non-technical
| people build something like this via a no-code interface. No-code
| tools can leverage this to let their users define "backend"
| functionality.
|
| > Early prototypes of software can use simple prompts like this
| one to become interactive. Running an LLM every time someone
| clicks on a button is expensive and slow in production, but
| _probably still ~10x cheaper to produce than code._
|
| Hah wow... no. Definitely not.
| lasermatts wrote:
| I thought GPT-4 was doing a pretty good job at outputting JSON
| (for some of the toy problems I've given it like some of my
| gardening projects.) Interesting to see this hit the very top of
| HN
| social_ism wrote:
| [dead]
| thorum wrote:
| The JSON schema not counting toward token usage is huge, that
| will really help reduce costs.
| minimaxir wrote:
| That is up in the air and needs more testing. Field
| descriptions, for example, are important but extraneous input
| that would be tokenized and count in the costs.
|
| At least for ChatGPT, input token costs were cut by 25%, so
| it evens out.
| stavros wrote:
| > Under the hood, functions are injected into the system
| message in a syntax the model has been trained on. This means
| functions count against the model's context limit and are
| billed as input tokens. If running into context limits, we
| suggest limiting the number of functions or the length of
| documentation you provide for function parameters.
| yonom wrote:
| I believe functions do count in some way toward the token
| usage; but it seems to be in a more efficient way than pasting
| raw JSON schemas into the prompt. Nevertheless, the token usage
| seems to be far lower than previous alternatives, which is
| awesome!
| blamy wrote:
| But it does count toward token usage. And they picked JSON
| schema which is like 6x more verbose than typescript for
| defining the shape of json.
| adultSwim wrote:
| _Running an LLM every time someone clicks on a button is
| expensive and slow in production, but probably still ~10x cheaper
| to produce than code._
| edwin wrote:
| New techniques like semantic caching will help. This is the
| modern era's version of building a performant social graph.
| daralthus wrote:
| What's semantic caching?
| edwin wrote:
| With LLMs, the inputs are highly variable so exact match
| caching is generally less useful. Semantic caching groups
| similar inputs and returns relevant results accordingly. So
| {"dish":"spaghetti bolognese"} and {"dish":"spaghetti with
| meat sauce"} could return the same cached result.
| m3kw9 wrote:
| Or store as sentence embedding and calculate the vector
| distance, but that creates many edge cases
| minimaxir wrote:
| After reading the docs for the new ChatGPT function calling
| yesterday, it's structured and/or typed data for GPT input or
| output that's the key feature of these new models. The ReAct flow
| of tool selection that it provides is secondary.
|
| As this post notes, you don't even need the full flow of
| passing a function result back to the model: getting structured
| data from ChatGPT in itself has a lot of fun and practical use
| cases. You could coax previous versions of ChatGPT to "output
| results as JSON" with a system prompt but in practice results are
| mixed, although even with this finetuned model the docs warn that
| there still could be parsing errors.
|
| OpenAI's demo for function calling is not a Hello World, to put
| it mildly:
| https://github.com/openai/openai-cookbook/blob/main/examples...
| tornato7 wrote:
| IIRC, there's a way to "force" LLMs to output proper JSON by
| adding some logic to the top token selection. I.e. in the
| randomness function (which OpenAI calls temperature) you'd
| never choose a next token that results in broken JSON. The only
| way it could still fail is if the output exceeds the token
| limit. I wonder if OpenAI is doing something like this.
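|
| Conceptually something like this (a toy sketch, not OpenAI's
| actual implementation; `is_valid_prefix` would wrap a JSON
| parser):
|
|       def constrained_pick(logits, output_so_far, is_valid_prefix):
|           # Keep only tokens that leave the JSON-so-far valid,
|           # then pick greedily from what's left.
|           allowed = [(tok, score) for tok, score in logits.items()
|                      if is_valid_prefix(output_so_far + tok)]
|           return max(allowed, key=lambda pair: pair[1])[0]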
| ManuelKiessling wrote:
| Note that you don't necessarily need to have the AI output
| any JSON at all -- simply have it answer when being asked for
| the value to a specific JSON key, and handle the JSON
| structure part in your own, hallucination-free code:
| https://github.com/manuelkiessling/php-ai-tool-bridge
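|
| Sketched in Python (names are illustrative; the linked project
| does the equivalent in PHP):
|
|       import openai
|
|       def fill_schema(fields: dict) -> dict:
|           # Ask the model for one value at a time; the JSON
|           # structure itself never passes through the LLM.
|           result = {}
|           for key, description in fields.items():
|               resp = openai.ChatCompletion.create(
|                   model="gpt-3.5-turbo",
|                   messages=[{
|                       "role": "user",
|                       "content": "Answer with only the value for: "
|                                  + description,
|                   }],
|               )
|               result[key] = (
|                   resp["choices"][0]["message"]["content"].strip())
|           return result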
| lyjackal wrote:
| Would be nice if you could send a back and forth
| interaction for each key. This approach turns into lots of
| requests that reapply the entire context and ends up slow.
| I wish I could just send a Microsoft guidance template
| program, and process that in a single pass.
| naiv wrote:
| Thanks for sharing!
| woodrowbarlow wrote:
| the linked article hypothesizes:
|
| > I assume OpenAI's implementation works conceptually similar
| to jsonformer, where the token selection algorithm is changed
| from "choose the token with the highest logit" to "choose the
| token with the highest logit which is valid for the schema".
| ttul wrote:
| I think the problem is that tokens are not characters. So
| even if you had access to a JSON parser state that could tell
| you whether or not a given character is valid as the next
| character, I am not sure how you would translate that into
| tokens to apply the logit biases appropriately. There would
| be a great deal of computation required at each step to scan
| the parser state and generate the list of prohibited or
| allowable tokens.
|
| But if one could pull this off, it would be super cool.
| Similar to how Microsoft's guidance module uses the
| logit_bias parameter to force the model to choose between a
| set of available options.
| yunyu wrote:
| You simply sample tokens starting with the allowed
| characters and truncate if needed. It's pretty efficient,
| there's an implementation here:
| https://github.com/1rgs/jsonformer
| DougBTX wrote:
| This is the best implementation I've seen, but only for
| Hugging Face models: https://github.com/1rgs/jsonformer
| senko wrote:
| It would seem not, as the official documentation mentions the
| arguments may be hallucinated or _be a malformed JSON_.
|
| (unless they mean the JSON syntax is valid but may not
| conform to the schema; they're unclear on that).
| sanxiyn wrote:
| For various reasons, token selection may be implemented as
| upweighting/downweighting instead of outright ban of
| invalid tokens. (Maybe it helps training?) Then the model
| could generate malformed JSON. I think it is premature to
| infer from "can generate malformed JSON" that OpenAI is not
| using token selection restriction.
| sanxiyn wrote:
| Note that this (token selection restriction) is even
| available on OpenAI API as logit_bias.
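|
| For example (the token ID here is a placeholder):
|
|       import openai
|
|       resp = openai.ChatCompletion.create(
|           model="gpt-3.5-turbo",
|           messages=[{"role": "user", "content": "grams or pounds?"}],
|           # -100 effectively bans a token, +100 effectively forces it
|           logit_bias={1234: -100},  # placeholder id to suppress
|           max_tokens=1,
|       )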
| newhouseb wrote:
| But only for the whole generation. So if you want to
| constrain things one token at a time (as you would to force
| things to follow a grammar) you have to make fresh calls
| and only request one token which makes things more or less
| impractical if you want true guarantees. A few months ago I
| built this anyway to suss out how much more expensive it
| was [1]
|
| [1] https://github.com/newhouseb/clownfish#so-how-do-i-use-this-...
| have_faith wrote:
| How would a tweaked temp enforce a non-broken output exactly?
| isoprophlex wrote:
| Not traditional temperature, maybe the parent worded it
| somewhat obtusely. Anyway, to disambiguate...
|
| I think it works something like this: You let something
| akin to a json parser run with the output sampler. First
| token must be either '{' or '['; then if you see [ has the
| highest probability, you select that. Ignore all other
| tokens, even those with high probability.
|
| Second token must be ... and so on and so on.
|
| Guarantee for non-broken (or at least parseable) json
| sanxiyn wrote:
| It's not temperature, but sampling. Output of LLM is
| probabilistic distribution over tokens. To get concrete
| tokens, you sample from that distribution. Unfortunately,
| OpenAI API does not expose the distribution. You only get
| the sampled tokens.
|
| As an example, on the link JSON schema is defined such that
| recipe ingredient unit is one of
| grams/ml/cups/pieces/teaspoons. LLM may output the
| distribution grams(30%), cups(30%), pounds(40%). Sampling
| the best token "pounds" would generate an invalid document.
| Instead, you can use the schema to filter tokens and sample
| from the filtered distribution, which is grams(50%),
| cups(50%).
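|
| In numbers, a toy sketch of that filtering step:
|
|       # Model's distribution over the next token:
|       dist = {"grams": 0.3, "cups": 0.3, "pounds": 0.4}
|
|       # Schema only allows these units:
|       allowed = {"grams", "ml", "cups", "pieces", "teaspoons"}
|
|       filtered = {t: p for t, p in dist.items() if t in allowed}
|       total = sum(filtered.values())
|       renormalized = {t: p / total for t, p in filtered.items()}
|       # -> {"grams": 0.5, "cups": 0.5}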
| H8crilA wrote:
| That SQL example is going to result in a catastrophe somewhere
| when someone uses it in their project. It is encouraging
| something very dangerous when allowed to run on untrusted
| inputs.
| behnamoh wrote:
| What's the implication of this new change for Microsoft
| Guidance, LMQL, Langchain, etc.? It looks like much of their
| functionality (controlling model output) just became obsolete.
| Am I missing something?
| [deleted]
| lbeurerkellner wrote:
| If anything this removes a major roadblock for
| libraries/languages that want to employ LLM calls as a
| primitive, no? Although, I fear the vendor lock-in
| intensifies here, also given how restrictive and specific the
| Chat API is.
|
| Either way, as part of the LMQL team, I am actually pretty
| excited about this, also with respect to what we want to
| build going forward. This makes language model programming
| much easier.
| koboll wrote:
| `Although, I fear the vendor lock-in intensifies here, also
| given how restrictive and specific the Chat API is.`
|
| Eh, would be pretty easy to write a wrapper that takes a
| functions-like JSON Schema object and interpolates it into
| a traditional "You MUST return ONLY JSON in the following
| format:" prompt snippet.
| londons_explore wrote:
| > Although, I fear the vendor lock-in intensifies here,
|
| The openAI API is super simple - any other vendor is free
| to copy it, and I'm sure many will.
| neuronexmachina wrote:
| Langchain added support for `function_call` args yesterday:
|
| * https://github.com/hwchase17/langchain/pull/6099/files
|
| * https://github.com/hwchase17/langchain/issues/6104
|
| IMHO, this should make Langchain much easier and less chaotic
| to use.
| gawi wrote:
| It's only been added to the OpenAI interface. Function
| calling is really useful when used with agents. To add
| that to agents would require some redesign as the tool
| instructions should be removed from the prompt templates in
| favor of function definitions in the API request. The
| response parsing code would also be affected.
|
| I just hope they won't come up with yet another agent type.
| neuronexmachina wrote:
| Like this?
| https://github.com/hwchase17/langchain/blob/master/langchain...
| gawi wrote:
| LangChain is a perpetual hackathon.
| arbuge wrote:
| They have something closer to a simple Hello World example
| here:
|
| https://platform.openai.com/docs/guides/gpt/function-calling
|
| That example needs a bit of work I think. In Step 3, they're
| not really using the returned function_name; they're just
| assuming it's the only function that's been defined, which I
| guess is equivalent for this simple example with just one
| function but less instructive. In Step 4, I believe they should
| also have sent the function definition block again a second
| time since model calls in the API are memory-less and
| independent. They didn't, although the model appears to guess
| what's needed anyway in this case.
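|
| A more instructive Step 3 might dispatch on the returned name
| (a sketch; `get_current_weather` and `response` come from
| their example):
|
|       import json
|
|       available_functions = {
|           "get_current_weather": get_current_weather,
|       }
|
|       message = response["choices"][0]["message"]
|       if message.get("function_call"):
|           name = message["function_call"]["name"]
|           args = json.loads(message["function_call"]["arguments"])
|           function_response = available_functions[name](**args)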
| m3kw9 wrote:
| It works pretty well. You define a few "functions" and enter a
| description of what each does; when the user prompts, it will
| understand the prompt and tell you which "function" to use,
| which is just the function name. I feel like this is a new way
| to program, a sort of fuzzy-logic type of programming.
| Sai_ wrote:
| > fuzzy logic
|
| Yes and no. While the choice of which function to call is
| dependent on an llm, ultimately, you control the function
| itself whose output is deterministic.
|
| Even today, given an api, people can choose to call or not call
| based on some factor. We don't call this fuzzy logic. E.g.,
| people can decide to sell or buy stock through an api based on
| some internal calculations - doesn't make the system "fuzzy".
| m3kw9 wrote:
| If you feed that result into another IO box you may or may
| not know whether it is the correct answer, which may need some
| sort of error detection. I think this is going to be the
| majority of the use cases.
| Sai_ wrote:
| Hm, I see what you mean. Afaict, only the decision to call
| or not call a function is up to the model (fuzzy). Once it
| decides to call the function, it generates mostly correct
| JSON based on your schema and returns that to you as is
| (not very fuzzy).
|
| It'll be interesting to test APIs which accept user inputs.
| Depending on how ChatGPT populates the JSON, the API could
| be required to understand/interpret/respond to lots of
| variability in inputs.
| m3kw9 wrote:
| Yeah, I've tested it. You should use the curl example they
| gave, as you can test instantly by pasting it into your
| terminal. The description of the functions is prompt
| engineering in addition to the original system prompt; I
| need to test the dependency more, it's so new.
| jonplackett wrote:
| This is useful, but for me at least, GPT-4 is unusable because it
| sometimes takes 30+ seconds to reply to even basic queries.
| m3kw9 wrote:
| Also the rate limit is pretty bad if you want to release any
| type of app
| jiggawatts wrote:
| More importantly: there's a waiting list.
|
| Also, if you want to use both the ChatGPT web app _and_ the
| API, you'll be billed for _both_ separately. They really
| should be unified and billed under a single account. The
| difference is literally just whether there's a "web UI" on
| top of the API... or not.
| emilsedgh wrote:
| Building agents that use advanced APIs was not really practical
| until now. Things like Langchain's Structured Agents worked
| somewhat reliably, but due to the massive token count it was so
| slow, the experience was _never_ going to be useful.
|
| Thanks to this, the speed at which our agent processes results
| has improved 5-6x, and it actually does a pretty good job of
| keeping to the schema.
|
| One problem that is not resolved yet is that it still
| hallucinates a lot of attributes. For example, we have a tool
| that allows it to create contacts in the user's CRM. I ask it
| to:
|
| "Create contacts for the top 3 Barcelona players."
|
| It creates a structure like this:
|
| 1. Lionel Messi - Email: lionel.messi@barcelona.com - Phone
| Number: +1234567890 - Tags: Player, Barcelona
|
| 2. Gerard Pique - Email: gerard.pique@barcelona.com - Phone
| Number: +1234567891 - Tags: Player, Barcelona
|
| 3. Marc-Andre ter Stegen - Email: marc-terstegen@barcelona.com -
| Phone Number: +1234567892 - Tags: Player, Barcelona
|
| And you can see it hallucinated email addresses and phone
| numbers.
| 037 wrote:
| I would never rely on an LLM as a source of such information,
| just as I wouldn't trust the general knowledge of a human being
| used as a database. Does your workflow include a step for
| information search? With the new json features, it should be
| easy to instruct it to perform a search or directly feed it the
| right pages to parse.
| pluijzer wrote:
| ChatGPT can be useful for many things, but you should really
| not use it if you want to retrieve factual data. This might
| partly be resolved by querying the internet like Bing does, but
| purely on the language model side these hallucinations are just
| an unavoidable part of it.
| Spivak wrote:
| Yep, it should _always_ _always_ write the code / query /
| function call / whatever you need, which you then parse and
| use to retrieve the data from an external system.
| arsdragonfly wrote:
| [dead]
| dang wrote:
| Recent and related:
|
| _Function calling and other API updates_ -
| https://news.ycombinator.com/item?id=36313348 - June 2023 (154
| comments)
| minimaxir wrote:
| IMO this isn't a dupe and shouldn't be penalized as a result.
| dang wrote:
| It's certainly not a dupe. It looks like a follow-up though.
| No?
| minimaxir wrote:
| More a very timely but practical demo.
| dang wrote:
| Ok, thanks!
| EGreg wrote:
| Actually I'm looking to take GPT-4 output and create file formats
| like Keynote presentations or pptx. Is that currently possible
| with some tools?
| yonom wrote:
| I would recommend creating a simplified JSON schema for the
| slides (say, a presentation is an array of slides, each slide
| has a title, body, optional image, optional diagram, and each
| diagram is one of pie, table, ...). Then use a library to
| generate the pptx file from the generated content.
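|
| For example, python-pptx could handle the file generation (a
| sketch; the slide shape is the hypothetical one above):
|
|       from pptx import Presentation
|
|       def build_pptx(slides: list, path: str) -> None:
|           prs = Presentation()
|           layout = prs.slide_layouts[1]  # title + body layout
|           for s in slides:
|               slide = prs.slides.add_slide(layout)
|               slide.shapes.title.text = s["title"]
|               slide.placeholders[1].text = s["body"]
|           prs.save(path)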
| EGreg wrote:
| Library? What library?
|
| It seems to me that a Transformer should excel at
| Transforming, say, text into pptx or pdf or HTML with CSS
| etc.
|
| Why don't they train it on that? So I don't have to sit there
| with manually written libraries. It can easily transform HTML
| to XML or text bullet points so why not the other formats?
| yonom wrote:
| I don't think the name "Transformer" is meant in the sense
| of "transforming between file formats".
|
| My intuition is that LLMs tend to be good at things human
| brains are good at (e.g. reasoning), and bad at things
| human brains are bad at (e.g. math, writing pptx binary
| files from scratch, ...).
|
| Eventually, we might get LLMs that can open PowerPoint and
| quickly design the whole presentation using a virtual mouse
| and keyboard but we're not there yet.
| EGreg wrote:
| It's just XML. They can produce HTML and transform Python
| into PHP etc.
|
| So why not? It's easy for them, no?
| stevenhuang wrote:
| apparently pandoc also supports pptx
|
| so you can tell GPT4 to output markdown, then use pandoc to
| convert that markdown to pptx or pdf.
| edwin wrote:
| Here you go: https://preview.promptjoy.com/apis/m7oCyL
| amolgupta wrote:
| I pass a Kotlin data class and ask ChatGPT to return JSON which
| can be parsed by that class. This reduces errors with date-time
| parsing and other formatting issues, and takes up fewer tokens
| than the approach in the article.
___________________________________________________________________
(page generated 2023-06-15 23:03 UTC)