[HN Gopher] Native JSON Output from GPT-4
       ___________________________________________________________________
        
       Native JSON Output from GPT-4
        
       Author : yonom
       Score  : 578 points
       Date   : 2023-06-14 19:07 UTC (1 day ago)
        
 (HTM) web link (yonom.substack.com)
 (TXT) w3m dump (yonom.substack.com)
        
       | aecorredor wrote:
       | Newbie in machine learning here. It's crazy that this is the top
       | post just today. I've been doing the intro to deep learning
       | course from MIT this week, mainly because I have a ton of JSON
       | files that are already classified, and want to train a model that
       | can generate new JSON data by taking classification tags as
       | input.
       | 
       | So naturally this post is exciting. My main unknown right now is
       | figuring out which model to train my data on. An RNN, a GAN, a
       | diffusion model?
        
         | ilaksh wrote:
          | Did you read the article? To do it with OpenAI you would just
          | put a few output examples in the prompt and then give it a
          | function that takes the class, with output parameters
          | corresponding to the JSON format you want, or just a string
          | containing JSON.
          | 
          | You could also fine-tune an LLM like Falcon-7B, but that's
          | probably not necessary and has nothing to do with OpenAI.
         | 
         | You might also look into the OpenAI Embedding API as a third
         | option.
         | 
         | I would try the first option though.
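          | 
          | A rough sketch of that first option, assuming the June-2023
          | openai Python SDK and a hypothetical classify_document
          | function (the model fills in the arguments as JSON matching
          | the schema):
          | 
          |     import openai
          | 
          |     functions = [{
          |         "name": "classify_document",  # hypothetical name
          |         "description": "Emit structured tags for a document",
          |         "parameters": {
          |             "type": "object",
          |             "properties": {
          |                 "category": {"type": "string"},
          |                 "tags": {"type": "array",
          |                          "items": {"type": "string"}},
          |             },
          |             "required": ["category", "tags"],
          |         },
          |     }]
          | 
          |     resp = openai.ChatCompletion.create(
          |         model="gpt-3.5-turbo-0613",
          |         messages=[{"role": "user",
          |                    "content": "Classify: some input text"}],
          |         functions=functions,
          |         function_call={"name": "classify_document"},
          |     )
          |     msg = resp["choices"][0]["message"]
          |     args = msg["function_call"]["arguments"]  # JSON string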
        
       | chaxor wrote:
        | Is there a decent way of converting to a structure with a very
        | constrained vocabulary? For example, given some input text,
        | converting it to something like {"OID-189": "QQID-378",
        | "OID-478": "QQID-678"}, where the OID and QQID dictionaries can
        | each contain e.g. millions of different items defined by a
        | description. The rule for mapping could essentially be: whatever
        | looks closest in semantic space to the descriptions given in the
        | dictionary.
       | 
        | I know this should be solvable with local LLMs and BERT cosine
        | similarity (it isn't exactly that, but it's a start on the
        | idea), but is there a way to do this with decoder models rather
        | than encoder models plus other logic?
        
         | jiggawatts wrote:
          | You can train custom GPT-3 models, and Azure now has vector
          | database integration for GPT-based models in the cloud. You can
          | feed it the data and ask it for the embedding lookup, etc.
          | 
          | You can also host a vector database yourself and fill it up
          | with the embeddings from the OpenAI GPT-3 API.
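          | 
          | A minimal sketch of the embedding route, assuming the 2023-era
          | openai Python SDK and text-embedding-ada-002 (a real setup
          | would swap the numpy scan for a vector database):
          | 
          |     import numpy as np
          |     import openai
          | 
          |     def embed(texts):
          |         r = openai.Embedding.create(
          |             model="text-embedding-ada-002", input=texts)
          |         return np.array([d["embedding"] for d in r["data"]])
          | 
          |     # toy stand-in for the QQID dictionary descriptions
          |     qqids = ["QQID-378", "QQID-678"]
          |     qqid_vecs = embed(["description of 378",
          |                        "description of 678"])
          | 
          |     def closest_qqid(text):
          |         v = embed([text])[0]
          |         sims = qqid_vecs @ v / (
          |             np.linalg.norm(qqid_vecs, axis=1)
          |             * np.linalg.norm(v))
          |         return qqids[int(np.argmax(sims))]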
        
           | chaxor wrote:
            | Unfortunately this doesn't really work, as the model is not
            | limited in its decoding vocabulary.
           | 
           | Does anyone have other suggestions that may work in this
           | space?
        
       | srameshc wrote:
        | I've used GCP Vertex AI for a specific task where the prompt was
        | to generate a JSON response with specified keys, and it does
        | generate the result as JSON with said keys.
        
         | twelfthnight wrote:
          | The issue is that it's not guaranteed, unlike this new OpenAI
          | feature. Personally, I've found Vertex AI's JSON output to be
          | not so great; it often uses single quotes in my experience. But
          | maybe you have figured out the right prompts? I'd be interested
          | to hear what you use if so.
        
       | khazhoux wrote:
        | I'm trying to experiment with the API but the response time is
        | always in the 15-25 second range. How are people getting any
        | interesting work done with it?
        | 
        | I see others on the OpenAI dev forum complaining about this too,
        | but no resolution.
        
       | 037 wrote:
       | I'm wondering if introducing a system message like "convert the
       | resulting json to yaml and return the yaml only" would adversely
       | affect the optimization done for these models. The reason is that
       | yaml uses significantly fewer tokens compared to json. For the
       | output, where data type specification or adding comments may not
       | be necessary, this could be beneficial. From my understanding,
       | specifying functions in json now uses fewer tokens, but I believe
       | the response still consumes the usual amount of tokens.
        
         | lbeurerkellner wrote:
         | I think one should not underestimate the impact on downstream
         | performance the output format can have. From a modelling
         | perspective it is unclear whether asking/fine-tuning the model
         | to generate JSON (or YAML) output is really lossless with
         | respect to the raw reasoning powers of the model (e.g. it may
         | perform worse on tasks when asked/trained to always respond in
         | JSON).
         | 
         | I am sure they ran tests on this internally, but I wonder what
         | the concrete effects are, especially comparing different output
         | formats like JSON, YAML, different function calling conventions
         | and/or forms of tool discovery.
        
         | gregw134 wrote:
         | That's what I'm doing. I ask ChatGPT to return inline yaml (no
         | wasting tokens on line breaks), then I parse the yaml output
         | into JSON once I receive it. A bit awkward but it cuts costs in
         | half.
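          | 
          | E.g., assuming PyYAML (yaml.safe_load happily reads the inline
          | flow style):
          | 
          |     import json
          |     import yaml
          | 
          |     raw = "{name: Ada, tags: [math, computing]}"
          |     data = yaml.safe_load(raw)  # -> plain Python dict
          |     print(json.dumps(data))
          |     # {"name": "Ada", "tags": ["math", "computing"]}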
        
       | bel423 wrote:
        | Did people really struggle with getting JSON outputs from GPT-4?
        | You can literally do it zero-shot by just saying "match this
        | TypeScript type".
        | 
        | GPT-3.5 would output perfect JSON with a single example.
        | 
        | I have no idea why people are talking about this like it's a new
        | development.
        
         | brolumir wrote:
          | Unfortunately, in practice that works only _most of the time_.
          | At least in our experience (and the article says something
          | similar), ChatGPT would sometimes return something completely
          | different when a JSON-formatted response was expected.
        
           | tornato7 wrote:
           | In my experience if you set the temperature to zero it works
           | 99.9% of the time, and then you can just add retry logic for
           | the remaining 0.1%
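            | 
            | Roughly, as a sketch with the openai Python SDK:
            | 
            |     import json, openai
            | 
            |     def ask_json(messages, retries=3):
            |         for _ in range(retries):
            |             resp = openai.ChatCompletion.create(
            |                 model="gpt-4", messages=messages,
            |                 temperature=0)
            |             text = resp["choices"][0]["message"]["content"]
            |             try:
            |                 return json.loads(text)
            |             except json.JSONDecodeError:
            |                 continue  # retry the remaining ~0.1%
            |         raise ValueError("no valid JSON after retries")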
        
           | blamy wrote:
           | I've been using the same prompts for months and have never
           | seen this happen on 3.5-turbo let alone 4.
           | 
           | https://gist.github.com/BLamy/244eec016beb9ad8ed48cf61fd2054.
           | ..
        
       | imranq wrote:
        | Wouldn't this be possible with a solution like Guidance, where
        | you have a pre-structured JSON format ready to go and all you
        | need is the text: https://github.com/microsoft/guidance
        
       | swyx wrote:
       | i think people are underestimating the potential here for agents
       | building - it is now a lot easier for GPT4 to call other models,
       | or itself. while i was taking notes for our emergency pod
       | yesterday (https://www.latent.space/p/function-agents) we had
       | this interesting debate with Simon Willison on just how many
       | functions will be supplied to this API. Simon thinks it will be
       | "deep" rather than "wide" - eg a few functions that do many
       | things, rather than many functions that do few things. I think i
       | agree.
       | 
       | you can now trivially make GPT4 decide whether to call itself
       | again, or to proceed to the next stage. it feels like the first
       | XOR circuit from which we can compose a "transistor", from which
       | we can compose a new kind of CPU.
        
         | quickthrower2 wrote:
          | The first transistors were slow, and it seems this "GPT-3/4
          | calling itself" stuff is quite slow. GPT-3/4 as a direct chat
          | is about as slow as I can take, so this needs to get sped up
          | first.
          | 
          | I am sure it will, as you can scale out, scale up, build more
          | efficient code, build more efficient architectures, and use the
          | right "tool for the job" in different parts of the process.
         | 
         | The problem now (using auto gpt, for example) is accuracy is
         | bad, so you need human feedback and intervention AND it is
         | slow. Take away the slow, or the needing human intervention and
         | this can be very powerful.
         | 
          | I dream of the breakthrough "shitty old laptop is all you need"
          | paper where they figure out how to do amazing stuff with 1 GB
          | of space on a spinny disk, 1 GB of RAM, and a CPU.
        
         | jillesvangurp wrote:
         | Exactly, we humans can use specialized models and traditional
         | tool APIs and models and orchestrate the use of all these
         | without understanding how these things work in detail.
         | 
          | To do accounting, GPT-4 (or future models) doesn't have to know
          | how to calculate. All it needs to know is how to interface with
          | tools like calculators, spreadsheets, etc. and parse their
          | outputs. Every script, program, etc. becomes a thing that has
          | such an API. A lot of what we humans do to solve problems is
          | breaking down big problems into problems where we know the
          | solution already.
         | 
         | Real life tool interfaces are messy and optimized for humans
         | with their limited language and cognitive skills. Ironically,
         | that means they are relatively easy to figure out for AI
         | language models. Relative to human language the grammar of
         | these tool "languages" is more regular and the syntax less
         | ambiguous and complicated. Which is why gpt 3 and 4 are
         | reasonably proficient with even some more obscure programming
         | languages and in the use of various frameworks; including some
         | very obscure ones.
         | 
         | Given a lot of these tools with machine accessible APIs with
         | some sort of description or documentation, figuring out how to
         | call these things is relatively straightforward for a language
         | model. The rest is just coming up with a high level plan and
         | then executing it. Which amounts to generating some sort of
         | script that does this. As soon as you have that, that in itself
         | becomes a tool that may be used later. So, it can get better
         | over time. Especially once it starts incorporating feedback
         | about the quality of its results. It would be able to run mini
         | experiments and run its own QA on its own output as well.
        
         | jarulraj wrote:
         | Interesting observation, @swyx. There seems to be a connection
         | to transitive closure in SQL queries, where the output of the
         | query is fed as the input to the query in the next iteration
         | [1]. We are thinking about how to best support such recursive
         | functions in EvaDB [2].
         | 
         | [1] http://dwhoman.com/blog/sql-transitive-closure.html [2]
         | https://evadb.readthedocs.io/en/stable/source/tutorials/11-s...
        
         | boringuser2 wrote:
         | Do you people always have to overhype this shit?
        
           | throwuwu wrote:
           | What's your problem? There's nothing overhyped about that
           | comment. People, including me, _are_ building complex agents
           | that can execute multi stage prompts and perform complex
           | tasks. Comparing these first models to a basic unit of logic
           | is more than fair given how much more capable they are. Do
           | you just have an axe to grind?
        
             | boringuser2 wrote:
             | [flagged]
        
               | pyinstallwoes wrote:
               | How is it inappropriate? How is it not building?
        
               | [deleted]
        
           | delhanty wrote:
           | Do _you_ have to be nasty?
           | 
           | That's a person you're replying to with feelings, so why not
           | default to being kind in comments as per HN guidelines?
           | 
           | As it happens, swyx has built notable AI related things, for
           | example smol-developer
           | 
           | https://twitter.com/swyx/status/1657892220492738560
           | 
           | and it would be nice to be able to read his and other
           | perspectives without having to read shallow, mean, dismissive
           | replies such as yours.
        
             | boringuser2 wrote:
             | [flagged]
        
               | dang wrote:
               | Hey, I understand the frustration (both the frustration
               | of endless links on an over-hyped topic, and the
               | frustration of getting scolded by another user when
               | expressing yourself) - but it really would be good if
               | you'd post more in the intended spirit of this site
               | (https://news.ycombinator.com/newsguidelines.html).
               | 
               | People sometimes misunderstand this, so I'd like to
               | explain a bit. (It probably won't help, but it might, and
               | I don't like flagging or banning accounts without trying
               | to persuade people first if possible.)
               | 
               | We don't ask people to be kind, post thoughtfully, not
               | call names, not flame, etc., out of nannyism or some
               | moral thing we're trying to impose. That wouldn't feel
               | right and I wouldn't want to be under Mary Poppins's
               | umbrella either.
               | 
               | The reason is more like an engineering problem: we're
               | trying to optimize for one specific thing (https://hn.alg
               | olia.com/?dateRange=all&page=0&prefix=true&sor...) and we
               | can't do that if people don't respect certain
               | constraints. The constraints are to prevent the forum
               | from burning itself to a crisp, which is where the arrow
               | of internet entropy will take us if we don't expend
               | energy to stave it off.
               | 
               | It probably doesn't feel like you're doing anything
               | particularly wrong, but there's a cognitive bias where
               | everyone underestimates the damage they're causing (by
               | say 10x) and overestimates the damage others are causing
               | (by say 10x) and that compounds into a major bias where
               | everyone feels like everyone else is the problem. We need
               | a way out of that dynamic if we're to have any hope of
               | keeping this place interesting. As you probably realize,
               | HN is forever on the threshold of caving into a pit. We
               | need you to help nudge it back from that, not push it
               | over.
               | 
               | Of course you're free to say "what do I care if HN burns
               | itself to a crisp, fuck you all" but I'd argue you
               | shouldn't take that nihilistic position because it isn't
               | in your own interests. HN may be annoying at times, but
               | it's interesting enough for you to spend time here--
               | otherwise you wouldn't be reading the site and posting to
               | it. Why not contribute to making it _more_ interesting
               | rather than destroying it for yourself and everyone else?
                | (I don't mean that you're intentionally destroying it--
               | but the way you've been posting is unintentionally
               | contributing to that outcome.)
               | 
               | I'm sure you wouldn't drop lit matches in a dry forest,
               | or dump motor oil in a mountain lake, trample flower
               | gardens, or litter in a city park, for much the same
               | reason. It's in your own interest to practice the same
               | care for the commons here. Thanks for listening.
        
               | boringuser2 wrote:
               | Thanks for the effort of explanation.
               | 
               | If someone were egregiously out of line, typically, I
               | feel community sentiment reflects this.
               | 
               | Personally, I feel your assessment of cognitive bias at
               | play is way off base. I don't think it's a valid
               | comparison to claim that someone is causing "damage" by
               | merely expressing distaste. That's a common tool that
               | humans use for social feedback. Is cutting off the
               | ability for genuine social feedback or adjustment and
               | forcing people to be saccharine out of fear of reprisal
               | from the top really an _optimal_ solution to an
               | engineering problem? It seems more like a simulacrum of
               | an HR department where the guillotine is more real: your
               | job and life rather than merely your ability to share
               | your thoughts on a corner of the Internet.
               | 
               | Think about the engineering problem you find yourself in
               | with this state of affairs: something very similar to the
               | kind of content you might find on LinkedIn, a sort of
               | circular back-patting engine devoid of real challenge and
               | grit because of the aforementioned guillotine under which
               | all participants hang.
               | 
               | And, quite frankly, you _do_ see the effects of this in
               | precisely the post in this initial exchange: hyperbole
               | and lack of deep critical assessment are artificially
                | inflated. This isn't a coincidence: this has been
               | cultured very specifically by the available growing
               | conditions and the starter used -- saccharine hall
               | monitors that fold like cheap suits (e.g. very poorly,
               | lots of creases) when the lowest level of social
                | challenge is raised to their ideas.
               | 
               | You know what it really feels like? A Silicon Valley
               | reconstruction of all the bad things about a workplace,
               | not a genuine forum for debate and intellectual
               | exploration. If you want to find a place to model such
               | behavior, the Greeks already have you figured out - how
               | do you think Diogenes would feel about human resources?
               | 
               | That being said, I appreciate the empathy.
               | 
               | Obviously, I feel a bit like a guy Tony Soprano beat up
               | and being forced to apologize afterwards to him for
               | bruising his knuckles.
        
               | dang wrote:
               | Not to belabor the point but from my perspective you've
               | illustrated the point about cognitive bias: it always
               | feels like the other person started it and did worse ("I
               | feel a bit like a guy Tony Soprano beat up and being
               | forced to apologize afterwards to him for bruising his
               | knuckles") and it always feels like one was merely
               | defending oneself reasonably ("merely expressing
               | distaste"). This is the asymmetry I'm talking about.
               | 
               | As you can imagine, mods get this kind of feedback all
               | the time from all angles. The basic learning it adds up
               | to is that everybody always feels this way. Therefore
               | those feelings are not a reliable compass to navigate by.
               | 
               | This is not a criticism--I appreciate your reply!
               | 
               | Edit:
               | 
               | > _forcing people to be saccharine [...] like a
               | simulacrum of an HR department_
               | 
               | We definitely don't want that and the site guidelines
               | don't call for that. There is tons of room to make your
               | substantive points thoughtfully without being saccharine.
               | It can take a little bit of reflective work to find that
               | room, though, just because we (humans in general) tend to
               | get locked into binary oppositions.
               | 
               | The best principle to go by is just to ask yourself: is
                | what I'm posting part of a _curious_ conversation? That's
                | the intended spirit of the site. It's possible to tell
               | if you (I don't mean you personally, I mean all of us)
               | are functioning in the range of curiosity and to refrain
               | from posting if you aren't.
               | 
               | It _is_ true that the HN guidelines bring a bit of
               | blandness to discourse because they eliminate the rough-
               | and-tumble debate that can work well in much smaller
                | groups of close peers. But that's because that kind of
               | debate is impossible in a large public forum like HN--it
               | just degenerates immediately into dumb brawls. I've
               | written about this quite a bit if you or anyone wants to
               | read about that:
               | 
               | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
               | que... (I like the rugby analogy)
               | 
               | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
               | que...
        
               | boringuser2 wrote:
               | I think your argument is reasonable from a logical
               | perspective, and I would generally make a similar
               | argument as I would find the template quite persuasive.
               | 
               | However, I, again, feel you're improperly pushing shapes
               | into the shape-board again. Of course, understanding
               | cognitive bias is a fantastic tool to improve human
               | behavior from an engineering perspective, and your
               | argumentum ad numerum is sound.
               | 
               | That being said, you're focusing too much on what my
               | emotional motivation might be rather than looking at the
                | system - do you really think there _isn't_ an element of
               | that dynamic I outlined in an interaction like this? Of
               | course there is.
               | 
               | Anyhow, you know, I don't have the terminology in my
               | back-pocket, but there's definitely a large blind-spot
               | when someone is ignoring the spirit of intellectual
               | curiosity in a positive light rather than a negative one.
               | 
               | In this case, don't you think a tool like mild negative
               | social feedback might be a useful mechanism? Of course,
               | there's a limit, and if such a person were incapable of
               | further insight, they'd probably not be very useful
               | conversants. That's obviously not happening here.
               | 
               | One final thing is relevant here - you just hit on a
               | pretty important point. There is a grit to a certain type
               | of discourse that is actually superior to this discourse,
               | I'd happily accept that point. Why not just transfer the
               | burden of moderation to that point, rather than what you
               | perceive to be the outset? Surely, you'll greatly reduce
               | your number of false positives.
               | 
               | I provide negative social feedback sometimes because I
               | feel it's appropriate. In the future, I probably won't.
               | That being said, it's obvious that I've never sparked a
               | thoughtless brawl, so the tolerance is at least
               | inappropriately adjusted sufficiently to that extent.
        
         | killingtime74 wrote:
         | Who is Simon Willison? Is he big in AI?
        
           | swyx wrote:
           | formerly cocreator of Django, now Datasette, but pretty much
           | the top writer/hacker on HN making AI topics accessible to
           | engineers https://hn.algolia.com/?dateRange=pastYear&page=0&p
           | refix=tru...
        
             | killingtime74 wrote:
             | Oh wow, nice! Big fan of his work
        
         | ilaksh wrote:
         | The thing is the relevant context often depends on what it's
         | trying to do. You can give it a lot of context in 16k but if
         | there are too many different types of things then I think it
         | will be confused or at least have less capacity for the actual
         | selected task.
         | 
          | So what I am thinking is that some functions might just be like
          | gateways into a second menu level. So instead of just edit_file
          | with the filename and new source, maybe only
          | select_files_for_edit is available at the top level. In that
          | case I can ensure it doesn't overwrite an existing file and
          | lose important stuff that was already in there, by providing
          | the requested files' existing contents along with the function
          | allowing the file edit.
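          | 
          | A sketch of that two-level menu (the function names are the
          | hypothetical ones above, nothing OpenAI-specific): only the
          | gateway is advertised at the top level, and edit_file plus the
          | files' current contents are only sent in the follow-up request.
          | 
          |     TOP_LEVEL = [{
          |         "name": "select_files_for_edit",
          |         "description": "Pick the files to edit next",
          |         "parameters": {"type": "object", "properties": {
          |             "paths": {"type": "array",
          |                       "items": {"type": "string"}}}},
          |     }]
          | 
          |     def second_level(paths):
          |         # expose edit_file only now, alongside the current
          |         # contents, so nothing gets silently overwritten
          |         contents = {p: open(p).read() for p in paths}
          |         edit_file = {
          |             "name": "edit_file",
          |             "description": "Rewrite a single file",
          |             "parameters": {"type": "object", "properties": {
          |                 "path": {"type": "string"},
          |                 "new_source": {"type": "string"}}},
          |         }
          |         return contents, [edit_file]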
        
           | naiv wrote:
           | I think big context only makes sense for document analysis.
           | 
           | For programming you want to keep it slim. Just like you
           | should keep your controllers and classes slim.
           | 
           | Also people with 32k access report very very long response
           | times of up to multiple minutes which is not feasible if you
           | only want a smaller change or analysis.
        
           | throwuwu wrote:
           | Not sure that's true. I haven't completely filled the context
           | with examples but I do provide 8 or so exchanges between user
           | and assistant along with a menu of available commands and it
           | seems to be able to generalize from that very well. No
           | hallucinations either. Good idea about sub menus though, I'll
           | have to use that.
        
         | jonplackett wrote:
         | It was already quite easy to get GPT-4 to output json. You just
         | append 'reply in json with this format' and it does a really
         | good job.
         | 
          | GPT-3.5 was very haphazard though and needed extensive
          | babysitting and reminding, so if this makes GPT-3.5 better then
          | it's useful - it does have an annoying disclaimer though that
          | 'it may not reply with valid json', so we'll still have to do
          | some sense checks on the output.
          | 
          | I have been using this to make a few 'choose your own
          | adventure' type games and I can see there's a TONNE of
          | potentially useful things.
        
           | throwuwu wrote:
           | Just end your request with
           | 
            | ```json
           | 
           | Or provide a few examples of user request and then agent
           | response in json. Or both.
        
             | clbrmbr wrote:
             | Does the ```json trick work with the chat models? Or only
             | the earlier completion models?
        
               | throwuwu wrote:
               | Works with chat. They're still text completion models
               | under all that rlhf
        
           | reallymental wrote:
            | Is there any publicly available resource to replicate your
            | work? I would love to just find the right kind of
            | "incantation" for gpt-3.5-turbo or gpt-4 to output a
            | meaningful story arc etc.
           | 
           | Any examples of your work would be greatly helpful as well!
        
             | devbent wrote:
             | I have an open source project doing exactly this at
             | https://www.generativestorytelling.ai/ GitHub link is on
             | the main page!
        
             | SamPatt wrote:
             | I'm not the person you're asking, but I built a site that
             | allows you to generate fiction if you have an OpenAI API
             | key. You can see the prompts sent in console, and it's all
             | open source:
             | 
             | https://havewords.ai/
        
           | seizethecheese wrote:
            | In a production system, "easy to do most of the time" isn't
            | enough; you need it to work without fail.
        
             | pnpnp wrote:
             | Ok, just playing devil's advocate here. How many FAANG
             | companies have you seen have an outage this year? What's
             | their budget?
             | 
             | I think a better way to reply to the author would have been
             | "how often does it fail"?
             | 
             | Every system will have outages, it's just a matter of how
             | much money you can throw at the problem to reduce them.
        
               | jrockway wrote:
               | If 99.995% correct looks bad to users, wait until they
               | see 37%.
        
           | ignite wrote:
           | > You just append 'reply in json with this format' and it
           | does a really good job.
           | 
           | It does an ok job. Except when it doesn't. Definitely misses
           | a lot of the time, sometimes on prompts that succeeded on
           | previous runs.
        
             | bel423 wrote:
              | It literally does it perfectly every time. I remember I put
              | together an entire system that would validate the JSON
              | against a zod schema and use reflection to fix it, and it
              | literally never gets triggered because GPT-3.5-turbo always
              | does it right the first time.
        
               | thomasfromcdnjs wrote:
                | Are you saying that it returned only JSON before? I'm
                | with the other commenters: it was wildly variable and
                | always at least said "Here is your response", which
                | doesn't parse well.
        
               | travisjungroth wrote:
               | If you want a parsable response, have it wrap that with
               | ```. Include an example request/response in your history.
               | Treat any message you can't parse as an error message.
               | 
               | This works well because it has a place to put any "keep
               | in mind" noise. You can actually include that in your
               | example.
        
               | worik wrote:
                | > It literally does it perfectly every time. I remember I
                | put together an entire system that would validate the
                | JSON against a zod schema and use reflection to fix it,
                | and it literally never gets triggered because
                | GPT-3.5-turbo always does it right the first time.
               | 
               | Danger! There be assumptions!!
               | 
               | gpt-? is a moving target and in rapid development. What
               | it does Tuesday, which it did not do on Monday, it may
               | well not do on Wednesday
               | 
               | If there is a documented method to guarantee it, it will
               | work that way (modulo OpenAI bugs - and now Microsoft is
               | involved....)
               | 
               | What we had before, what you are talking of, was observed
               | behaviour. An assumption that what we observed in the
               | past will continue in the future is not something to
               | build a business on
        
               | travisjungroth wrote:
               | ChatGPT moves fast. The API version doesn't seem to
               | change except with the model and documented API changes.
        
               | lmeyerov wrote:
               | Yeah no
        
               | whateveracct wrote:
               | No it doesn't lol. I've seen it just randomly not use a
               | comma after one array element, for example.
        
               | LanceJones wrote:
                | Yep. Incorrect trailing commas ad nauseam for me.
        
             | [deleted]
        
           | sheepscreek wrote:
            | The solution that worked great for me: do not use JSON for
            | GPT-to-agent communication. Use comma-separated key=value
            | pairs, or something to that effect.
           | 
           | Then have another pure code layer to parse that into
           | structured JSON.
           | 
           | I think it's the JSON syntax (with curly braces) that does it
           | in. So YAML or TOML might work just as well, but I haven't
           | tried that.
        
             | bombela wrote:
              | It's harder to form a tree with key-value pairs. I also
              | tried the relational route, but it would always mess up the
              | cardinality (one person can have 0 or n friends, but a
              | person has a single birth date).
        
               | sheepscreek wrote:
                | You could flatten it using namespaced keys. E.g.
                | 
                |     { parent1: { child1: value } }
                | 
                | Becomes one of the following:
                | 
                |     parent1/child1=value
                |     parent1_child1=value
                |     parent1.child1=value
                | 
                | ..you get the idea.
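                | 
                | And a small sketch of folding the flattened keys back
                | into a nested dict on the code side:
                | 
                |     def unflatten(text, sep="."):
                |         out = {}
                |         for line in text.splitlines():
                |             key, _, val = line.partition("=")
                |             node = out
                |             *parents, leaf = key.split(sep)
                |             for p in parents:
                |                 node = node.setdefault(p, {})
                |             node[leaf] = val
                |         return out
                | 
                |     # unflatten("parent1.child1=value")
                |     # -> {'parent1': {'child1': 'value'}}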
        
               | rubyskills wrote:
               | It's also harder to stream JSON? Maybe I'm overthinking
               | this.
        
             | jacobsimon wrote:
             | Coincidentally, I just published this JS library[1] over
             | the weekend that helps prompt LLMs to return typed JSON
             | data and validates it for you. Would love feedback on it if
             | this is something people here are interested in. Haven't
             | played around with the new API yet but I think this is
             | super exciting stuff!
             | 
             | [1] https://github.com/jacobsimon/prompting
        
               | golergka wrote:
               | Looks promising! Do you do retries when returned json is
               | invalid? Personally, I used io-ts for parsing, and GPT
               | seems to be able to correct itself easily when confronted
               | with a well-formed error message.
        
               | jacobsimon wrote:
               | Great idea, I was going to add basic retries but didn't
               | think to include the error.
               | 
               | Any other features you'd expect in a prompt builder like
               | this? I'm tempted to add lots of other utility methods
               | like classify(), summarize(), language(), etc
        
           | bradly wrote:
            | I could not get GPT-4 to reliably not give some sort of text
            | response, even if it was just a simple "Sure" followed by the
            | JSON.
        
             | avereveard wrote:
              | Pass in an assistant message with "Sure, here is the answer
              | in JSON format:" after the user message. GPT will think it
              | has already done the preamble and the rest of the message
              | will start right with the JSON.
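              | 
              | i.e. something roughly like this (a sketch; how strongly
              | the model respects the trailing assistant message can
              | vary):
              | 
              |     messages = [
              |         {"role": "user",
              |          "content": "Summarize this as JSON: ..."},
              |         {"role": "assistant",
              |          "content": "Sure, here is the answer in"
              |                     " JSON format:"},
              |     ]
              |     # then call the chat completions API as usual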
        
             | rytill wrote:
             | Did you try using the API and providing a very clear system
             | message followed by several examples that were pure JSON?
        
               | bradly wrote:
                | Yep. I even gave it a JSON schema file to use. It just
                | wouldn't stop adding extra verbiage.
        
               | taylorfinley wrote:
                | I just use a regex to select everything between the first
                | and last curly bracket, which reliably fixes the "sure,
                | here's your object" problem.
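                | 
                | E.g. (the greedy match spans from the first "{" to the
                | last "}"):
                | 
                |     import json, re
                | 
                |     def extract_json(text):
                |         m = re.search(r"\{.*\}", text, re.DOTALL)
                |         return json.loads(m.group(0)) if m else None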
        
               | NicoJuicy wrote:
               | Say it's a json API and may only reply with valid json
               | without explanation.
        
               | bradly wrote:
               | Lol yes of course I tried that.
        
               | dror wrote:
               | I've had good luck with both:
               | 
               | https://github.com/drorm/gish/blob/main/tasks/coding.txt
               | 
               | and
               | 
               | https://github.com/drorm/gish/blob/main/tasks/webapp.txt
               | 
               | With the second one, I reliably generated half a dozen
               | apps with one command.
               | 
               | Not to say that it won't fail sometimes.
        
               | NicoJuicy wrote:
               | Combine both ? :)
        
           | cwxm wrote:
            | Even with GPT-4, it hallucinates enough that it's not
            | reliable, forgetting to open/close brackets and quotes. This
            | sounds like it'd be a big improvement.
        
             | ztratar wrote:
             | Nah, this was solved by most teams a while ago.
        
               | bel423 wrote:
               | I feel like I'm taking crazy pills with the amount of
               | people saying this is game changing.
               | 
               | Did they not even try asking gpt to format the output as
               | json?
        
               | worik wrote:
               | > I feel like I'm taking crazy pills....try asking gpt to
               | format the output as json
               | 
                | You are taking crazy pills. Stop
               | 
               | gpt-? is unreliable! That is not a bug in it, it is the
               | nature of the beast.
               | 
               | It is not an expert at anything except natural language,
               | and even then it is an idiot savant
        
             | jonplackett wrote:
             | Not that it matters now but just doing something like this
             | works 99% of the time or more with 4 and 90% with 3.5.
             | 
             | It is VERY IMPORTANT that you respond in valid JSON ONLY.
             | Nothing before or after. Make sure to escape all strings.
             | Use this format:
             | 
             | {"some_variable": [describe the variable purpose]}
        
               | 8organicbits wrote:
               | Wouldn't you use traditional software to validate the
               | JSON, then ask chatgpt to try again if it wasn't right?
        
               | girvo wrote:
               | In my experience, telling it "no thats wrong, try again"
               | just gets it to be wrong in a new different way, or
               | restate the same wrong answer slightly differently. I've
               | had to explicitly guide it to correct answers or formats
               | at times.
        
               | lpfnrobin wrote:
               | [flagged]
        
               | cjbprime wrote:
               | Try different phrasing, like "Did your answer follow all
               | of the criteria?".
        
               | SamPatt wrote:
               | 99% of the time is still super frustrating when it fails,
               | if you're using it in a consumer facing app. You have to
               | clean up the output to avoid getting an error. If it goes
               | from 99% to 100% JSON that is a big deal for me, much
               | simpler.
        
               | jonplackett wrote:
               | Except it says in the small print to expect invalid JSON
               | occasionally, so you have to write your error handling
               | code either way
        
               | davepeck wrote:
               | Yup. Is there a good/forgiving "drunken JSON parser"
               | library that people like to use? Feels like it would be a
               | useful (and separable) piece?
        
               | golol wrote:
                | Honestly, I suspect asking GPT-4 to fix your JSON (in a
                | new chat) is a good drunken JSON parser. We are only
                | scratching the surface of what's possible with LLMs. If
                | token generation were free and instant we could come up
                | with a giant scheme of interacting model calls that
                | generates 10 suggestions, iterates over them, ranks them
                | and picks the best one, as silly as it sounds.
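                | 
                | A sketch of that fallback, assuming the openai SDK:
                | 
                |     import json, openai
                | 
                |     def parse_or_repair(text):
                |         try:
                |             return json.loads(text)
                |         except json.JSONDecodeError:
                |             fix = openai.ChatCompletion.create(
                |                 model="gpt-4", temperature=0,
                |                 messages=[{
                |                     "role": "user",
                |                     "content": "Fix this so it is"
                |                                " valid JSON. Reply"
                |                                " with JSON only:\n"
                |                                + text}])
                |             msg = fix["choices"][0]["message"]
                |             return json.loads(msg["content"])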
        
               | andai wrote:
               | That's hilarious... if parsing GPT's JSON fails, keep
               | asking GPT to fix it until it parses!
        
               | golol wrote:
               | It shouldn't be surprising though. If a human makes an
               | error parsing JSON, what do you do? You make them look
               | over it again. Unless their intelligence is the
               | bottleneck they might just be able to fix it.
        
               | golergka wrote:
               | It works. Just be sure to build a good error message.
        
               | hhh wrote:
                | I already do this today to create domain-specific,
                | knowledge-focused prompts, have them iterate back and
                | forth, and use a 'moderator' that chooses what goes in
                | and what doesn't.
        
               | golergka wrote:
                | If you're building an app based on LLMs that expects
                | higher than 99% correctness from them, you are bound to
                | fail. Workarounds for negative scenarios and retries are
                | mandatory.
        
             | whateveracct wrote:
             | It forgets commas too
        
           | muzani wrote:
            | It's fine, but the article makes some good points why - less
            | cognitive load for GPT and fewer tokens. I think the
            | transistor-to-logic-gate analogy makes sense. You can build
            | the thing perfectly with transistors, but just use the logic
            | gate lol.
        
           | sethd wrote:
           | I like to define a JSON schema (https://json-schema.org/) and
           | prompt GPT-4 to output JSON based on that schema.
           | 
            | This lets me specify general requirements (not just JSON
            | structure) inline with the schema, in a very detailed and
            | structured manner.
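            | 
            | Roughly (a sketch; the jsonschema package then validates the
            | reply against the same schema that went into the prompt):
            | 
            |     import json
            |     from jsonschema import validate
            | 
            |     schema = {
            |         "type": "object",
            |         "properties": {
            |             "title": {"type": "string", "maxLength": 80},
            |             "tags": {"type": "array",
            |                      "items": {"type": "string"}},
            |         },
            |         "required": ["title", "tags"],
            |     }
            | 
            |     prompt = ("Return JSON matching this JSON Schema and"
            |               " nothing else:\n" + json.dumps(schema))
            | 
            |     # ...send prompt to GPT-4, then:
            |     reply = '{"title": "Example", "tags": ["a", "b"]}'
            |     validate(json.loads(reply), schema)  # raises if invalid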
        
         | SimianLogic wrote:
          | I agree with this. We've already gotten pretty good at JSON
          | coercion, but this seems like it goes one step further by
          | bundling decision making into the model instead of junking up
          | your prompt or requiring some kind of eval on a single JSON
          | response.
         | 
         | It should also be much easier to cache these functions. If you
         | send the same set of functions on every API hit, OpenAI should
         | be able to cache that more intelligently than if everything was
         | one big text prompt.
        
         | minimaxir wrote:
         | "Trivial" is misleading. From OpenAI's docs and demos, the full
         | ReAct workflow is an order of magnitude more difficult than
         | typical ChatGPT API usage with a new set of constaints (e.g.
         | schema definitions)
         | 
         | Even OpenAI's notebook demo has error handling workflows which
         | was actually necessary since ChatGPT returned incorrect
         | formatted output.
        
           | cjonas wrote:
            | Maybe trivial isn't the right word, but it's still very
            | straightforward to get something basic, yet really
            | powerful...
           | 
           | ReAct Setup Prompt (goal + available actions) -> Agent
           | "ReAction" -> Parse & Execute Action -> Send Action Response
           | (success or error) -> Agent "ReAction" -> repeat
           | 
           | As long as each action has proper validation and returns
           | meaningful error messages, you don't need to even change the
           | control flow. The agent will typically understand what went
           | wrong, and attempt to correct it in the next "ReAction".
           | 
            | I've been refactoring some agents to use "functions" and so
            | far it seems to be a HUGE improvement in reliability vs the
            | "Return JSON matching this format" approach. Most impactful
            | is the fact that "3.5-turbo" will now reliably return JSON
            | (before, you'd be forced to use GPT-4 for a ReAct-style agent
            | of modest complexity).
           | 
           | My agents also seem to be better at following other
           | instructions now that the noise of the response format is
           | gone (of course it's still there, but in a way it has been
           | specifically trained on). This could also just be a result of
           | the improvements to the system prompt though.
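            | 
            | A minimal sketch of that loop with the new functions API
            | (assuming the June-2023 openai SDK; actions maps function
            | names to plain Python callables that return a string or
            | raise):
            | 
            |     import json, openai
            | 
            |     def run_agent(messages, functions, actions):
            |         for _ in range(10):  # hard step limit
            |             resp = openai.ChatCompletion.create(
            |                 model="gpt-3.5-turbo-0613",
            |                 messages=messages, functions=functions)
            |             msg = resp["choices"][0]["message"]
            |             messages.append(msg)
            |             call = msg.get("function_call")
            |             if not call:
            |                 return msg["content"]  # agent is done
            |             try:
            |                 args = json.loads(call["arguments"])
            |                 result = actions[call["name"]](**args)
            |             except Exception as e:
            |                 result = f"error: {e}"  # feed error back
            |             messages.append({"role": "function",
            |                              "name": call["name"],
            |                              "content": str(result)})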
        
             | [deleted]
        
             | devbent wrote:
             | For 3.5, I found it easiest to specify a simple, but
             | parsable, format for responses and then convert that to
             | JSON myself.
             | 
             | I'll have to see if the new JSON schema support is easier
             | than what I already have in place.
        
         | lbeurerkellner wrote:
         | It's interesting to think about this form of computation (LLM +
         | function call) in terms of circuitry. It is still unclear to me
         | however, if the sequential form of reasoning imposed by a
         | sequence of chat messages is the right model here. LLM decoding
         | and also more high-level "reasoning algorithms" like tree of
         | thought are not that linear.
         | 
         | Ever since we started working on LMQL, the overarching vision
         | all along was to get to a form of language model programming,
         | where LLM calls are just the smallest primitive of the "text
         | computer" you are running on. It will be interesting to see
         | what kind of patterns emerge, now that the smallest primitive
         | becomes more robust and reliable, at least in terms of the
         | interface.
        
         | freezed88 wrote:
         | 100%, if the API itself can choose to call a function or an
         | LLM, then it's way easier to build any agent loop without
         | extensive prompt engineering + worrying about errors.
         | 
         | Tweeted about it here as well:
         | https://twitter.com/jerryjliu0/status/1668994580396621827?s=...
        
           | bel423 wrote:
            | You still have to worry about errors. You will probably have
            | to add an error handler function that it can call out to.
            | Otherwise the LLM will hallucinate a valid output regardless
            | of the input. You want it to be able to throw an error and
            | say it couldn't produce the output in the given format.
        
         | moneywoes wrote:
         | Wow your brand is huge. Crazy growth. i wonder how much these
         | subtle mentions on forums help
        
           | swyx wrote:
           | i mean hopefully its relevant content to the discussion, i
           | hope enough pple know me here by now that i fully participate
           | in The Discourse rather than just being here to cynically
           | plug my stuff. i had a 1.5 hr convo with simon willison and
           | other well known AI tinkerers on this exact thing, and so I
           | shared it, making the most out of their time that they chose
           | to share with me.
        
           | TeMPOraL wrote:
            | They're the only commenter on HN I've noticed who keeps
            | writing "smol" instead of "small", and is associated with
            | projects with "smol" in their name. Surely I'm not the only
            | one who missed it being a meme around 2015 or so, and finds
            | this word/use jarring - and therefore very
            | attention-grabbing? Wonder how much that helps with
            | marketing.
           | 
           | This is meant with no negative intentions. It's just that
           | 'swyx was, in my mind, "that HN-er that does AI and keeps
           | saying 'smol'" for far longer than I was aware of
           | latent.space articles/podcasts.
        
             | swyx wrote:
             | and fun fact i used to work at Temporal too heheh.
        
             | memefrog wrote:
             | Personally, I associate "smol" with "doggo" and "chonker"
             | and other childish redditspeak.
        
         | ftxbro wrote:
         | > "you can now trivially make GPT4 decide whether to call
         | itself again, or to proceed to the next stage."
         | 
         | Does this mean the GPT-4 API is now publicly available, or is
         | there still a waitlist? If there's a waitlist and you literally
         | are not allowed to use it no matter how much you are willing to
         | pay then it seems like it's hard to call that trivial.
        
           | bayesianbot wrote:
           | "With these updates, we'll be inviting many more people from
           | the waitlist to try GPT-4 over the coming weeks, with the
           | intent to remove the waitlist entirely with this model. Thank
           | you to everyone who has been patiently waiting, we are
           | excited to see what you build with GPT-4!"
           | 
           | https://openai.com/blog/function-calling-and-other-api-
           | updat...
        
           | Tostino wrote:
           | Not GP, but it's still the latter...i've been (im)patiently
           | waiting.
           | 
           | From their blog post the other day: With these updates, we'll
           | be inviting many more people from the waitlist to try GPT-4
           | over the coming weeks, with the intent to remove the waitlist
           | entirely with this model. Thank you to everyone who has been
           | patiently waiting, we are excited to see what you build with
           | GPT-4!
        
             | londons_explore wrote:
             | If you put contact info in your HN profile - especially an
             | email address that matches one you use to login to openai,
             | someone will probably give you access...
             | 
             | Anyone with access can share it with any other user via the
                | 'invite to organisation' feature. Obviously that allows
                | the invited person to make requests billed to the
                | inviter, but since most experiments are only a few cents
                | that doesn't really matter much in practice.
        
               | Tostino wrote:
               | Good to know, but I've racked up a decent bill for just
               | my GPT 3.5 use. I can get by with experiments using my
               | ChatGPT Plus subscription, but I really need my own API
               | access to start using it for anything serious.
        
         | majormajor wrote:
         | GPT-4 was already a massive improvement on 3.5 in terms of
         | replying consistently in a certain JSON structure - I often
         | don't even need to give examples, just a sentence describing
         | the format.
         | 
         | It's great to see they're making it even better, but where I'm
         | currently hitting the limit still in GPT-4 for "shelling out"
         | is about it being truly "creative" or "introspective" about "do
          | I need to ask for clarifications" or "can I find a truly novel
          | way around this task" type of things vs "here's a possible but
          | half-baked sequence I'm going to follow".
        
           | fumar wrote:
           | It is "good enough". Where I struggle is maintaining its
           | memory through a longer request where multiple iterations
           | fail or succeed and then all of a sudden its memory is
           | exceeded and starts fresh. I wish I could store "learnings"
           | that it could revisit.
        
             | ehsanu1 wrote:
             | Sounds like you want something like tree of thoughts:
             | https://arxiv.org/abs/2305.10601
        
               | jimmySixDOF wrote:
                | Interestingly the paper's repo starts off:
               | 
               | Blah Blah "...is NOT the correct implementation to
               | replicate paper results. In fact, people have reported
               | that his code cannot properly run, and is probably
               | automatically generated by ChatGPT, and kyegomez has done
               | so for other popular ML methods, while intentionally
               | refusing to link to official implementations for his own
               | interests"
               | 
               | Love a good GitHub Identity Theft Star farming ML story
               | 
                | But this method could have potential for a chain of
                | functions.
        
         | babyshake wrote:
         | What would be an example where there needs to be an arbitrary
         | level of recursive ability for GPT4 to call itself?
        
           | swyx wrote:
           | writing code of higher complexity (we know from CICERO that
           | longer time spent on inference is worth orders of magnitude
           | more than the equivalent in training when it comes to
           | improving end performance), or doing real world tasks with
           | unknown fractal depth (aka yak shave)
        
       | iamflimflam1 wrote:
       | It's pretty interesting how the work they've been doing on
       | plugins has fed into this.
       | 
       | I suspect that they've managed to get a lot of good training data
       | by calling the APIs provided by plugins and detecting when it's
       | gone wrong from bad request responses.
        
       | coding123 wrote:
        | We're not far from writing a bunch of stubs and querying GPT at
        | startup to resolve the business logic. I guess we're going to
        | need a new JAX-RS soon.
        
       | runeb wrote:
        | The way OpenAI implemented this is really clever, beyond how neat
        | the plugin architecture is, as it lets them peek one layer inside
        | your internal API surface and infer what you intend to do with
        | the LLM output. They're collecting some good data here.
        
         | swyx wrote:
         | huh, i never thought of it that way. i thought openai pinky
         | swears not to train on our data tho
        
       | danShumway wrote:
       | I'm concerned that OpenAI's example documentation suggests using
       | this to A) construct SQL queries and B) summarize emails, but
       | that their example code doesn't include clear hooks for human
       | validation before actions are called.
       | 
       | For a recipe builder it's not so big a deal, but I really worry
       | how eager people are to remove human review from these steps. It
       | gets rid of a very important mechanism for reducing the risks of
       | prompt injection.
       | 
       | The top comment here suggests wiring this up to allow GPT-4 to
       | recursively call itself. Meanwhile, some of the best advice I've
       | seen from security professionals on secure LLM app development is
       | to whenever possible completely isolate queries from each other
       | to reduce the potential damage that a compromised agent can do
       | before its "memory" is wiped.
       | 
       | There are definitely ways to use this safely, and there are
       | definitely some pretty powerful apps you could build on top of
       | this without much risk. LLMs as a transformation layer for
       | trusted input is a good use-case. But are devs going to stick
       | with that? Is it going to be used safely? Do devs understand any
       | of the risks or how to mitigate them in the first place?
       | 
       | 3rd-party plugins on ChatGPT have repeatedly been vulnerable in
       | the real world, I'm worried about what mistakes developers are
       | going to make now that they're actively encouraged to treat GPT
       | as even more of a low-level data layer. Especially since OpenAI's
       | documentation on how to build secure apps is mostly pretty bad,
       | and they don't seem to be spending much time or effort educating
       | developers/partners on how to approach LLM security.
        
         | irthomasthomas wrote:
          | I don't understand why they've done this. Like, how did the
         | conversations go when it was pointed out to them what a pretty
         | darn bad idea it was to recommend connecting chatgpt directly
         | to a SQL database?
         | 
         | I know we are supposed to assume incompetence over malice, but
         | no one is that incompetent. They must have had the
         | conversations, and chose to do it anyway.
        
           | sebzim4500 wrote:
           | Why is this unreasonable to you? I can imagine using this,
           | just run it with read access and check the sql if the results
           | are interesting.
        
             | irthomasthomas wrote:
             | Even read only. You are giving access to your data to a
             | black box API.
        
               | sebzim4500 wrote:
               | If it's on Azure anyway I don't see the big deal,
               | especially if you are an enterprise and so buying it via
               | Azure instead of directly.
        
           | blitzar wrote:
           | Perhaps they plan on having ChatGPT make a quick copy of your
           | database, for your convenience of course.
        
         | abhibeckert wrote:
         | In my opinion the only way to use it safely is to ensure your
         | AI only has access to data that the end user already has access
         | to.
         | 
         | At that point, prompt injection is no longer an issue - because
         | the AI doesn't need to hide anything.
         | 
         | Giving GPT access to your entire database, but telling it not
         | to reveal certain bits, is never going to work. There will
         | always be side channel vulnerabilities in those systems.
        
           | kristiandupont wrote:
           | >At that point, prompt injection is no-longer an issue [...]
           | 
           | As far as input goes, yes. But I am more worried about agents
           | that can take actions that affect the outside world, like
           | sending emails on your behalf.
        
           | jacobr1 wrote:
           | > your AI only has access to data that the end user already
           | has access to.
           | 
           | That doesn't work for the same reason you mention with a DB
           | ... any data source is vulnerable to indirect injection
           | attacks. If you open the door to ANY data source, this is a
           | factor, including ones under the sole "control" of the user.
        
           | danShumway wrote:
           | > e.g. define a function called extract_data(name: string,
           | birthday: string), or sql_query(query: string)
           | 
           | This section in OpenAI's product announcement really
           | irritates me because it's so obvious that the model should
           | have access to a subset of API calls that themselves fetch
           | the data, as opposed to giving the model raw access to SQL.
           | You could have the same capabilities while eliminating a huge
           | amount of risk. And OpenAI just sticks this right in the
           | announcement, they're encouraging it.
           | 
           | When I'm building a completely isolated backend with just
           | regular code, I still usually put a data access layer in
           | front of the database in most cases. I still don't want my
           | REST endpoints directly building SQL queries or directly
           | accessing the database, and that's without an LLM in the loop
           | at all. It's just safer.
           | 
           | It's the same idea as using `innerHTML`; in general it's
           | better when possible to have those kinds of calls extremely
           | isolated and to go through functions that constrain what can
           | go wrong. But no, OpenAI is just straight up telling
           | developers to do the wrong things and to give GPT
           | unrestricted database access.
        
             | jmull wrote:
             | SQL doesn't necessarily have to mean full database access.
             | 
             | I know it's pretty common to have apps connect to a
             | database with a db user with full access to do anything,
             | but that's definitely not the only way.
             | 
             | If you're interested in being safer, it's worth learning
             | the security features built in to your database.
        
               | danShumway wrote:
               | > If you're interested in being safer, it's worth
               | learning the security features built in to your database.
               | 
               | The problem isn't that there's no way to be safe, the
               | problem is that OpenAI's documentation does not do
               | anything to discourage developers from implementing this
               | in the most dangerous way possible. Like you suggest, the
               | most common way this will be implemented is via a db user
               | with full access to do anything.
               | 
               | Developers would be far more likely to implement this
               | safely if they were discouraged from using direct SQL
               | queries. Developers who know how to safely add SQL
               | queries will still know how to do that -- but developers
               | who are copying and pasting code or thinking naively
               | "can't I just feed my schema into GPT" should be pushed
               | towards an implementation that's harder to mess up.
        
               | jmull wrote:
               | It's hard for me to believe openai's documentation will
               | have any effect on developers who write or copy-and-paste
               | data access code without regard to security, no matter
               | what it says.
               | 
               | If you provide an API or other external access to app
               | data and the app data contains anything not everyone
               | should be able to access freely then your API has to
               | implement some kind of access control. It really doesn't
               | matter if your API is SQL-based, REST-based, or whatever.
               | 
               | A SQL-based API isn't inherently less secure than a non-
               | SQL-based one if you implement access control, and a non-
               | SQL-based API isn't inherently more secure than a SQL-
               | based one if you don't implement access control. The SQL-
               | ness of an API doesn't change the security picture.
        
               | danShumway wrote:
               | > If you provide an API or other external access to app
               | data and the app data contains anything not everyone
               | should be able to access freely then your API has to
               | implement some kind of access control. It really doesn't
               | matter if your API is SQL-based, REST-based, or whatever.
               | 
               | I don't think that's the way developers are going to
               | interact with GPT at all, I don't think they're looking
               | at this as if it's external access. OpenAI's
               | documentation makes it feel like a system library or
               | dependency, even though it's clearly not.
               | 
               | I'll go out on a limb, I suspect a pretty sizable chunk
               | (if not an outright majority) of the devs who try to
               | build on this will not be thinking about the fact that
               | they need access controls at all.
               | 
               | > A SQL-based API isn't inherently less secure than a
               | non-SQL-based one if you implement access control, and a
               | non-SQL-based API isn't inherently more secure than a
               | SQL-based one if you don't implement access control. The
               | SQL-ness of an API doesn't change the security picture.
               | 
               | I'm not sure I agree with this either. If I see a dev
               | exposing direct query access to a database, my reaction
               | is going to be very dependent on whether or not I think
               | they're an experienced programmer already. If I know them
               | enough to trust them, fine. Otherwise, my assumption is
               | that they're probably doing something dangerous. I think
               | the access controls that are built into SQL are a lot
               | easier to foot-gun, I generally advise devs to build
               | wrappers because I think it's generally harder to mess
               | them up. Opinion me :shrug:
               | 
               | Regardless, I do think the way OpenAI talks about this
               | does matter, I do think their documentation will
               | influence how developers use the product, so I think if
               | they're going to talk about SQL they should in-code be
               | showing examples of how to implement those access
               | controls. "We're just providing the API, if developers
               | mess it up its their fault" -- I don't know, good APIs
               | and good documentation should try to when possible
               | provide a "pit of success[0]" for naive developers. In
               | particular I think that matters when talking about a
               | market segment that is getting a lot of naive VC money
               | thrown at it sometimes without a lot of diligence, and
               | where those security risks may end up impacting regular
               | people.
               | 
               | [0]: https://blog.codinghorror.com/falling-into-the-pit-
               | of-succes...
        
             | BoorishBears wrote:
             | You don't need to directly run the query it returns, you
             | can use that query as a sub-query on a known safe set of
             | data and let it fail if someone manages to prompt inject
             | their way into looking at other tables/columns.
             | 
             | That way you can support natural language to query without
             | sending dozens of functions (which will eat up the context
             | window)
        
               | danShumway wrote:
               | You can do that (I wouldn't advise it, there are still
               | problems that are better solved by building explicit
               | functions; but you can use subqueries and it would be
               | safer) -- but most developers won't. They'll run the
               | query directly. Most developers also will not execute it
               | as a readonly query, they'll give the LLM write access to
               | the database.
               | 
               | If OpenAI doesn't know that, then I don't know what to
               | say, they haven't spent enough time writing documentation
               | for general users.
        
               | BoorishBears wrote:
               | You can't advise for or against it without a well defined
               | problem: for some cases explicit functions won't even be
               | an option.
               | 
               | Defining basic CRUD functions for a few basic entities
               | will eat up a ton of tokens in schema definitions, and
               | still suffers from injection if you want to support
               | querying on data that wasn't well defined a priori,
               | which is a problem I've worked on.
               | 
               | Overall if this was one of their example projects I'd be
               | disappointed, but it was a snippet in a release note. So
               | far their _actual_ example projects have done a fair job
               | showing where guardrails in production systems are
               | needed, so I wouldn't over-index on this.
        
               | danShumway wrote:
               | > You can't advise for or against it without a well
               | defined problem: for some cases explicit functions won't
               | even be an option.
               | 
               | On average I think I can. I mean, I can't know without
               | the exact problem specifications whether or not a
               | developer should use `innerHTML`/`eval`. But I can offer
               | general advice against it, even though both can be used
               | securely. I feel pretty safe saying that exposing SQL
               | access directly in an API will _usually_ lead to more
               | fragile infrastructure. There are plenty of exceptions of
               | course, but there are exceptions to pretty much all
               | programming advice. I don't think it's good for it to be
               | one of the first examples they bring up for how to use
               | the API.
               | 
               | ----
               | 
               | > Overall if this was one of their example projects I'd
               | be disappointed
               | 
               | I have similar complaints about their example code. They
               | include the comment:
               | 
               | > # Note: the JSON response from the model may not be
               | valid JSON
               | 
               | But they don't actually do schema validation here or
               | check anything. Their example project isn't fit to
               | deploy. My thought on this is that if every response for
               | practically every project needs to have schema validation
               | (and I would strongly advise doing schema validation on
               | every response), then the sample code should have schema
               | validation in it. Their example project should be
               | something that could be almost copy-and-pasted.
               | 
               | If that makes the code sample longer, well... that is the
               | minimum complexity to build an app on this. The sample
               | code should reflect that.
               | 
               | > and still suffers from injection if you want to support
               | querying on data that wasn't well defined a-priori
               | 
               | This is a really good point. My response would be that
               | they should be expanding on this as well. I'm really
               | frustrated that OpenAI's documentation provides (imo)
               | basically no really practical/great security advice other
               | than "hey, this problem exists, make sure you deal with
               | it." But it seems to me like they're already falling over
               | on providing good documentation before they even get to
               | the point where they can talk seriously about bigger
               | security decisions.
        
         | sillysaurusx wrote:
         | I was going to say "I look forward to it and think it's
         | hilarious," but then I remembered that most victims will be
         | people learning to code, not companies. It would really suck to
         | suddenly lose your recipe database when you just wanted to
         | figure out how this programming stuff worked.
         | 
         | Some kind of "heads up" tagline is probably a good idea, yeah.
        
           | kristiandupont wrote:
           | I think the victims will mostly be the users of the software.
           | The personal assistant that can handle your calendar and
           | emails and all would be able to do real damage.
        
       | irthomasthomas wrote:
       | It's a shame they couldn't use yaml, instead. I compared them and
       | yaml uses about 20% fewer tokens. However, I can understand
       | accuracy, derived from frequency, being more important than token
       | budget.
        
         | IshKebab wrote:
         | I would imagine JSON is easier for a LLM to understand (and for
         | humans!) because it doesn't rely on indentation and confusing
         | syntax for lists, strings etc.
        
         | nasir wrote:
          | It's a lot more straightforward to use JSON programmatically
          | than YAML.
        
           | TeMPOraL wrote:
           | It really shouldn't be, though. I.e. not unless you're
           | parsing or emitting it ad-hoc, for example by assuming that
           | an expression like:                 "{" + $someKey + ":" +
           | $someValue + "}"
           | 
           | produces a valid JSON. It does - sometimes - and then it's
           | indeed easier to work with. It'll also blow up in your face.
           | Using JSON the right way - via a proper parser and serializer
           | - should be identical to using YAML or any other equivalent
           | format.
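           | 
           | A tiny sketch of the difference in Python (json is in the
           | stdlib; the key and value here are made up):
           | 
           |     import json
           | 
           |     key, value = "note", 'He said "hi", then left'
           | 
           |     # Ad-hoc concatenation: breaks as soon as the value
           |     # contains quotes, newlines, etc.
           |     broken = "{" + key + ":" + value + "}"
           | 
           |     # Proper serialization: always valid JSON.
           |     good = json.dumps({key: value})
           |     print(good)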
        
             | riwsky wrote:
             | Even if the APIs for both were equally simple, modules for
             | manipulating json are way more likely to be available in
             | the stdlib of whatever language you're using.
        
           | golergka wrote:
           | If you are using any kind of type checking instead of blindly
           | trusting generated json it's exactly the same amount of work.
        
         | blamy wrote:
         | JSON can be minified.
        
         | AdrienBrault wrote:
         | I think YAML actually uses more tokens than JSON without
         | indents, especially with deep data. For example "," being a
         | single token makes JSON quite compact.
         | 
         | You can compare JSON and YAML on
         | https://platform.openai.com/tokenizer
        
       | ulrikrasmussen wrote:
       | Has anyone tried throwing their backend Swagger at this and made
       | ChatGPT perform user story tests?
        
       | rank0 wrote:
       | OpenAI integration is going to be a goldmine for criminals in the
       | future.
       | 
       | Everyone and their momma is gonna start passing poorly
       | validated/sanitized client input to shared sessions of a non-
       | deterministic function.
       | 
       | I love the future!
        
         | nextworddev wrote:
         | In the "future"?
        
       | zyang wrote:
       | Is it possible to fine-tune with custom data to output JSON?
        
         | edwin wrote:
         | That's not the current OpenAI recipe. Their expectation is that
         | your custom data will be retrieved via a function/plugin and
          | then processed by a chat model.
         | 
         | Only the older completion models (davinci, curie, babbage, ada)
          | are available for fine-tuning.
        
       | jamesmcintyre wrote:
       | In the openai blog post they mention "Convert "Who are my top ten
       | customers this month?" to an internal API call" but I'm assuming
       | they mean gpt will respond with structured json (we define via
       | schema in function prompt) that we can use to more easily
        | programmatically make that API call?
       | 
       | I could be confused but I'm interpreting this function calling as
       | "a way to define structured input and selection of function and
       | then structured output" but not the actual ability to send it
       | arbitrary code to execute.
       | 
       | Still amazing, just wanting to see if I'm wrong on this.
        
         | williamcotton wrote:
         | This does not execute code!
        
           | jamesmcintyre wrote:
            | Ok, yea this makes sense. Also, for others curious about the
            | flow, here's a video walkthrough I just skimmed through:
           | https://www.youtube.com/watch?v=91VVM6MNVlk
        
       | smallerfish wrote:
        | I will experiment with this at the weekend. One thing I found
       | useful with supplying a json schema in the prompt was that I
       | could supply inline comments and tell it when to leave a field
       | null, etc. I found that much more reliable than describing these
       | nuances elsewhere in the prompt. Presumably I can't do this with
       | functions, but maybe I'll be able to work around it in the prompt
       | (particularly now that I have more room to play with.)
        
       | loughnane wrote:
       | Just this morning I wrote a JSON object. I told GPT to turn it
       | into a schema. I tweaked that and then gave a list of terms for
       | which I wanted GPT to populate the schema accordingly.
       | 
       | It worked pretty well without any functions, but I did feel like
       | I was missing something because I was ready to be explicit and
       | there wasn't any way for me to tell that to GPT.
       | 
       | I look forward to trying this out.
        
       | mritchie712 wrote:
        | Glad we didn't get too far into adopting something like
        | Guardrails. This sort of kills its main value prop for OpenAI.
       | 
       | https://shreyar.github.io/guardrails/
        
         | Blahah wrote:
         | Luckily it's for LLMs, not openai
        
         | blamy wrote:
         | Guardrails is an awesome project and will continue to be even
         | after this.
        
         | swyx wrote:
         | i mean only at the most superficial level. she has a ton of
          | other validators that aren't superseded (eg SQL is validated by
         | branching the database - we discussed on our pod
         | https://www.latent.space/p/guaranteed-quality-and-structure)
        
           | mritchie712 wrote:
           | yeah, listened to the pod (that's how I found out about
           | guardrails!).
           | 
           | fair point, I should have said: "value prop for our use
           | case"... the thing I was most interested in was how well
           | Guardrails structured output.
        
             | swyx wrote:
             | haha excellent. i was quite impressed by her and the vision
             | for guardrails. thanks for listening!
        
       | Kiro wrote:
       | Can I use this to make it reliably output code (say JavaScript)?
       | I haven't managed to do it with just prompt engineering as it
       | will still add explanations, apologies and do other unwanted
       | things like splitting the code into two files as markdown.
        
         | minimaxir wrote:
         | Here's a demo of some system prompt engineering which resulted
         | in better results for the older ChatGPT:
         | https://github.com/minimaxir/simpleaichat/blob/main/examples...
         | 
          | Coincidentally, the new gpt-3.5-turbo-0613 model also has
          | better system prompt guidance: with the demo above and some
          | further prompt tweaking, it's possible to get ChatGPT to output
         | code super reliably.
        
         | williamcotton wrote:
         | Here's an approach to return just JavaScript:
         | 
         | https://github.com/williamcotton/transynthetical-engine
         | 
         | The key is the addition of few-shot exemplars.
        
         | sanxiyn wrote:
         | Not this, but using the token selection restriction approach,
         | you can let LLM produce output that conforms to arbitrary
         | formal grammar completely reliably. JavaScript, Python,
         | whatever.
        
       | Xen9 wrote:
       | Marvin Minsky was so damn far ahead of his time with Society of
       | Mind.
       | 
       | Engineering of cognitively advanced multiagent systems will
       | become the area of research of this century / multiple decades.
       | 
       | GPT-GPT > GPT-API in terms of power.
       | 
        | The space of possible combinations of GPT multiagents goes
        | beyond imagination, since even GPT-4 alone already does.
       | 
       | Multiagent systems are best modeled with signal theory, graph
       | theory and cognitive science.
       | 
       | Of course "programming" will also play a role, in sense of
       | abstractions and creation of systems of / for thought.
       | 
       | Signal theory will be a significant approach for thinking about
       | embedded agency.
       | 
       | Complex multiagent systems approach us.
        
         | SanderNL wrote:
         | Makes me think of the Freud/Jungian notions of personas in us
         | that are in various degrees semi-autonomously looking out for
         | themselves. The "angry" agent, the "child" agent, so on.
        
       | edwin wrote:
       | For those who want to test out the LLM as API idea, we are
       | building a turnkey prompt to API product. Here's Simon's recipe
       | maker deployed in a minute:
       | https://preview.promptjoy.com/apis/1AgCy9 . Public preview to
       | make and test your own API: https://preview.promptjoy.com
        
         | wonderfuly wrote:
          | I own this domain: prompts.run. Do you want it?
        
         | yonom wrote:
         | This is cool! Are you using one-shot learning under the hood
         | with a user provided example?
        
           | edwin wrote:
           | BTW: Here's a more performant version (fewer tokens)
           | https://preview.promptjoy.com/apis/jNqCA2 that uses a smaller
           | example but will still generate pretty good results.
        
             | sudb wrote:
             | This is still pretty fast - impressive! Are there any
             | tricks you're doing to speed things up?
        
           | edwin wrote:
           | Thanks. We find few-shot learning to be more effective
           | overall. So we are generating additional examples from the
           | provided example.
        
         | abhpro wrote:
         | This is really cool, I had a similar idea but didn't build it.
         | I was also thinking a user could take these different prompts
         | (I called them tasks) that anyone could create, and then
         | connect them together like a node graph or visual programming
         | interface, with some Chat-GPT middleware that resolves the
         | outputs to inputs.
        
         | edelans wrote:
         | Congrats on the first-time user experience, I could experiment
         | with your API in a few seconds, and the product is sleek!
        
       | darepublic wrote:
       | I have been using gpt4 to translate natural language to JSON
        | already. And on v4 (not v3) it hasn't returned any malformed
        | JSON, iirc.
        
         | yonom wrote:
          | - If the only reason you're using v4 over v3.5 is to generate
          | JSON, you can now use this API and downgrade for faster and
          | cheaper API calls.
          | 
          | - Malicious user input may break your JSON (by asking GPT to
          | include comments around the JSON, as another user suggested);
          | this may or may not be an issue (e.g. if one user can
          | influence other users' experience).
        
         | nocsi wrote:
         | What if you ask it to include comments in the JSON explaining
          | its choices?
        
       | courseofaction wrote:
       | Nice to have an endpoint which takes care of this. I've been
       | doing this manually, it's a fairly simple process:
       | 
       | * Add "Output your response in json format, with the fields 'x',
       | which indicates 'x_explanation', 'z', which indicates
       | 'z_explanation' (...)" etc. GPT-4 does this fairly reliably.
       | 
       | * Validate the response, repeat if malformed.
       | 
       | * Bam, you've got a json.
       | 
       | I wonder if they've implemented this endpoint with validation and
       | carefully crafted prompts on the base model, or if this is
       | specifically fine-tuned.
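       | 
       | A minimal sketch of that loop, assuming the 2023-era openai
       | Python package:
       | 
       |     import json
       |     import openai
       | 
       |     def ask_json(prompt, retries=3):
       |         system = ("Output your response in json format, "
       |                   "with the fields 'x' and 'z'.")
       |         for _ in range(retries):
       |             resp = openai.ChatCompletion.create(
       |                 model="gpt-4",
       |                 messages=[
       |                     {"role": "system", "content": system},
       |                     {"role": "user", "content": prompt},
       |                 ],
       |             )
       |             text = resp["choices"][0]["message"]["content"]
       |             try:
       |                 return json.loads(text)  # validate, else retry
       |             except json.JSONDecodeError:
       |                 continue
       |         raise ValueError("no valid JSON after retries")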
        
         | 037 wrote:
         | It appears to be fine-tuning:
         | 
         | "These models have been fine-tuned to both detect when a
         | function needs to be called (depending on the user's input) and
         | to respond with JSON that adheres to the function signature."
         | 
         | https://openai.com/blog/function-calling-and-other-api-updat...
        
       | wskish wrote:
       | here is code (with several examples) that takes it a couple steps
       | further by validating the output json and pydantic model and
       | providing feedback to the llm model when it gets either of those
       | wrong:
       | 
       | https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...
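        | 
        | The core idea, sketched generically (this is not the linked
        | library's exact API; the Recipe model is made up): validate
        | against a pydantic model and, on failure, feed the error back.
        | 
        |     import json
        |     import openai
        |     from pydantic import BaseModel, ValidationError
        | 
        |     class Recipe(BaseModel):
        |         name: str
        |         servings: int
        | 
        |     messages = [{"role": "user", "content":
        |                  "Return a recipe as JSON with fields "
        |                  "'name' (string) and 'servings' (int)."}]
        |     for _ in range(3):
        |         resp = openai.ChatCompletion.create(
        |             model="gpt-3.5-turbo", messages=messages)
        |         reply = resp["choices"][0]["message"]["content"]
        |         try:
        |             recipe = Recipe(**json.loads(reply))
        |             break
        |         except (json.JSONDecodeError, ValidationError,
        |                 TypeError) as err:
        |             # tell the model what was wrong and let it retry
        |             messages += [
        |                 {"role": "assistant", "content": reply},
        |                 {"role": "user", "content": f"Fix this: {err}"},
        |             ]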
        
       | andsoitis wrote:
       | having gpt-4 as a dependency for your product or business
       | seems... shortsighted
        
       | l5870uoo9y wrote:
       | This was technically possible before. I think the approach used
       | by many - myself included - is to simply embed results in a
        | markdown code block and then match it with a regex pattern. Then
        | you just need to phrase the prompt to generate the desired output.
       | 
       | This is an example of that generating the arguments for the
       | MongoDB's `db.runCommand()` function:
       | https://aihelperbot.com/snippets/cliwx7sr80000jj0finjl46cp
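        | 
        | The extraction step is only a line or two (the regex and the
        | example reply are illustrative):
        | 
        |     import json
        |     import re
        | 
        |     reply = 'Here you go:\n```json\n{"limit": 10}\n```'
        | 
        |     # Pull the fenced block out of the answer, then parse it.
        |     match = re.search(r"```(?:json)?\s*(.*?)```",
        |                       reply, re.DOTALL)
        |     args = json.loads(match.group(1)) if match else None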
        
       | sublinear wrote:
       | > The process is simple enough that you can let non-technical
       | people build something like this via a no-code interface. No-code
       | tools can leverage this to let their users define "backend"
       | functionality.
       | 
       | Early prototypes of software can use simple prompts like this one
       | to become interactive. Running an LLM every time someone clicks
       | on a button is expensive and slow in production, but _probably
       | still ~10x cheaper to produce than code._
       | 
       | Hah wow... no. Definitely not.
        
       | lasermatts wrote:
       | I thought GPT-4 was doing a pretty good job at outputting JSON
       | (for some of the toy problems I've given it like some of my
       | gardening projects.) Interesting to see this hit the very top of
       | HN
        
       | social_ism wrote:
       | [dead]
        
       | thorum wrote:
       | The JSON schema not counting toward token usage is huge, that
       | will really help reduce costs.
        
         | minimaxir wrote:
         | That is up in the air and needs more testing. Field
         | descriptions, for example, are important but extraneous input
          | that would be tokenized and count toward the costs.
          | 
          | At least for ChatGPT, input token costs were cut by 25%, so
         | it evens out.
        
         | stavros wrote:
         | > Under the hood, functions are injected into the system
         | message in a syntax the model has been trained on. This means
         | functions count against the model's context limit and are
         | billed as input tokens. If running into context limits, we
         | suggest limiting the number of functions or the length of
         | documentation you provide for function parameters.
        
         | yonom wrote:
         | I believe functions do count in some way toward the token
         | usage; but it seems to be in a more efficient way than pasting
         | raw JSON schemas into the prompt. Nevertheless, the token usage
         | seems to be far lower than previous alternatives, which is
         | awesome!
        
         | blamy wrote:
          | But it does count toward token usage. And they picked JSON
          | Schema, which is like 6x more verbose than TypeScript for
          | defining the shape of JSON.
        
       | adultSwim wrote:
       | _Running an LLM every time someone clicks on a button is
       | expensive and slow in production, but probably still ~10x cheaper
       | to produce than code._
        
         | edwin wrote:
         | New techniques like semantic caching will help. This is the
         | modern era's version of building a performant social graph.
        
           | daralthus wrote:
           | What's semantic caching?
        
             | edwin wrote:
             | With LLMs, the inputs are highly variable so exact match
             | caching is generally less useful. Semantic caching groups
             | similar inputs and returns relevant results accordingly. So
             | {"dish":"spaghetti bolognese"} and {"dish":"spaghetti with
             | meat sauce"} could return the same cached result.
        
               | m3kw9 wrote:
               | Or store as sentence embedding and calculate the vector
               | distance, but creates many edge cases
        
       | minimaxir wrote:
       | After reading the docs for the new ChatGPT function calling
       | yesterday, it's structured and/or typed data for GPT input or
       | output that's the key feature of these new models. The ReAct flow
       | of tool selection that it provides is secondary.
       | 
        | As this post notes, you don't even need the full flow of
       | passing a function result back to the model: getting structured
       | data from ChatGPT in itself has a lot of fun and practical use
       | cases. You could coax previous versions of ChatGPT to "output
       | results as JSON" with a system prompt but in practice results are
       | mixed, although even with this finetuned model the docs warn that
       | there still could be parsing errors.
       | 
       | OpenAI's demo for function calling is not a Hello World, to put
       | it mildly: https://github.com/openai/openai-
       | cookbook/blob/main/examples...
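        | 
        | For reference, a much smaller sketch of the new request shape
        | from the announcement (the get_weather function is made up):
        | 
        |     import json
        |     import openai
        | 
        |     functions = [{
        |         "name": "get_weather",
        |         "description": "Get the current weather for a city",
        |         "parameters": {
        |             "type": "object",
        |             "properties": {"city": {"type": "string"}},
        |             "required": ["city"],
        |         },
        |     }]
        | 
        |     resp = openai.ChatCompletion.create(
        |         model="gpt-3.5-turbo-0613",
        |         messages=[{"role": "user",
        |                    "content": "Weather in Paris?"}],
        |         functions=functions,
        |     )
        |     msg = resp["choices"][0]["message"]
        |     if msg.get("function_call"):
        |         # arguments is a JSON string and may still be malformed
        |         args = json.loads(msg["function_call"]["arguments"])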
        
         | tornato7 wrote:
         | IIRC, there's a way to "force" LLMs to output proper JSON by
          | adding some logic to the top token selection. I.e. in the
          | sampling step (whose randomness OpenAI exposes as temperature)
          | you'd never choose a next token that results in broken JSON.
          | The only
         | reason it wouldn't would be if the output exceeds the token
         | limit. I wonder if OpenAI is doing something like this.
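          | 
          | Conceptually something like this (toy vocabulary and fake
          | logits standing in for the model's real next-token scores):
          | 
          |     import numpy as np
          | 
          |     vocab = ['{', '}', '"name"', ':', '"Ada"', 'banana']
          |     logits = np.array([0.1, 0.2, 1.5, 0.3, 0.9, 2.0])
          | 
          |     def pick_next(logits, allowed):
          |         # Greedy pick, but only among tokens the JSON
          |         # parser state currently allows.
          |         mask = np.full_like(logits, -np.inf)
          |         mask[[vocab.index(t) for t in allowed]] = 0.0
          |         return vocab[int(np.argmax(logits + mask))]
          | 
          |     # At the start of an object only '{' is legal, so
          |     # 'banana' can never win despite its higher logit.
          |     print(pick_next(logits, allowed=['{']))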
        
           | ManuelKiessling wrote:
           | Note that you don't necessarily need to have the AI output
           | any JSON at all -- simply have it answer when being asked for
           | the value to a specific JSON key, and handle the JSON
           | structure part in your hallucinations-free own code:
           | https://github.com/manuelkiessling/php-ai-tool-bridge
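            | 
            | In Python the same idea might look like this (one request
            | per field; the prompt and field names are illustrative, and
            | this is not the linked library's API):
            | 
            |     import json
            |     import openai
            | 
            |     def ask_value(text, key):
            |         # Ask only for one field's value, as plain text.
            |         resp = openai.ChatCompletion.create(
            |             model="gpt-3.5-turbo",
            |             messages=[{"role": "user", "content":
            |                        f"From the text below, reply with "
            |                        f"only the value of '{key}':\n\n"
            |                        f"{text}"}],
            |         )
            |         msg = resp["choices"][0]["message"]
            |         return msg["content"].strip()
            | 
            |     text = "Ada Lovelace was born in London in 1815."
            |     record = {k: ask_value(text, k)
            |               for k in ("name", "birth_year")}
            |     print(json.dumps(record))  # structure built by our code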
        
             | lyjackal wrote:
             | Would be nice if you could send a back and forth
             | interaction for each key. This approach turns into lots of
             | requests that reapply the entire context and ends up slow.
              | I wish I could just send a Microsoft guidance template
             | program, and process that in a single pass.
        
             | naiv wrote:
             | Thanks for sharing!
        
           | woodrowbarlow wrote:
           | the linked article hypothesizes:
           | 
           | > I assume OpenAI's implementation works conceptually similar
           | to jsonformer, where the token selection algorithm is changed
           | from "choose the token with the highest logit" to "choose the
           | token with the highest logit which is valid for the schema".
        
           | ttul wrote:
           | I think the problem is that tokens are not characters. So
           | even if you had access to a JSON parser state that could tell
           | you whether or not a given character is valid as the next
           | character, I am not sure how you would translate that into
           | tokens to apply the logit biases appropriately. There would
           | be a great deal of computation required at each step to scan
           | the parser state and generate the list of prohibited or
           | allowable tokens.
           | 
           | But if one could pull this off, it would be super cool.
           | Similar to how Microsoft's guidance module uses the
           | logit_bias parameter to force the model to choose between a
           | set of available options.
        
             | yunyu wrote:
             | You simply sample tokens starting with the allowed
             | characters and truncate if needed. It's pretty efficient,
             | there's an implementation here:
             | https://github.com/1rgs/jsonformer
        
           | DougBTX wrote:
           | This is the best implementation I've seen, but only for
           | Hugging Face models: https://github.com/1rgs/jsonformer
        
           | senko wrote:
           | It would seem not, as the official documentation mentions the
           | arguments may be hallucinated or _be a malformed JSON_.
           | 
            | (unless the meaning is that the JSON syntax is valid but
            | may not conform to the schema; they're unclear on that).
        
             | sanxiyn wrote:
             | For various reasons, token selection may be implemented as
             | upweighting/downweighting instead of outright ban of
             | invalid tokens. (Maybe it helps training?) Then the model
             | could generate malformed JSON. I think it is premature to
             | infer from "can generate malformed JSON" that OpenAI is not
             | using token selection restriction.
        
           | sanxiyn wrote:
           | Note that this (token selection restriction) is even
           | available on OpenAI API as logit_bias.
        
             | newhouseb wrote:
             | But only for the whole generation. So if you want to
             | constrain things one token at a time (as you would to force
             | things to follow a grammar) you have to make fresh calls
             | and only request one token which makes things more or less
             | impractical if you want true guarantees. A few months ago I
             | built this anyway to suss out how much more expensive it
             | was [1]
             | 
             | [1] https://github.com/newhouseb/clownfish#so-how-do-i-use-
             | this-...
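              | 
              | Roughly, per step (the token ids are illustrative; a
              | bias of 100 effectively restricts the choice to the
              | listed tokens):
              | 
              |     import openai
              | 
              |     allowed = [90, 92]  # token ids the grammar allows
              |     bias = {str(t): 100 for t in allowed}
              | 
              |     resp = openai.Completion.create(
              |         model="text-davinci-003",
              |         prompt="Answer as JSON: ",
              |         max_tokens=1,        # one token per call
              |         logit_bias=bias,
              |     )
              |     print(resp["choices"][0]["text"])
              |     # Append the chosen token to the prompt, update the
              |     # allowed set from the grammar state, and repeat.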
        
           | have_faith wrote:
            | How would a tweaked temp enforce non-broken output, exactly?
        
             | isoprophlex wrote:
             | Not traditional temperature, maybe the parent worded it
             | somewhat obtusely. Anyway, to disambiguate...
             | 
             | I think it works something like this: You let something
             | akin to a json parser run with the output sampler. First
             | token must be either '{' or '['; then if you see [ has the
             | highest probability, you select that. Ignore all other
             | tokens, even those with high probability.
             | 
             | Second token must be ... and so on and so on.
             | 
             | Guarantee for non-broken (or at least parseable) json
        
             | sanxiyn wrote:
             | It's not temperature, but sampling. Output of LLM is
             | probabilistic distribution over tokens. To get concrete
             | tokens, you sample from that distribution. Unfortunately,
             | OpenAI API does not expose the distribution. You only get
             | the sampled tokens.
             | 
             | As an example, on the link JSON schema is defined such that
             | recipe ingredient unit is one of
             | grams/ml/cups/pieces/teaspoons. LLM may output the
             | distribution grams(30%), cups(30%), pounds(40%). Sampling
             | the best token "pounds" would generate an invalid document.
             | Instead, you can use the schema to filter tokens and sample
             | from the filtered distribution, which is grams(50%),
             | cups(50%).
        
         | H8crilA wrote:
         | That SQL example is going to result in a catastrophe somewhere
         | when someone uses it in their project. It is encouraging
         | something very dangerous when allowed to run on untrusted
         | inputs.
        
         | behnamoh wrote:
         | What's the implication of this new change for Microsoft
         | Guidance, LMQL, Langchain, etc.? It looks like much of their
         | functionality (controlling model output) just became obsolete.
         | Am I missing something?
        
           | [deleted]
        
           | lbeurerkellner wrote:
           | If anything this removes a major roadblock for
           | libraries/languages that want to employ LLM calls as a
           | primitive, no? Although, I fear the vendor lock-in
           | intensifies here, also given how restrictive and specific the
            | Chat API is.
           | 
           | Either way, as part of the LMQL team, I am actually pretty
           | excited about this, also with respect to what we want to
           | build going forward. This makes language model programming
           | much easier.
        
             | koboll wrote:
             | `Although, I fear the vendor lock-in intensifies here, also
              | given how restrictive and specific the Chat API is.`
             | 
             | Eh, would be pretty easy to write a wrapper that takes a
             | functions-like JSON Schema object and interpolates it into
             | a traditional "You MUST return ONLY JSON in the following
             | format:" prompt snippet.
        
             | londons_explore wrote:
             | > Although, I fear the vendor lock-in intensifies here,
             | 
             | The openAI API is super simple - any other vendor is free
             | to copy it, and I'm sure many will.
        
           | neuronexmachina wrote:
           | Langchain added support for `function_call` args yesterday:
           | 
           | * https://github.com/hwchase17/langchain/pull/6099/files
           | 
           | * https://github.com/hwchase17/langchain/issues/6104
           | 
           | IMHO, this should make Langchain much easier and less chaotic
           | to use.
        
             | gawi wrote:
             | It's only been added to the OpenAI interface. Function
              | calling is really useful when used with agents. Including
              | it in agents would require some redesign, as the tool
             | instructions should be removed from the prompt templates in
             | favor of function definitions in the API request. The
             | response parsing code would also be affected.
             | 
             | I just hope they won't come up with yet another agent type.
        
               | neuronexmachina wrote:
               | Like this?
               | https://github.com/hwchase17/langchain/blob/master/langchain...
        
               | gawi wrote:
               | LangChain is a perpetual hackathon.
        
         | arbuge wrote:
         | They have something closer to a simple Hello World example
         | here:
         | 
         | https://platform.openai.com/docs/guides/gpt/function-calling
         | 
         | That example needs a bit of work I think. In Step 3, they're
         | not really using the returned function_name; they're just
         | assuming it's the only function that's been defined, which I
         | guess is equivalent for this simple example with just one
         | function but less instructive. In Step 4, I believe they should
         | also have sent the function definition block again a second
         | time since model calls in the API are memory-less and
         | independent. They didn't, although the model appears to guess
         | what's needed anyway in this case.
        
       | m3kw9 wrote:
        | It works pretty well. You define a few "functions" and enter a
        | description of what each does; when the user prompts, it will
        | understand the prompt and tell you which "function" it likely
        | wants to use, which is just the function name. I feel like this
        | is a new way to program, a sort of fuzzy logic type of
        | programming.
        
         | Sai_ wrote:
         | > fuzzy logic
         | 
         | Yes and no. While the choice of which function to call is
         | dependent on an llm, ultimately, you control the function
         | itself whose output is deterministic.
         | 
         | Even today, given an api, people can choose to call or not call
         | based on some factor. We don't call this fuzzy logic. E.g.,
         | people can decide to sell or buy stock through an api based on
         | some internal calculations - doesn't make the system "fuzzy".
        
           | m3kw9 wrote:
           | If you feed that result into another io box you may or may
           | not know if that is the correct answer, which may need some
            | sort of error detection. I think this is going to be the
            | majority of the use cases.
        
             | Sai_ wrote:
             | Hm, I see what you mean. Afaict, only the decision to call
             | or not call a function is up to the model (fuzzy). Once it
             | decides to call the function, it generates mostly correct
             | JSON based on your schema and returns that to you as is
             | (not very fuzzy).
             | 
             | It'll be interesting to test APIs which accept user inputs.
             | Depending on how ChatGPT populates the JSON, the API could
             | be required to understand/interpret/respond to lots of
             | variability in inputs.
        
               | m3kw9 wrote:
               | Yeah, I've tested it. You should use the curl example
               | they gave, as you can test instantly by pasting it into
               | your terminal. The description of the functions is prompt
               | engineering in addition to the original system prompt; I
               | need to test the dependency more, it's so new.
        
       | jonplackett wrote:
       | This is useful, but for me at least, GPT-4 is unusable because it
        | sometimes takes 30+ seconds to reply to even basic queries.
        
         | m3kw9 wrote:
         | Also the rate limit is pretty bad if you want to release any
         | type of app
        
           | jiggawatts wrote:
           | More importantly: there's a waiting list.
           | 
            | Also, if you want to use both the ChatGPT web app _and_ the
            | API, you'll be billed for _both_ separately. They really
            | should be unified and billed under a single account. The
            | difference is literally just whether there's a "web UI" on
            | top of the API... or not.
        
       | emilsedgh wrote:
        | Building agents that use advanced APIs was not really practical
        | until now. Things like Langchain's Structured Agents worked
        | somewhat reliably, but due to the massive token count they were
        | so slow that the experience was _never_ going to be useful.
       | 
        | Due to this, the speed at which our agent processes results
        | has improved 5-6 times, and it actually does a pretty good job
        | of keeping to the schema.
       | 
       | One problem that is not resolved yet is that it still
        | hallucinates a lot of attributes. For example, we have a tool
        | that allows it to create contacts in the user's CRM. I ask it
        | to:
        | 
        | "Create contacts for the top 3 Barcelona players."
        | 
        | It creates a structure like this:
       | 
       | 1. Lionel Messi - Email: lionel.messi@barcelona.com - Phone
       | Number: +1234567890 - Tags: Player, Barcelona
       | 
       | 2. Gerard Pique - Email: gerard.pique@barcelona.com - Phone
       | Number: +1234567891 - Tags: Player, Barcelona
       | 
       | 3. Marc-Andre ter Stegen - Email: marc-terstegen@barcelona.com -
       | Phone Number: +1234567892 - Tags: Player, Barcelona
       | 
       | And you can see it hallucinated email addresses and phone
       | numbers.
        
         | 037 wrote:
         | I would never rely on an LLM as a source of such information,
         | just as I wouldn't trust the general knowledge of a human being
         | used as a database. Does your workflow include a step for
         | information search? With the new json features, it should be
         | easy to instruct it to perform a search or directly feed it the
         | right pages to parse.
        
         | pluijzer wrote:
          | ChatGPT can be useful for many things, but you should really
          | not use it if you want to retrieve factual data. This might
          | partly be resolved by querying the internet like Bing does,
          | but purely on the language model side these hallucinations
          | are just an unavoidable part of it.
        
           | Spivak wrote:
            | Yep, it's _always_ _always_ have it write the code / query /
            | function / whatever you need, which you then parse and use to
            | retrieve the data from an external system.
        
       | arsdragonfly wrote:
       | [dead]
        
       | dang wrote:
       | Recent and related:
       | 
       |  _Function calling and other API updates_ -
       | https://news.ycombinator.com/item?id=36313348 - June 2023 (154
       | comments)
        
         | minimaxir wrote:
         | IMO this isn't a dupe and shouldn't be penalized as a result.
        
           | dang wrote:
           | It's certainly not a dupe. It looks like a follow-up though.
           | No?
        
             | minimaxir wrote:
             | More a very timely but practical demo.
        
               | dang wrote:
               | Ok, thanks!
        
       | EGreg wrote:
       | Actually I'm looking to take GPT-4 output and create file formats
       | like keynote presentations, or pptx. Is that currently possible
       | with some tools?
        
         | yonom wrote:
          | I would recommend creating a simplified JSON schema for the
          | slides (say, a presentation is an array of slides, each slide
          | has a title, body, optional image, optional diagram, and each
          | diagram is one of pie, table, ...). Then use a library to
          | generate the pptx file from the generated content.
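          | 
          | For example (a sketch assuming the python-pptx package and a
          | minimal slide structure; real content would come from the
          | model's JSON output):
          | 
          |     from pptx import Presentation
          | 
          |     slides = [
          |         {"title": "Q2 Review", "body": "Revenue up 12%"},
          |         {"title": "Next steps", "body": "Hire two engineers"},
          |     ]
          | 
          |     prs = Presentation()
          |     for item in slides:
          |         # layout 1 is the built-in "Title and Content" layout
          |         slide = prs.slides.add_slide(prs.slide_layouts[1])
          |         slide.shapes.title.text = item["title"]
          |         slide.placeholders[1].text = item["body"]
          |     prs.save("deck.pptx")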
        
           | EGreg wrote:
           | Library? What library?
           | 
           | It seems to me that a Transformer should excel at
           | Transforming, say, text into pptx or pdf or HTML with CSS
           | etc.
           | 
           | Why don't they train it on that? So I don't have to sit there
           | with manually written libraries. It can easily transform HTML
           | to XML or text bullet points so why not the other formats?
        
             | yonom wrote:
             | I don't think the name "Transformer" is meant in the sense
             | of "transforming between file formats".
             | 
             | My intuition is that LLMs tend to be good at things human
             | brains are good at (e.g. reasoning), and bad at things
             | human brains are bad at (e.g. math, writing pptx binary
             | files from scratch, ...).
             | 
             | Eventually, we might get LLMs that can open PowerPoint and
             | quickly design the whole presentation using a virtual mouse
             | and keyboard but we're not there yet.
        
               | EGreg wrote:
               | It's just XML. They can produce HTML and transform
               | Python into PHP, etc.
               | 
               | So why not? It's easy for them, no?
        
         | stevenhuang wrote:
         | apparently pandoc also supports pptx
         | 
         | so you can tell GPT4 to output markdown, then use pandoc to
         | convert that markdown to pptx or pdf.
        
           | edwin wrote:
           | Here you go: https://preview.promptjoy.com/apis/m7oCyL
        
       | amolgupta wrote:
        | I pass a Kotlin data class and ask ChatGPT to return JSON which
        | can be parsed by that class. It reduces errors with date-time
        | parsing and other formatting issues, and takes up fewer tokens
        | than the approach in the article.
        
       ___________________________________________________________________
       (page generated 2023-06-15 23:03 UTC)