[HN Gopher] Structured Outputs in the API
       ___________________________________________________________________
        
       Structured Outputs in the API
        
       Author : davidbarker
       Score  : 369 points
       Date   : 2024-08-06 17:41 UTC (5 hours ago)
        
 (HTM) web link (openai.com)
 (TXT) w3m dump (openai.com)
        
       | gamegoblin wrote:
       | I'm glad they gave up on their "fine-tuning is all you need"
       | approach to structured output. It's possible fine-tuning will
       | work in the long term, but in the short-term, people are trying
       | to build things, and fine-tuning wasn't cutting it.
       | 
       | Surprised it took them so long -- llama.cpp got this feature 1.5
       | years ago (actually an even more general version of it that
        | allows the user to provide any context-free grammar, not just a
        | JSON schema)
        
         | chhabraamit wrote:
         | How does llama.cpp's grammar adherence work?
         | 
         | Does it keep validating the predicted tokens and backtrack when
         | it's not valid?
        
           | gamegoblin wrote:
           | It's essentially an Earley Parser[0]. It maintains a set of
           | all possible currently valid parses, and zeroes out the
           | probability of any token that isn't valid in at least 1 of
           | the current potential parse trees.
           | 
           | There are contrived grammars you can give it that will make
           | it use exponential memory, but in practice most real-world
           | grammars aren't like this.
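            | 
            | A toy illustration of that filtering step, using a regex as a
            | stand-in for the real set of parser states:
            | 
            |     import re
            | 
            |     # Does the text so far, plus this candidate token, still
            |     # look like the prefix of a JSON number?
            |     NUMBER_PREFIX = re.compile(r"^-?\d+(\.\d*)?$")
            | 
            |     def allowed_tokens(generated_so_far, vocab):
            |         # Keep only tokens that leave at least one valid parse
            |         # open; everything else gets its probability zeroed out.
            |         return [t for t in vocab
            |                 if NUMBER_PREFIX.match(generated_so_far + t)]
            | 
            |     print(allowed_tokens("12", ["3", ".", ".5", "a"]))
            |     # ['3', '.', '.5']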
           | 
           | [0] https://en.wikipedia.org/wiki/Earley_parser
        
         | tcdent wrote:
         | GPT is still a language model, so at some point it's still just
         | tokens.
         | 
         | Is this just a schema validation layer on their end to avoid
         | the round trip (and cost) of repeating the call?
        
           | gamegoblin wrote:
           | Language models like GPT output a large vector of
           | probabilities for the next token. Then a sampler decides
           | which of those tokens to pick.
           | 
           | The simplest algorithm for getting good quality output is to
           | just always pick the highest probability token.
           | 
           | If you want more creativity, maybe you pick randomly among
           | the top 5 highest probability tokens or something. There are
           | a lot of methods.
           | 
           | All that grammar-constrained decoding does is zero out the
           | probability of any token that would violate the grammar.
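            | 
            | A rough sketch of that last step, in Python with toy numbers
            | (the grammar-supplied valid_ids list here is hypothetical):
            | 
            |     import numpy as np
            | 
            |     def constrained_sample(logits, valid_ids, greedy=True):
            |         # Mask out (set to -inf) every token the grammar forbids.
            |         masked = np.full_like(logits, -np.inf)
            |         masked[valid_ids] = logits[valid_ids]
            |         if greedy:
            |             return int(np.argmax(masked))      # top valid token
            |         probs = np.exp(masked - masked.max())  # softmax over survivors
            |         probs /= probs.sum()
            |         return int(np.random.choice(len(logits), p=probs))
            | 
            |     logits = np.random.randn(50_000)           # pretend vocabulary
            |     print(constrained_sample(logits, valid_ids=[42, 1337]))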
        
             | nickreese wrote:
             | Thank you for this explanation. A few things just clicked
             | for me.
        
         | BoorishBears wrote:
         | I was surprised it took so long until I reached this line:
         | 
         | > The model can fail to follow the schema if the model chooses
         | to refuse an unsafe request. If it chooses to refuse, the
         | return message will have the refusal boolean set to true to
         | indicate this.
         | 
         | I'm not sure how they implemented that, maybe they've figured
         | out a way to give the grammar a token or set of tokens that are
         | always valid mid generation and indicate the model would rather
         | not continue generating.
         | 
         | Right now JSON generation is one of the most reliable ways to
         | get around refusals, and they managed not to introduce that
         | weakness into their model
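          | 
          | If it works the way the docs describe, handling this client-side
          | is just a branch on that field. A rough sketch with the Python
          | SDK's parse helper (the Answer schema here is made up):
          | 
          |     from pydantic import BaseModel
          |     from openai import OpenAI
          | 
          |     class Answer(BaseModel):   # hypothetical schema
          |         steps: list[str]
          |         final: str
          | 
          |     client = OpenAI()
          |     completion = client.beta.chat.completions.parse(
          |         model="gpt-4o-2024-08-06",
          |         messages=[{"role": "user", "content": "Solve 8x + 31 = 2"}],
          |         response_format=Answer,
          |     )
          |     msg = completion.choices[0].message
          |     if msg.refusal:                 # model declined the request
          |         print("Refused:", msg.refusal)
          |     else:
          |         print(msg.parsed.final)     # validated Answer instance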
        
         | Der_Einzige wrote:
          | For many things, fine-tuning as we know it will NEVER fully
          | solve it; there's no hope. Even fine-tuning a model to not use
          | the letter "e" to an overwhelming degree doesn't entirely
          | prevent it, only reduces its chances to increasingly small
          | amounts. Shameless self-plug, and from before the ChatGPT era
         | too! https://paperswithcode.com/paper/most-language-models-can-
         | be...
        
       | leetharris wrote:
       | At the bottom:
       | 
       | >Acknowledgements Structured Outputs takes inspiration from
       | excellent work from the open source community: namely, the
       | outlines, jsonformer, instructor, guidance, and lark libraries.
       | 
       | It is cool to see them acknowledge this, but it's also lame for a
       | company named "OpenAI" to acknowledge getting their ideas from
       | open source, then contributing absolutely NOTHING back to open
       | source with their own implementation.
        
         | warkdarrior wrote:
         | > it's also lame for a company named "OpenAI" to acknowledge
         | getting their ideas from open source, then contributing
         | absolutely NOTHING back to open source with their own
         | implementation
         | 
         | Maybe those projects were used as-is by OpenAI, so there was
         | nothing new to contribute.
        
           | reustle wrote:
           | I think they may be alluding to sponsorships as well as code
           | contributions.
           | 
           | i.e. https://github.com/sponsors/jxnl
        
         | spencerchubb wrote:
         | Is offering gpt4o for free through chatgpt not enough of a
         | contribution? They didn't release source code, but they made a
         | product free to use
        
           | mplewis wrote:
           | No. If it were free you'd be able to use it as a programming
           | API. It's not free and it's not unlimited - it's a time-
           | limited marketing tool.
        
             | spencerchubb wrote:
             | How are you defining the word free?
        
           | talldayo wrote:
           | Free service != Open software
        
           | notarobot123 wrote:
           | This isn't generosity, it's a well known and much used
           | strategy for market penetration. Free until-we-decide-
           | otherwise is very much not the same as open source.
        
             | rvense wrote:
              | Insofar as it is a conscious strategy to make it more
              | expensive at a later date, it is actually sort of the
              | opposite of generosity.
        
             | spencerchubb wrote:
             | So if something is free but only temporarily, then that
             | cancels out the generosity? Also, you and I have no idea
             | how long the features will remain free. If anything,
             | chatgpt has been making _more_ features and stronger models
             | free over time.
        
               | simonw wrote:
               | Sometimes it does, yeah. It's not unheard of for
               | companies to deliberately operate at a loss in order to
               | drive out their competition, then raise prices again.
               | This is known as "predatory pricing".
        
           | echelon wrote:
           | That can actually make competition from open source _harder_.
            | New upstarts that are open source can't compete with free
           | service from OpenAI and can't make money to grow their
           | development or offerings.
           | 
           | OpenAI wants to kill everything that isn't OpenAI.
        
             | ben_w wrote:
             | New open source models* still wouldn't be able to compete
             | even if OpenAI was forcibly shut down.
             | 
             | Hardware's too expensive, and will be for a while, because
             | _all_ the big players are trying to get in on it.
             | 
              | * cue arguments: "'open weights' or 'training data'?"; "does
             | the Meta offering count or are they being sneaky and
             | evil?"; etc.
        
             | spencerchubb wrote:
             | So should OpenAI make their product less accessible, in
             | order to make it easier for competition? That makes no
             | sense
        
               | oblio wrote:
               | I call chicken. Let them make all their products paid.
               | 
               | Hint: they won't, it would kill their company. The hype
               | around OpenAI is based on people using it for free, at
               | least at the start.
               | 
               | Heck, even drug dealers know this trick!
        
         | sirspacey wrote:
          | You don't think anyone will use it to contribute to open source
         | projects?
         | 
         | Seems like an obvious net gain for the community.
        
       | jjcm wrote:
       | Interesting tidbit at the very end that's worth noting for anyone
       | using the API today:
       | 
       | > By switching to the new gpt-4o-2024-08-06, developers save 50%
       | on inputs ($2.50/1M input tokens) and 33% on outputs ($10.00/1M
       | output tokens) compared to gpt-4o-2024-05-13.
        
         | scrollop wrote:
         | From what I've learned from OpenAI, the "latest" "cheaper"
         | model will perform worse than the previous model on various
         | tasks (esp reasoning).
        
           | ralusek wrote:
           | I don't think it's been well enough acknowledged that all of
           | the shortcuts LLMs have been taking with ways of attempting
           | to compress/refine/index the attention mechanism seem to
           | result in dumber models.
           | 
           | GPT 4 Turbo was more like GPT 3.9, and GPT 4o is more like
           | GPT 3.7.
        
             | scrollop wrote:
             | Some commenters acknowledge it - and quantify it:
             | 
             | https://www.youtube.com/watch?v=Tf1nooXtUHE&t=689s
        
             | Der_Einzige wrote:
             | They try to gaslight us and tell us this isn't true because
             | of benchmarks, as though anyone has done anything but do
             | the latent space exploration equivalent of throwing darts
             | at the ocean from space.
             | 
             | It's taken years to get even preliminary reliable decision
             | boundary examples from LLMs because doing so is expensive.
        
           | scrollop wrote:
            | Also, is it a coincidence that a cheaper (potentially
           | faster?) model has been released (just) before they roll out
           | the "new" voice mode (which boasts very low latency)?
        
           | codingwagie wrote:
            | It's usually a distilled smaller model
        
           | samstave wrote:
            | Am I the only one that wants to know, 1,000%, *WHY* things are
            | like this?
           | 
           | Is it a natural function of how models evolve?
           | 
           | Is it engineered as such? Why?
           | Marketing/money/resources/what?
           | 
           | WHO makes these decisions and why?
           | 
           | ---
           | 
            | I have been building a thing with a Claude 3.5 pro account and
            | it's an *utter fn garbage* experience.
            | 
            | It lies, hallucinates, malevolently changes code it was already
            | told was correct, removes features, and explicitly ignores
            | project files. It has no search and no line numbers, and so
            | much screen real-estate is consumed with useless empty space.
            | It ignores stated style guides. It gets CAUGHT forgetting about
            | a premise we were actively working on, then condescendingly
            | apologizes: "oh you're correct - I should have been using XYZ
            | knowledge"
           | 
           | It makes things FN harder to learn.
           | 
            | If I had any claude engineers sitting in the room watching
            | what a POS service it is from a project continuity point of
            | view...
            | 
            | It's evil. It actively f's up things.
           | 
           | One should have the ability to CHARGE the model token credit
           | when it Fs up so bad.
           | 
            | NO FN SEARCH??? And when asked for line nums in its output -
            | it's in txt...
           | 
           | Seriously, I practically want not just a refund, I want
           | claude to pay me for my time correcting its mistakes.
           | 
            | ChatGPT does the same thing. It forgets things committed to
            | memory - refactors successful things back out of files.
            | Etc.
            | 
            | It's been a really eye-opening and frustrating experience, and
            | my squinty looks suggest that it's specifically
            | intentional:
            | 
            | They don't want people using a $20/month AI plan to actually
            | be able to do any meaningful work and build a product.
        
             | scrollop wrote:
             | Use an API from the top models with a good frontend, then,
             | and use precise instructions.
             | 
             | It's odd, as many people praise claude's coding
             | capabilities.
        
         | minimaxir wrote:
         | The new price is also now reflected on the pricing page:
         | https://openai.com/api/pricing/
         | 
          | It's weird that it's only a footnote when it's actually a major
          | shift.
        
           | sjnair96 wrote:
            | I looked this up as well, and I wonder why. I'd expect they
            | have a subsequent announcement planned regarding this.
        
         | ComputerGuru wrote:
         | If you use the undecorated gpt-4o do you automatically get the
         | latest?
        
           | tedsanders wrote:
            | We'll update gpt-4o in 3 weeks. (We've always updated it a
            | couple weeks after launch, so no one is immediately surprised
            | by a new model drop.)
        
           | OutOfHere wrote:
           | For the record, you should never use that in an application.
           | Always explicitly note the full versioned model name. This
           | will prevent bad surprises because not every new version is
           | an improvement; sometimes they get worse, especially at
           | specific tasks.
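            | 
            | e.g., in the Python SDK, pin the dated snapshot rather than the
            | moving alias:
            | 
            |     from openai import OpenAI
            | 
            |     client = OpenAI()
            |     resp = client.chat.completions.create(
            |         model="gpt-4o-2024-08-06",  # pinned snapshot, not "gpt-4o"
            |         messages=[{"role": "user", "content": "Hello"}],
            |     )
            |     print(resp.choices[0].message.content)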
        
           | voiper1 wrote:
           | >We will give a 3-week notice before updating gpt-4o to point
           | to the new snapshot gpt-4o-2024-08-06.
           | 
           | Source: https://platform.openai.com/docs/models/gpt-4o
        
       | nerdjon wrote:
       | I have a bad feeling that this is just going to introduce more
       | shovelware apps that try to shove AI use in without really
       | understanding what they are going to get back.
       | 
        | Yay, I can now ensure the JSON object will look how I want, but
        | let's completely disregard any concern of whether or not the data
        | returned is valuable.
       | 
       | I don't understand why we are already treating these systems as
       | general purpose AI when they are not. (Ok I do understand it, but
       | it is frustrating).
       | 
        | The example given is "look up all my orders in May of last year
        | that were fulfilled but not delivered on time".
        | 
        | First, I have found these models incredibly dumb when it comes
        | to handling time. But even beyond that, if you really are going
        | to do this, I really hope you double-check the data before
        | presenting what you get back as true. And worse, that is just
        | double-checking that what it gives back to you is accurate, not
        | catching what it isn't telling you about.
       | 
        | Every time I try to experiment with supplying data and asking for
        | data back, they fall flat on their face before we even get to the
        | JSON being formatted properly. Formatting was not the issue that
        | needed to be solved while the model still fundamentally messes up
        | the data, often just returning wrong information. Sometimes it
        | will be right, though, and that is the problem: it may luck out
        | and be right enough times that you gain confidence in it and stop
        | double-checking what it is giving back to you.
       | 
       | I guarantee you someone is going to have a discussion about using
       | this, feeding it data, and then storing the response in a
       | database.
        
       | titzer wrote:
       | It's so wild that the bar for AI performance is both absurdly
       | high and absurdly low at the same time. To specify an output
       | format (language or grammar) for solving a computational problem
       | is one of the oldest exercises around. On the one hand, it's
       | breathtakingly mundane that the model can now do the most basic
       | of tasks: conform to an output specification. It's weird reading
       | the kind of self-congratulating blogpost about this, like OpenAI
       | has just discovered flint knives. On the other hand, a computer
       | system can process natural language with extremely ambiguous,
       | open-ended problems, compute solutions to said problems, even
       | correct its own mistakes-- _and then_ it can format the output
        | correctly. And then on yet another hand, it only took about 10^25
        | floating point operations (yeah, just ten trillion trillion,
        | right!?) to get this outcome.
        
         | codingwagie wrote:
         | I think it will take a long time for the world at large to
         | realize and then operationalize the potential of this "mundane"
         | technology. It is revolutionary, and also sitting in plain
         | sight. Such a huge technological shift that was considered
         | decades out only a few years ago
        
           | ben_w wrote:
           | Although I am an optimist* about what this can do, I am very
           | much aware -- from personal experience -- how easy it is to
           | see more than is really there.
           | 
           | The realisation of the tech might be fantastic new things...
           | or it might be that people like me are Clever Hans-ing the
           | models.
           | 
           | * that may be the wrong word; "strong capabilities" is what I
           | think is present, those can be used for ill effects which is
           | pessimistic.
        
         | srcreigh wrote:
         | If I wanted to be a silly pedant, I'd say that Turing machines
         | are language specifications and thus it's theoretically
         | impossible for an LLM or any program to validate output formats
         | in general.
        
           | jes5199 wrote:
           | in _general_ sure, but if you restricted each token to
           | conform to a Kleene-star grammar you should be able to
           | guarantee that you get something that parses according to a
           | context-free grammar
        
         | tommica wrote:
          | For some reason it reminds me of my Civilization runs - rush to
          | a certain high-level tech and only after that discover writing
          | :D
        
         | thruway516 wrote:
          | I don't understand your complaint at all. If you develop a new
         | revolutionary technology called an automobile, developing
         | steering, brakes, starter, mufflers for it is a pretty big deal
         | even if reins, clamps, mufflers and keys are mundane and have
         | existed for decades. Structured outputs are a pretty big step
         | in making this magic actually usable by developers as opposed
         | to generating impressive cat pictures or whatever has captured
         | the public imagination.
        
           | Bjartr wrote:
            | I don't think it was a complaint, just an observation.
        
             | thruway516 wrote:
              | Yes, probably. But considering non-deterministic outputs are
              | the nature of the beast with LLMs and we're (mostly)
              | engineers here, calling any part of this mundane sounds
              | almost more like fighting words than just an observation
        
               | the8thbit wrote:
               | Extremely pedantic, but is "non-deterministic" really the
               | right language? The same input will always produce the
               | same output, provided you haven't intentionally
               | configured the system to use the model non-
               | deterministically. It seems like the right way to
               | describe it is as a chaotic deterministic system. The
               | same input will always produce the same output, but small
               | shifts in the input or weights can result in dramatic and
               | difficult to predict changes in outputs.
        
               | davedx wrote:
                | LLMs are indeed non-deterministic
        
               | visarga wrote:
               | > The same input will always produce the same output
               | 
               | Not guaranteed even with the same seed. If you don't
               | perform all operations in exactly the same order, even a
                | simple float32 sum, if batched differently, will result
                | in a different final value. This depends on the load factor
               | and how resources are allocated.
        
               | simonw wrote:
               | Yeah, the fact that floating point multiplication isn't
               | associative is a real pain for producing deterministic
               | outputs - especially when you're running massively
               | parallel computations on GPUs (or multiple GPUs) making
               | the order of operations even less predictable.
        
           | jappgar wrote:
           | Structured outputs are hard... but they claimed to have
           | solved this a year ago.
           | 
           | They were lying, of course, and meanwhile charged output
           | tokens for malformed JSON.
        
         | ramraj07 wrote:
         | This is like saying "we shouldn't be celebrating a computer
         | that can talk, my parrot can do that!"
        
         | throwawaymaths wrote:
         | > On the one hand, it's breathtakingly mundane that the model
         | can now do the most basic of tasks: conform to an output
         | specification.
         | 
         | I highly doubt it's the model that does this... It's very
         | likely code injected into the token picker. You could put this
         | into any model all the way down to gpt-2.
        
           | crowcroft wrote:
           | I wonder if you get 90% of the way with prompt engineering,
           | and then the last 10% is just brute force, validate output,
           | if it fails, rerun the prompt.
           | 
            | My assumption is that if that's all this is, they would have
            | done it a long time ago, though.
        
             | jeeceebees wrote:
             | You can just mask the output probabilities for each token
             | based on which options are valid according to a grammar.
             | 
             | There are quite a few open source implementations of this
             | e.g. https://github.com/outlines-dev/outlines
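              | 
              | Roughly, with outlines it looks something like this (the
              | model name and schema are placeholders, and the API may have
              | shifted since):
              | 
              |     import outlines
              |     from pydantic import BaseModel
              | 
              |     class Order(BaseModel):      # placeholder schema
              |         item: str
              |         quantity: int
              | 
              |     model = outlines.models.transformers(
              |         "microsoft/Phi-3-mini-4k-instruct")
              |     generator = outlines.generate.json(model, Order)
              |     order = generator("Extract the order: two lattes, please.")
              |     print(order)   # parses into the Order schema by construction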
        
               | contravariant wrote:
               | You could simply censor invalid tokens, but that does
               | rely on 2 assumptions.
               | 
               | 1. There is always a valid next token.
               | 
               | 2. This greedy algorithm doesn't result in a
               | qualitatively different distribution from a rejection
               | sampling algorithm.
               | 
               | The latter isn't too obvious, and may in fact be (very)
               | false. Look up maze generation algorithms if you want
               | some feeling for the effects this could have.
               | 
               | If you just want a quick argument, consider what happens
               | if picking the most likely token would increase the
               | chance of an invalid token further down the line to
               | nearly 100%. By the time your token-picking algorithm has
               | any effect it would be too late to fix it.
        
               | throwawaymaths wrote:
               | Sorry, how could there not be a valid next token?
               | Presumably your interface would generate a state machine
               | with appropriate masking arrays, and iirc generally
               | speaking all 256 byte choices are in the token list.
               | There's no way to get stuck in a place where the JSON is
               | invalid? Can you give an example?
               | 
               | If you want to be really clever about your picker, a
               | deterministic result would blat out the all the known
               | possible strings.
               | 
                | For example, if you had an object with a defined set of
                | properties, you could just go ahead and not bother
               | generating tokens for all the properties and just
               | tokenize, E.G. `{"foo":"` (6-ish tokens) without even
               | passing through the LLM. As soon as an unescaped `"`
               | arrives, you know the continuation must be `,"bar":"`,
               | for example
               | 
               | > This greedy algorithm doesn't result in a qualitatively
               | different distribution from a rejection sampling
               | algorithm.
               | 
               | It absolutely will. But so will adding an extra newline
               | in your prompt, for example. That sort of thing is part
               | and parcel of how llms work
        
               | contravariant wrote:
               | Hmm, I think any example where it can get stuck is going
               | to be a bit contrived since really it's a question of how
               | easy it is to recognize a valid prefix. Say for example
               | you want the LLM to generate a valid chess match and it
               | ends up in a situation with just 2 kings left. If you're
                | not careful with your definitions you could end up in an
                | endless loop.
               | 
               | That said if you _know_ all valid prefixes in your
               | language then you can always realise when a token leaves
               | no valid continuations.
               | 
               | > It absolutely will. But so will adding an extra newline
               | 
                | A newline is less likely to dramatically drop the
                | quality; a greedy method could easily end up driving itself
                | into a dead end (if not grammatically then semantically).
               | 
               | Say you want it to give a weather prediction consisting
               | of a description followed by a tag 'sunny' or 'cloudy'
               | and your model is on its way to generate
                | {
                |   desc: "Strong winds followed by heavy rainfall.",
                |   tag: "stormy"
                | }
               | 
               | If it ever gets to the 's' in stormy it will be forced to
               | pick 'sunny', even if that makes no sense in context.
        
             | dilap wrote:
              | If you sample under a grammar you automatically get 100%
              | conformance; who knows, but it seems like the most likely
              | thing they are doing. llama.cpp has supported this for a
              | while (using a
             | BNF-style grammar -- https://github.com/ggerganov/llama.cpp
             | /blob/master/grammars/... )
             | 
             | edit: oh actually, we do sort of know -- they call out
             | jsonformer as an inspiration in the acknowledgements
             | 
             | https://github.com/1rgs/jsonformer
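              | 
              | For reference, via the Python bindings that looks roughly
              | like this (the model path and toy grammar are placeholders):
              | 
              |     from llama_cpp import Llama, LlamaGrammar
              | 
              |     # Toy GBNF grammar: the output must be "yes" or "no".
              |     grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')
              | 
              |     llm = Llama(model_path="./model.gguf")  # placeholder path
              |     out = llm("Is the sky blue? Answer: ",
              |               grammar=grammar, max_tokens=4)
              |     print(out["choices"][0]["text"])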
        
               | crowcroft wrote:
               | Oh, thanks for the links. Super interesting!
        
               | senko wrote:
               | Using this in a naive way can easily degenerate into the
                | LLM outputting syntactically/grammatically valid tokens
               | that make no sense, like in this example:
               | https://community.openai.com/t/json-format-causes-
               | infinite-n...
               | 
               | This might be even more pronounced when the output is
               | restricted more using the JSON schema.
               | 
               | So the heavy lifting here was most likely to align the
               | model to avoid/minimize such outcomes, not in tweaking
               | the token sampler.
        
               | dilap wrote:
               | Isn't your example showing an issue w/ the opposite
               | approach, where someone is getting bad output w/ an
               | earlier openAI json mode that worked via training rather
               | than mechanical output restriction to conform to a
               | schema?
               | 
                | FWIW (not too much!) I have used llama.cpp grammars to
                | restrict output to specific formats (not JSON in particular,
                | but an expected format) with fine-tuned phi2 models, and I
                | didn't hit any issues like this.
               | 
               | I am not intuitively seeing why restricting sampling to
               | tokens matching a schema would cause the LLM to converge
               | on valid tokens that make no sense...
               | 
               | Are there examples of this happening w/ people using e.g.
               | jsonformer?
        
             | throwawaymaths wrote:
             | Yeah but that's hugely wasteful of tokens.
        
         | scarmig wrote:
         | I have struggled writing valid YAML before (my tokenizer
         | doesn't handle whitespace very well). And it probably takes me
         | a quadrillion operations on the reals to get a minimal YAML
         | file (I think your 10^25 fp ops is an overestimate--I think
         | it's more like 10^18-10^19).
         | 
         | It's kind of like an inverse Moravec's paradox.
        
           | theturtle32 wrote:
           | Relatable!!
        
         | m3kw9 wrote:
          | It's doing more: it allows the user to provide input in natural
          | language, and the output is JSON in the format the API
          | defines.
        
         | raincole wrote:
          | I don't know, it doesn't sound wild at all to me. Human
          | languages are very imprecise, vague and error-tolerant, which
          | is the opposite of an output format like JSON. So it's quite an
          | intuitive conclusion that a model can't do these two things
          | well at the same time.
          | 
          | The wild part is that a model trained on so much human
          | language text can still output mostly compilable code.
        
       | wewtyflakes wrote:
       | Why would someone want `strict` to be anything other than `true`?
        
         | davidkunz wrote:
         | Maybe if you can't precisely model your structure with
         | (OpenAI's subset of) JSON schema.
        
         | ComputerGuru wrote:
         | There are many reasons, though I am not sure which _they_ had
         | in mind. One thing is that LLMs in general tend to do better
         | when they can be more verbose in their output and sort of
         | "think aloud" to reach an answer. Insisting on strict output
         | format would rob it of the benefits (because it doesn't just
         | not emit but completely skips those stages, or else you'd be
         | paying for those elided output tokens).
        
           | wewtyflakes wrote:
           | But then why would someone specify that the response has to
           | be in a given JSON schema (by presence of the schema itself),
           | but then also not care if it is actually using that schema
           | (by specifying `strict` as `false`)? That is the use-case I
           | can't wrap my head around.
        
         | tedsanders wrote:
         | We didn't cover this in the announcement post, but there are a
         | few reasons:
         | 
         | - The first request with each JSON schema will be slow, as we
         | need to preprocess the JSON schema into a context-free grammar.
         | If you don't want that latency hit (e.g., you're prototyping,
         | or have a use case that uses variable one-off schemas), then
         | you might prefer "strict": false
         | 
         | - You might have a schema that isn't covered by our subset of
         | JSON schema. (To keep performance fast, we don't support some
         | more complex/long-tail features.)
         | 
         | - In JSON mode and Structured Outputs, failures are rarer but
         | more catastrophic. If the model gets too confused, it can get
         | stuck in loops where it just prints technically valid output
         | forever without ever closing the object. In these cases, you
         | can end up waiting a minute for the request to hit the
         | max_token limit, and you also have to pay for all those useless
         | tokens. So if you have a really tricky schema, and you'd rather
         | get frequent failures back quickly instead of infrequent
         | failures back slowly, you might also want "strict": false
         | 
         | But in 99% of cases, you'll want "strict": true.
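          | 
          | For reference, the request shape with "strict": true looks
          | roughly like this (the person schema is a toy example):
          | 
          |     from openai import OpenAI
          | 
          |     client = OpenAI()
          |     resp = client.chat.completions.create(
          |         model="gpt-4o-2024-08-06",
          |         messages=[{"role": "user", "content": "Alice is 30."}],
          |         response_format={
          |             "type": "json_schema",
          |             "json_schema": {
          |                 "name": "person",
          |                 "strict": True,
          |                 "schema": {
          |                     "type": "object",
          |                     "properties": {
          |                         "name": {"type": "string"},
          |                         "age": {"type": "integer"},
          |                     },
          |                     "required": ["name", "age"],
          |                     "additionalProperties": False,
          |                 },
          |             },
          |         },
          |     )
          |     print(resp.choices[0].message.content)  # JSON matching the schema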
        
       | _vaporwave_ wrote:
       | Anyone else catch this reference in one of the examples?
       | 
       | > 9.11 and 9.9 -- which is bigger
       | 
       | https://community.openai.com/t/why-9-11-is-larger-than-9-9-i...
        
         | jodacola wrote:
         | Amusingly, I immediately thought 9.11 - but in the context of a
         | newer version of software. Ever have those moments where you're
         | so deep in context of some ecosystem that you skip right past
         | the basics, like 9.9 being a larger number than 9.11?
        
       | elpocko wrote:
       | Doesn't the BNF grammar approach in llama.cpp solve this issue in
       | a generic way that should work with any model? Why wouldn't they
       | use that?
        
         | ejones wrote:
         | Similar approach to llama.cpp under the hood - they convert the
         | schema to a grammar. Llama.cpp's implementation was specific to
         | the ggml stack, but what they've built sounds similar to
         | Outlines, which they acknowledged.
        
         | HanClinto wrote:
         | llama.cpp's GBNF grammar is generic, and indeed works with any
         | model.
         | 
          | I can't speak for other approaches, but while llama.cpp's
          | implementation is nice in that it always generates grammar-valid
          | output token-by-token (and doesn't require any backtracking), it
          | is tough in that, for ambiguous grammars (where we're not always
          | sure where we are in the grammar until generation finishes), it
          | keeps all valid parsing option stacks in memory at the same
          | time. This is good for the no-backtracking case, but it adds a
          | (sometimes significant) cost in terms of being rather
          | "explosive" in memory usage (especially if one uses a
          | particularly large or poorly-formed grammar). Creating a grammar
          | that is openly hostile and crashes the inference server is not
          | difficult.
         | 
         | People have done a lot of work to try and address some of the
         | more egregious cases, but the memory load can be significant.
         | 
         | One example of memory optimization:
         | https://github.com/ggerganov/llama.cpp/pull/6616
         | 
         | I'm not entirely sure what other options there are for
         | approaches to take, but I'd be curious to learn how other
         | libraries (Outlines, jsonformer) handle syntax validation.
        
       | behnamoh wrote:
       | Well, there goes one of the big advantages of open-source
       | models...
       | 
       | For a long time, I was relying on such guaranteed structured
       | outputs as a "secret sauce" that only works using llama.cpp's
       | GBNF grammars. Now OpenAI literally introduced the same concept
       | but a bit more accessible (since you create a JSON and they
       | convert it to a grammar).
       | 
       | Those of you who have used GBNF, do you think it still has any
       | advantage over what OpenAI just announced?
        
         | ejones wrote:
         | FWIW, llama.cpp has always had a JSON schema -> GBNF converter,
         | although it launched as a companion script. Now I think it's
         | more integrated in the CLI and server.
         | 
         | But yeah I mean, GBNF or other structured output solutions
         | would of course allow you to supply formats other than JSON
          | schema. It sounds conceivable, though, that OpenAI could expose
          | the grammars directly in the future.
        
           | behnamoh wrote:
            | I think for certain tasks it's still easier to write the
            | grammar directly. Does converting from JSON schema to a CFG
            | limit the capabilities of the grammar? i.e., are there things
            | JSON schema can't represent that a context-free grammar can?
        
             | ejones wrote:
             | You might be right that they're similarly powerful. In some
             | cases, an arbitrary output format might in and of itself be
             | desirable. Like it might result in token savings or be more
             | natural for the LLM. For instance, generating code snippets
             | to an API or plain text with constraints.
             | 
             | And this is more esoteric, but technically in the case of
             | JSON I suppose you could embed a grammar inside a JSON
             | string, which I'm not sure JSON schema can express.
        
         | J_Shelby_J wrote:
         | JSON is a sub-set of what GBNF can do, so there are still
         | advantages to that approach. But even GBNF doesn't go far
         | enough. Ever try to restrict a model to a single sentence?
         | 
         | root ::= \" \" item{{{min_count},{max_count}}}
         | 
         | item ::= [A-Z]
         | [^\\\r\\\n\\\x0b\\\x0c\\\x85\\\u2028\\\u2029.?!]+ [a-z] (\". \"
         | | \"? \" | \"! \")
         | 
         | This kinda works if you don't mind no abbreviations, but you
         | can't do something like this with JSON grammars afaik.
        
       | enobrev wrote:
       | In a startup I was working on last year, I had a surprisingly
       | good experience with using a json-schema in my prompt. I had to
       | tweak the json response a bit because it was always invalid, but
       | the issue was generally a missing colon or misplaced bracket.
       | Data-wise it stuck to the schema very well, and cleaning up the
       | json was simple enough that we got to zero parsing errors. I
       | believe this was with 3.5.
       | 
       | Sadly, that project was a final (relatively successful) attempt
       | at getting traction before the startup was sold and is no longer
       | live.
       | 
       | Edit: Ouch, are the down-votes disbelief? Annoyance? Not sure
       | what the problem is.
        
       | nichochar wrote:
        | I'm a little confused why you have to specify "strict: true" to
        | get this behavior. It is obviously always desired; I would be
        | surprised for people to ever specify "strict: false". That API
        | design leaves something to be desired.
        | 
        | I also learned about constrained decoding[1], which they give a
        | brief explanation of. This is a really clever technique! It will
        | increase reliability as well as reduce latency (fewer tokens to
        | pick from) once the initial artifacts are loaded.
       | 
       | [1] https://www.aidancooper.co.uk/constrained-decoding/
        
         | dgellow wrote:
          | Could you elaborate a bit re: the API? What do you dislike
          | other than the "strict: true"?
        
         | athyuttamre wrote:
         | Hi, I work on the OpenAI API -- structured outputs schemas have
         | limitations (e.g. all fields must be required, no additional
         | properties allowed):
         | https://platform.openai.com/docs/guides/structured-
         | outputs/s....
         | 
         | If your schema is not supported, but you still want to use the
         | model to generate output, you would use `strict: false`.
         | Unfortunately we cannot make `strict: true` the default because
         | it would break existing users. We hope to make it the default
         | in a future API version.
        
           | Der_Einzige wrote:
            | You should also mention that before you had done custom
            | alignment work accounting for this feature, it was an
            | excellent alignment breaker (therefore a big no-no to release
            | too early).
            | 
            | For example, if I ask an LLM to generate social security
            | numbers, it will give the whole "I'm sorry Dave, I can't do
            | that". If I ban all tokens except numbers and hyphens, prior
            | to your "refusal = True" approach, it was guaranteed that
            | even "aligned" models would generate what appeared to be
            | social security numbers.
        
             | ethical_source wrote:
             | And if LLMs can generate plausible social security numbers,
             | our civilization will fall /s
             | 
             | Christ, I hate the AI safety people who brain-damage models
             | so that they refuse to do things trivial to do by other
             | means. Is LLM censorship preventing bad actors from
             | generating social security numbers? Obviously not. THEN WHY
             | DOES DAMAGING AN LLM TO MAKE IT REFUSE THIS TASK MAKE
             | CIVILIZATION BETTER OFF?
             | 
             | History will not be kind to safetyist luddites.
        
               | Terretta wrote:
               | I'm less concerned with the AI teams lobotomizing
               | utility, more concerned with damage to language,
               | particularly redefining the term "safe" to mean something
               | like "what we deem suitable".
               | 
               | That said, when zero "safety" is at stake might be the
               | best time to experiment with how to build and where to
               | put safety latches, for when we get to a point we mean
               | actual safety. I'm even OK with models that default to
               | parental control for practice provided it can be switched
               | off.
        
       | srcreigh wrote:
       | > The tokens that are valid at the beginning of the output
       | include things like {, {", {\n, etc. However, once the model has
       | already sampled {"val, then { is no longer a valid token
       | 
       | Oops, this is incorrect. {"val{":2} is valid json.
       | 
       | (modulo iOS quotes lol)
        
         | jhgg wrote:
         | Valid JSON, sure, but that key does not conform to the schema
         | provided in the example. The LLM must generate valid JSON that
         | _also_ conforms to the provided schema.
        
       | simonw wrote:
       | The price decrease is particularly notable because it represents
       | a 50% cut in the price to handle image inputs, across any OpenAI
       | model.
       | 
       | Previously image inputs on GPT-4o-mini were priced the SAME as
       | GPT-4o, so using mini wouldn't actually save you any money on
       | image analysis.
       | 
       | This new gpt-4o-2024-08-06 model is 50% cheaper than both GPT-4o
       | AND GPT-4o-mini for image inputs, as far as I can tell.
       | 
       | UPDATE: I may be wrong about this. The pricing calculator for
       | image inputs on https://openai.com/api/pricing/ doesn't indicate
       | any change in price for the new model.
        
         | minimaxir wrote:
         | The calculator doesn't account for the fact that there are now
         | two different prices in a given price matrix.
        
         | jeffharris wrote:
         | yep image input on the new model is also 50% cheaper
         | 
         | and apologies for the outdated pricing calculator ... we'll be
         | updating it later today
        
       | cvhc wrote:
       | I wonder why the top level has to be an object instead of an
       | array... I have some pretty normal use cases where I expect the
       | model to extract a list of objects from the text.
       | 
        | ```
        | openai.BadRequestError: Error code: 400 - {'error': {'message':
        | 'Invalid schema for response_format \'PolicyStatements\': schema
        | must be a JSON Schema of \'type: "object"\', got \'type:
        | "array"\'.', 'type': 'invalid_request_error', 'param':
        | 'response_format', 'code': None}}
        | ```
       | 
        | I know I can always put the array into a single-key object, but
        | it's just so annoying that I also have to modify the prompts
        | accordingly to accommodate this.
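        | 
        | The workaround, for what it's worth, is just a one-field wrapper
        | object (the field names here are made up):
        | 
        |     from pydantic import BaseModel
        |     from openai import OpenAI
        | 
        |     class PolicyStatement(BaseModel):   # made-up fields
        |         effect: str
        |         action: str
        | 
        |     class PolicyStatements(BaseModel):  # object root wrapping the array
        |         statements: list[PolicyStatement]
        | 
        |     client = OpenAI()
        |     completion = client.beta.chat.completions.parse(
        |         model="gpt-4o-2024-08-06",
        |         messages=[{"role": "user",
        |                    "content": "Extract the policy statements."}],
        |         response_format=PolicyStatements,
        |     )
        |     print(completion.choices[0].message.parsed.statements)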
        
         | manquer wrote:
          | I can't say for OpenAI, but in general I have seen and used
          | this design pattern to keep the root object output consistent
          | and remove a lot of unnecessary validations and branching flows.
          | 
          | Otherwise you will have to handle both scenarios in code
          | everywhere, since you don't know if the root is an object or an
          | array. If the root has a key that conforms to a known schema,
          | then validation becomes easier to write for that scenario.
          | 
          | It's similar to why so many APIs wrap all responses with a key
          | like 'data', 'value' or 'error', or why in RESTful HTTP APIs
          | collection endpoints like GET /v1/my-object are not mixed with
          | resource URIs like GET /v1/my-object/1: the former always
          | returns an array, the latter always an object.
        
         | tomComb wrote:
         | Well, this wouldn't be a very satisfying explanation, but these
         | JSON objects are often represented as Python dictionaries and
         | those can't have top level arrays.
        
         | heliophobicdude wrote:
         | Back in the old days, top level arrays were a security risk
         | because the array constructor in JS could be redefined and do
         | bad-guy stuff. I cannot think of any json parsing clients that
         | are vulnerable to this.
        
         | moritzwarhier wrote:
         | It's a relatively common convention for JSON APIs.
         | 
         | Possible reasons:
         | 
         | - Extensibility without breaking changes
         | 
         | - Forcing an object simplifies parsing of API responses,
         | ideally the key should describe the contents, like additional
         | metadata. It also simplifies validation, if considered separate
         | from parsing
         | 
         | - Forcing the root of the API response to be an object makes
         | sure that there is a single entry point into consuming it.
          | There is no way to place nondescript heterogeneous data items
         | next to each other
         | 
         | - Imagine that you want to declare types (often generated from
         | JSON schemas) for your API responses. That means you should
         | refrain from placing different types, or a single too broad
         | type in an array. Arrays should be used in a similar way to
         | stricter languages, and not contain unexpected types. A top-
         | level array invites dumping unspecified data to the client that
         | is expensive and hard to process
         | 
         | - The blurry line between arrays and objects in JS does not
         | cleanly map to other languages, not even very dynamic ones like
         | PHP or Python. I'm aware that JSON and JS object literals are
         | not the same. But even the JSON subset of JS (apart from number
         | types, where it's not a subset AFAIK) already creates
         | interesting edge cases for serialization and deserialization
        
         | simonw wrote:
         | I've regretted designing APIs that return an array rather than
         | an object in the past.
         | 
         | It's all about the extensibility. If you return an object you
         | can add extra keys, for things like "an error occurred, here
         | are the details", or "this is truncated, here's how to paginate
         | it", or a logs key for extra debug messages, or information
         | about the currently authenticated user.
         | 
         | None of those are possible if the root is an array.
        
       | __jl__ wrote:
       | There is another big change in gpt-4o-2024-08-06: It supports 16k
       | output tokens compared to 4k before. I think it was only
       | available in beta before. So gpt-4o-2024-08-06 actually brings
       | three changes. Pretty significant for API users
       | 
        | 1. Reliable structured outputs
        | 2. Reduced costs by 50% for input, 33% for output
        | 3. Up to 16k output tokens compared to 4k
       | 
       | https://platform.openai.com/docs/models/gpt-4o
        
         | Culonavirus wrote:
         | That's actually pretty impressive... if they didn't dumb it
         | down that is, which only time will tell.
        
         | santiagobasulto wrote:
          | I've noticed that lately GPT has gotten more and more verbose.
          | I'm wondering if it's a subtle way to "raise prices", as the
          | average response is going to incur more tokens, which of course
          | makes any API conversation keep growing in tokens (each
          | IN message concatenates the previous OUT messages).
        
           | sashank_1509 wrote:
            | They also spend more to generate more tokens. The more
            | obvious reason is that people seem to rate responses better
            | the longer they are. Lmsys demonstrated that GPT tops the
            | leaderboard because it tends to give much longer and more
            | detailed answers, and it seems like OpenAI is optimizing for,
            | or trying to maximize, lmsys.
        
           | throwaway48540 wrote:
           | It's a subtle way to make it smarter. Making it write out the
           | "thinking process" and decisions has always helped with
           | reliability and quality.
        
           | tedsanders wrote:
           | GPT has indeed been getting more verbose, but revenue has
           | zero bearing on that decision. There's always a tradeoff
           | here, and we do our imperfect best to pick a default that
           | makes the most people happy.
           | 
           | I suspect the reason why most big LLMs have ended up in a
           | pretty verbose spot is that it's easier for users to scroll &
           | skim than to ask follow-up questions.
           | 
           | With regard to this new gpt-4o model: you'll find it actually
           | bucks the recent trend and is less verbose than its
           | predecessor.
        
           | sophiabits wrote:
           | I've especially noticed this with gpt-4o-mini [1], and it's a
           | big problem. My particular use case involves keeping a
           | running summary of a conversation between a user and the LLM,
           | and 4o-mini has a really bad tendency of inventing details in
           | order to hit the desired summary word limit. I didn't see
           | this with 4o or earlier models
           | 
           | Fwiw my subjective experience has been that non-technical
           | stakeholders tend to be more impressed with / agreeable to
           | longer AI outputs, regardless of underlying quality. I have
           | lost count of the number of times I've been asked to make
           | outputs longer. Maybe this is just OpenAI responding to what
           | users want?
           | 
           | [1] https://sophiabits.com/blog/new-llms-arent-always-
           | better#exa...
        
       | surfingdino wrote:
       | Is it still NTSAT (Never The Same Answer Twice)?
        
         | H8crilA wrote:
         | Yes, this happens by design.
        
           | oblio wrote:
           | Interesting, why? Is there no theoretical way to have stable
           | models? Or some kind of executive decision?
        
       | H8crilA wrote:
       | How is this different from function calling?
        
         | binarymax wrote:
         | Function calling uses JSON mode. While it has been mostly
         | correct, I do get an incorrectly formatted response sometimes
         | (maybe 1 in 10k requests?). So it sounds like this fixes that
         | bug.
        
         | tedsanders wrote:
         | Under the hood, it's quite similar to function calling. A few
         | differences:
         | 
         | - Structured Outputs is a bit more straightforward. e.g., you
         | don't have to pretend you're writing a function where the
         | second arg could be a two-page report to the user, and then
         | pretend the "function" was called successfully by returning
         | {"success": true}
         | 
         | - Having two interfaces lets us teach the model different
         | default behaviors and styles, depending on which you use
         | 
         | - Another difference is that our current implementation of
         | function calling can return both a text reply plus a function
         | call (e.g., "Let me look up that flight for you"), whereas
         | Structured Outputs will only return the JSON
         | 
         | (I worked on this feature at OpenAI.)
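          | 
          | For comparison, the function-calling flavor with "strict": true
          | looks roughly like this (the get_flight tool is made up):
          | 
          |     from openai import OpenAI
          | 
          |     client = OpenAI()
          |     resp = client.chat.completions.create(
          |         model="gpt-4o-2024-08-06",
          |         messages=[{"role": "user",
          |                    "content": "Find me a flight to Tokyo on Friday"}],
          |         tools=[{
          |             "type": "function",
          |             "function": {
          |                 "name": "get_flight",    # made-up tool
          |                 "strict": True,
          |                 "parameters": {
          |                     "type": "object",
          |                     "properties": {
          |                         "destination": {"type": "string"},
          |                         "date": {"type": "string"},
          |                     },
          |                     "required": ["destination", "date"],
          |                     "additionalProperties": False,
          |                 },
          |             },
          |         }],
          |     )
          |     msg = resp.choices[0].message
          |     if msg.tool_calls:   # may instead be a plain text reply
          |         call = msg.tool_calls[0]
          |         print(call.function.name, call.function.arguments)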
        
           | technics256 wrote:
           | How can we enable the text reply with a function call?
           | Usually the message returned is a tool call only when it
           | calls a tool?
        
             | tedsanders wrote:
             | There's no special interface, but you can write an
             | instruction in a system message in the first position.
             | E.g., "Before each function call, explain to the user what
             | you're about to do." It's not super reliable, but the model
             | can do it. Few-shot prompting might help as well.
        
       | paradite wrote:
       | Really important update that was not mentioned:
       | 
        | gpt-4o-2024-08-06 has a 16,384-token output limit instead of
        | 4,096 tokens.
       | 
       | https://platform.openai.com/docs/models/gpt-4o
       | 
       | We don't need the GPT-4o Long Output anymore.
        
         | OutOfHere wrote:
         | But is this also the default or just the max? Is the default 4k
         | or 16k?
         | 
         | Also, the question of the default value applies both at the
         | server level and at the SDK level.
        
         | floam wrote:
         | Long Output is 64K though.
        
       | gdiamos wrote:
       | We've had this for over 1 year in Lamini - https://lamini-
       | ai.github.io/inference/json_output/
       | 
       | Works with any open LLM, including Llama 3.1
        
         | radarsat1 wrote:
         | Looks useful!
        
         | AStrangeMorrow wrote:
         | Also the outlines library: https://github.com/outlines-
         | dev/outlines
        
           | gdiamos wrote:
           | Yeah! - outlines, guidance, jsonformer were inspiring for
           | this line of work
        
           | HanClinto wrote:
           | Also note llama.cpp with grammar support:
           | 
           | https://github.com/ggerganov/llama.cpp/tree/master/grammars
           | 
           | Supports an EBNF-like syntax, as well as JSON-Schema.
        
         | msoad wrote:
         | Why not JSON Schema?
        
           | gdiamos wrote:
           | We did some user studies and found that people found it less
           | intuitive.
        
       | zoogeny wrote:
       | Totally tangential, totally not related to the post (unless you
       | squint your eyes and really blur things) ...
       | 
       | I was thinking about the old canard of the sufficiently smart
       | compiler. It made me think about LLM output, and how the output
       | of an LLM could just as well be bytecode as English. You have a
       | tokenized input and a translated output, and a massive, easily
       | generated training set. I wonder if, one day, our compilers
       | will be LLMs?
        
         | jcims wrote:
         | You definitely could; it's not far removed from text-to-image
         | or text-to-audio generators.
        
         | pjc50 wrote:
         | Why would you tolerate an unreliable compiler with no assured
         | relationship between its inputs and its outputs? Have people
         | just got too comfortable with the C++ model of "UB means I can
         | insert a security bug for you"?
        
           | bigyikes wrote:
           | In a hypothetical future where the reliability of LLMs
           | improves, I can imagine the model being able to craft
           | optimizations that a traditional compiler cannot.
           | 
           | Like there are already cases where hand-rolling assembly can
           | eke out performance gains, but few do that because it's so
           | arduous. If the LLM could do it reliably it'd be a huge win.
           | 
           | It's a big if, but not outside the realm of possibility.
        
             | zoogeny wrote:
             | I agree it is currently a pipe dream. But if I was looking
             | for a doctoral research idea, it might be fun to work on
             | something like that.
             | 
             | Lots of potential avenues to explore, e.g. going from a
             | high-level language to some IR, from some IR to bytecode,
             | or straight from high-level to machine code.
             | 
             | I mean, -O3 is already so much of a black box that I can't
             | understand it. And the tedium of hand optimizing massive
             | chunks of code is why we automate it at all. Boredom is
             | something we don't expect LLMs to suffer, so having one
             | pore over some kind of representation and apply
             | optimizations seems totally reasonable. And if it had some
             | kinds of "emergent behaviors" based on intelligence that
             | allow it to beat the suite of algorithmic optimization we
             | program into compilers, it could actually be a benefit.
        
         | killthebuddha wrote:
         | A function that implements natural language -> bytecode is IMO
         | way more likely to be under the hood an LLM _operating a
         | compiler_ (or maybe a compiler operating LLMs) rather than a
         | "bare" LLM. From an end user's perspective maybe it won't
         | matter but I think it's an important technical point. IMO
         | there's no evidence that an LLM will ever be the best way to
         | execute general purpose computations.
        
         | thih9 wrote:
         | I guess an actual compiler would be cheaper and more reliable.
         | 
         | In theory we could do the same with mathematical computations,
         | 2+2=4 and the like; but computing the result seems easier.
        
       | adagradschool wrote:
       | While text and image generation are getting cheaper at a
       | significant rate, audio still seems to be just as expensive with
       | ElevenLabs. I wonder why that is.
        
       | say_it_as_it_is wrote:
       | This puts like a dozen popular python libraries out of business
        
         | AStrangeMorrow wrote:
         | It at least depends on the approach and use case: libraries
         | like outlines (https://github.com/outlines-dev/outlines), which
         | actually change the sampling to adhere to a grammar and can be
         | used with local/custom models, shouldn't be too impacted. Those
         | aren't really used on top of OpenAI models.
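         | 
         | For reference, an outlines sketch (API details vary by version,
         | so treat this as illustrative):
         | 
         |   import outlines
         |   from pydantic import BaseModel
         | 
         |   class Flight(BaseModel):
         |       destination: str
         |       price_usd: float
         | 
         |   # Constrained decoding: tokens that would break the schema
         |   # are masked out at sampling time.
         |   model = outlines.models.transformers(
         |       "mistralai/Mistral-7B-Instruct-v0.2")
         |   generator = outlines.generate.json(model, Flight)
         |   flight = generator("Cheapest Paris-Tokyo flight as JSON: ")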
        
       | sansseriff wrote:
       | Preprocessing a new schema takes 'under 10 seconds'. That's... a
       | huge range? Unless the preprocessing time is a small fraction of
       | the inference time, I don't see the point.
       | 
       | I'm working on an app that dynamically generates schemas based
       | on user input (a union of arbitrary types pulled from a
       | library). The resulting schema is often in the 800-token range.
       | Curious how long that would take to preprocess.
        
       | pton_xd wrote:
       | Isn't "we hardcoded JSON into the latest model" kind of the
       | opposite direction, strategically, from "we're on the way to AGI
       | and I need 7 trillion to get there?"
        
         | isoprophlex wrote:
         | You are witnessing the final stages in the evolution of OpenAI
         | from a messianic hype machine to Yet Another Product Company.
         | 
         | Hence all the people leaving, too.
        
           | gardenhedge wrote:
            | I'm out of the loop. Are employees leaving OpenAI?
        
             | dangrossman wrote:
             | > John Schulman, one of the co-founders of artificial
             | intelligence company OpenAI, has left the ChatGPT maker for
             | rival Anthropic, he said in a post on social media platform
             | X late Monday.
             | 
             | > OpenAI's President and co-founder Greg Brockman is also
             | taking a sabbatical through the end of the year, he said in
             | a X post late Monday.
             | 
             | > Peter Deng, a vice-president of product, also left in
             | recent months, a spokesperson said. And earlier this year,
             | several members of the company's safety teams exited.
             | 
             | That's after co-founder and Chief Scientist Ilya Sutskever
             | left in May.
        
               | oblio wrote:
               | Are there any co-founders left?
        
               | sashank_1509 wrote:
                | Sam Altman, for one.
        
         | KaiMagnus wrote:
         | Yeah, definitely a way to end up with a Siri-like mess if you
         | do this long enough. The use case is there and it's going to be
         | very useful, but the magic is wearing off.
        
       | irgolic wrote:
       | Wasn't this already fully supported in the tool calling API?
        
       | ramoz wrote:
       | Can someone explain how this is different/better than the current
       | state of function calling (which I've been using to get a
       | consistent json schema response without issue)?
        
         | mrshu wrote:
         | This post (from an OpenAI researcher) contains a bit more
         | background: https://news.ycombinator.com/item?id=41174213
        
         | jacobsimon wrote:
         | For starters, the naming is much less confusing. But the
         | behavior also appears to be enforced/validated at some layer
         | (hopefully?), which function calling did not seem to be. I was
         | experimenting with it a couple weeks ago and it would work like
         | 75% of the time but would often give me invalid results for
         | schemas with relatively simple nested objects.
        
         | zbyforgotp wrote:
         | This is guaranteed; function calling without it is not. The old
         | way can work for you, but my experience is different,
         | especially with complex schemas.
        
       | Der_Einzige wrote:
       | Now the question is when they will support soft constraints like
       | this: https://huggingface.co/blog/constrained-beam-search
        
       | agtech_andy wrote:
       | I have had a lot of success using BoundaryML
       | (https://www.boundaryml.com/) for this. They have also been super
       | responsive to any of my questions.
        
         | aaronvg wrote:
         | Thanks for the shoutout. We benchmarked our approach against
         | other function-calling techniques and we've been able to beat
         | all other approaches every time (even by 8%!), just by getting
         | better at parsing the data and representing schemas with fewer
         | tokens, using type definitions instead of JSON Schema.
         | 
         | You can take a look at our BFCL results on that site or the
         | GitHub: https://github.com/BoundaryML/baml
         | 
         | We'll be publishing our comparison against OpenAI structured
         | outputs in the next 2 days, along with a deeper dive into our
         | results, but we aim to include this kind of constrained
         | generation as a capability in the BAML DSL long term anyway!
        
       | tarofchaos wrote:
       | Two years too late. I think we are going through a bozo period at
       | OpenAI where small things are being highlighted as achievements.
        
       | damsta wrote:
       | Can we get something like that in Gemini 1.5 Flash?
        
       | mugivarra69 wrote:
        | Cohere had this a while ago.
        
       | MattDaEskimo wrote:
       | "We have ripped code from a bunch of open-source variations and
       | slapped it behind our brutally abstracted API.
       | 
       | Interoperable with other external models like the open source
       | versions? What, are you mad?"
        
       | jumploops wrote:
       | By using JSON mode, GPT-4{o} has been able to do this reliably
       | for months (100k+ calls).
       | 
       | We use GPT-4o to build dynamic UI+code[0], and almost all of our
       | calls are using JSON mode. Previously it mostly worked, but we
       | had to do some massaging on our end (backtick removal, etc.).
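       | 
       | By "massaging" I mean things along these lines (illustrative,
       | not our exact code):
       | 
       |   import json
       |   import re
       | 
       |   def parse_llm_json(text: str) -> dict:
       |       # strip markdown code fences the model sometimes adds
       |       text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
       |       return json.loads(text)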
       | 
       | With that said, this will be great for GPT-4o-mini, as it often
       | struggles/forgets to format things as we ask.
       | 
       | Note: we haven't had the same success rate with function calling
       | compared to pure JSON mode, as function calling seems to add a
       | level of indirection that can reduce the quality of the LLM's
       | output. YMMV.
       | 
       | Anyhow, excited for this!
       | 
       | [0]https://magicloops.dev
        
         | qwertox wrote:
         | What a cool product! I was about to suggest submitting it as a
         | "Show HN", but it turns out it was already submitted a year
         | ago.
         | 
         | Would you mind sharing a bit on how things have evolved?
        
           | jumploops wrote:
           | Thanks and great question :)
           | 
           | When we first launched, the tool was very manual; you had to
           | generate each step via the UI. We then added a "Loop Creator
           | agent" that now builds Loops for you without intervention.
           | Over the past few months we've mostly been fixing feature
           | gaps and improving the Loop Creator.
           | 
           | Based on recent user feedback, we've put a few things in
           | motion:
           | 
           | - Form generator (for manual loops)
           | 
           | - Chrome extension (for local automations)
           | 
           | - In-house Google Sheets integration
           | 
           | - Custom outputs (charts, tables, etc.)
           | 
           | - Custom Blocks (shareable with other users)
           | 
           | With these improvements, you'll be able to create "single
           | page apps" like this one I made for my wife's annual mango
           | tasting party[0].
           | 
           | In addition to those features, we're also launching a new
           | section for Loop templates + educational content/how-tos, in
           | an effort to help people get started.
           | 
           | To be super candid, the Loop Creator has been a pain. We
           | started at an 8% success rate and we're only just now at 25%.
           | Theoretically we should be able to hit 80%+ based on existing
           | loop requests, but we're running into limits with the current
           | state of LLMs.
           | 
           | [0]https://mangota.ngo
        
             | gleb wrote:
             | Where do you get such a large variety of mangoes?
        
               | tomcam wrote:
               | Asking the important questions
        
               | jumploops wrote:
               | My mother-in-law is the President of the Central Florida
               | Fruit Society, and is in charge of sourcing mangoes for
               | their annual party. She sends us all the excess mangoes
               | :)
               | 
               | As I understand it, this year's mangoes mostly came from
               | Merritt Island, as there was some not-so-great weather in
               | southern Florida.
        
         | tomcam wrote:
         | Very interesting. Did you build magicloops using this tech?
        
           | jumploops wrote:
           | We first built Magic Loops with GPT-4, about a year ago, well
           | before JSON mode was a thing.
           | 
            | We had to do a bunch of extra prompting to make it work, as
            | GPT would often include backticks or broken JSON (most
            | commonly extra commas). At the time, YAML was a much better
            | approach.
           | 
           | Thankfully we've been able to remove most of these hacks, but
           | we still use a best effort JSON parser[0] to help stream
           | partial UI back to the client.
           | 
           | [0]https://www.npmjs.com/package/best-effort-json-parser
        
         | diego_sandoval wrote:
         | Can I use Magic Loops to generate Magic Loops for me?
        
       | msp26 wrote:
        | Is the JSON actually being fed into the LLM's context, or is it
        | still being converted into TypeScript?
       | 
       | The previous setup didn't allow for custom types, only
       | objects/string/num/bool.
       | 
       | Are the enums put into context or purely used for constrained
       | sampling?
        
       | OutOfHere wrote:
       | Using this feature will obviously "lock you in" to OpenAI,
       | specifically to this model too, at least until other companies
       | catch on. While text prompts can more easily be moved to other
       | LLMs, this feature cannot currently be ported as such. I would
       | use it only if a text prompt is insufficient despite retries.
        
         | BoorishBears wrote:
         | It's already supported by multiple other providers. Fireworks,
         | Together, probably more.
        
         | dtquad wrote:
         | OpenAI-style JSON mode and function calling rapidly became the
         | industry standard way of doing it. It will probably also happen
         | for this feature.
        
           | toomuchtodo wrote:
           | "S3 compatible"
        
         | PufPufPuf wrote:
         | This feature has existed for quite some time in several
         | inference libraries, like Outlines, under the names
         | "constrained decoding" or "guided decoding". Some even include
         | it in their OpenAI-compatible API in a very similar form
         | (allowing to pass in a JSON Schema). All this required doing
         | your own inference, though -- so the announcement really just
         | brings this popular feature "to the masses".
        
         | moralestapia wrote:
         | OTOH, not using it could "lock you out" of building a cool
         | product for your users, so ...
        
         | faizshah wrote:
          | The Converse API in AWS Bedrock lets you use function calling
          | across a number of different providers (it doesn't support
          | OpenAI):
          | https://docs.aws.amazon.com/bedrock/latest/userguide/convers...
          | 
          | I have been using it so that my agents aren't specific to a
          | particular model or API.
          | 
          | Like others have said, many other providers already have
          | function calling and JSON schema for structured outputs.
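          | 
          | A rough boto3 sketch (the tool name and schema are just an
          | example):
          | 
          |   import boto3
          | 
          |   bedrock = boto3.client("bedrock-runtime")
          |   resp = bedrock.converse(
          |       modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
          |       messages=[{"role": "user",
          |                  "content": [{"text": "Weather in Tokyo?"}]}],
          |       toolConfig={"tools": [{"toolSpec": {
          |           "name": "get_weather",  # hypothetical tool
          |           "description": "Look up current weather for a city",
          |           "inputSchema": {"json": {
          |               "type": "object",
          |               "properties": {"city": {"type": "string"}},
          |               "required": ["city"],
          |           }},
          |       }}]},
          |   )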
        
       | jappgar wrote:
       | It's nice that they're not making me pay for broken json anymore
       | but touting this as a "feature" is laughable.
       | 
       | It's a bug fix. They should never have been charging for
       | malformed responses in the first place!
        
       | LAC-Tech wrote:
       | Good to see JSON Schema being more widely adopted. I remember
       | doing a project a few years ago in XML just because XML schemas
       | were everywhere and JSON ones were still not really used.
        
       | Brosper wrote:
       | I think they would like to have something like Artifacts in
       | Claude.
        
       ___________________________________________________________________
       (page generated 2024-08-06 23:00 UTC)