[HN Gopher] Structured outputs on the Claude Developer Platform
___________________________________________________________________
Structured outputs on the Claude Developer Platform
Author : adocomplete
Score : 170 points
Date : 2025-11-14 19:04 UTC (1 day ago)
(HTM) web link (www.claude.com)
(TXT) w3m dump (www.claude.com)
| barefootford wrote:
| I switched from structured outputs on the OpenAI APIs to
| unstructured output on Claude (Haiku 4.5) and haven't had any
| issues (yet). But
| guarantees are always nice.
| jascha_eng wrote:
| I feel like this is so core to any LLM automation that it's
| crazy Anthropic is only adding it now.
|
| I built a customized deep-research tool internally earlier this
| year that is made up of multiple "agentic" steps, each focusing
| on specific information to find. The outputs of those steps are
| always JSON, which then becomes the input for the next step.
| Sure, you can work your way around failures by doing retries,
| but it's just one less thing to think about if you can guarantee
| that the random LLM output adheres at least to some sort of
| structure.
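|
| (Roughly the shape of that pipeline; names invented, call_llm
| stands in for whatever client you use:)
|
|   import json
|
|   def step(prompt: str, payload: dict, call_llm) -> dict:
|       # Each step receives the previous step's JSON as input
|       # and must itself reply with JSON for the next step.
|       raw = call_llm(prompt + "\n" + json.dumps(payload))
|       return json.loads(raw)  # this is where the retries lived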
| sails wrote:
| Agree, it feels so fundamental. Any idea why? Gemini has also
| had it for a long time
| crazylogger wrote:
| The way you got structured output with Claude prior to this
| was via tool use.
|
| IMO this was the more elegant design if you think about it:
| _tool calling is really just structured output and structured
| output is tool calling_. The "do not provide multiple ways
| of doing the same thing" philosophy.
| veonik wrote:
| I have had fairly bad luck specifying the JSON Schema for my
| structured outputs with Gemini. It seems like describing the
| schema with natural language descriptions works much better,
| though I do admit to needing that retry hack at times. Do you
| have any tips on getting the most out of a schema definition?
| BoorishBears wrote:
| For one, always have a top-level object.
|
| But also, Gemini supports constrained generation, which can't
| fail to match the schema, so why not use that instead of
| prompting?
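|
| Roughly, with the google-genai SDK (sketch; model name and
| schema are just examples):
|
|   from google import genai
|   from pydantic import BaseModel
|
|   class Person(BaseModel):
|       name: str
|       age: int
|
|   client = genai.Client()
|   resp = client.models.generate_content(
|       model="gemini-2.0-flash",
|       contents="Ada Lovelace, 36, mathematician.",
|       config={
|           "response_mime_type": "application/json",
|           # Decoding is constrained to this schema.
|           "response_schema": Person,
|       },
|   )
|   person = resp.parsed  # a Person instance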
| astrange wrote:
| Constrained generation makes models somewhat less
| intelligent. Although it shouldn't be an issue in thinking
| mode, since it can prepare an unconstrained response and
| then fix it up.
| BoorishBears wrote:
| I mean, that's too reductionist if you're being exact, and
| not a worry if you're not.
|
| Even asking for JSON (without constrained sampling)
| sometimes degrades output, and the name and order of keys
| can also affect performance, or even act as a form of
| structured thinking.
|
| At the end of the day current models have enough problems
| with generalization that they should establish a baseline
| and move from there.
| Der_Einzige wrote:
| Not true, and citation needed. Whatever you cite, there are
| competing papers claiming that structured and constrained
| generation does zero harm to output diversity/creativity
| (within a schema).
| simonw wrote:
| Prior to this it was possible to get the same effect by
| defining a tool with the schema that you wanted and then
| telling the Anthropic API to always use that tool.
|
| I implemented structured outputs for Claude that way here:
| https://github.com/simonw/llm-anthropic/blob/500d277e9b4bec6...
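|
| Sketch of the idea (schema invented for illustration):
|
|   import anthropic
|
|   client = anthropic.Anthropic()
|
|   # The tool's input schema is the output shape we want.
|   tool = {
|       "name": "record_person",
|       "description": "Record a person's details.",
|       "input_schema": {
|           "type": "object",
|           "properties": {
|               "name": {"type": "string"},
|               "age": {"type": "integer"},
|           },
|           "required": ["name", "age"],
|       },
|   }
|
|   msg = client.messages.create(
|       model="claude-sonnet-4-5",
|       max_tokens=1024,
|       tools=[tool],
|       # Force this tool, so its arguments become our output.
|       tool_choice={"type": "tool", "name": "record_person"},
|       messages=[{"role": "user", "content": "Ada Lovelace, 36."}],
|   )
|   data = next(b.input for b in msg.content if b.type == "tool_use")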
| fnordsensei wrote:
| Same, but it's a PITA when you also want to support tool
| calling at the same time. Had to do a double call: call and
| check if it will use tools. If not, call again and force the
| use of the (now injected) return schema tool.
| mparis wrote:
| We've been running structured outputs via Claude on Bedrock
| in production for a year now and it works great. Give it a
| JSON schema, inject a '{', and sometimes do a bit of custom
| parsing on the response. GG
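|
| Roughly (sketch; assumes the anthropic SDK and a made-up
| prompt):
|
|   import json
|   import anthropic
|
|   client = anthropic.Anthropic()
|   resp = client.messages.create(
|       model="claude-sonnet-4-5",
|       max_tokens=512,
|       messages=[
|           {"role": "user", "content":
|            'Describe this page as JSON with "title" and "tags".'},
|           # Prefill the assistant turn so the reply continues
|           # as JSON instead of starting with prose.
|           {"role": "assistant", "content": "{"},
|       ],
|   )
|   # Re-attach the injected '{' before parsing.
|   data = json.loads("{" + resp.content[0].text)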
|
| Nice to see them support it officially; however, OpenAI has
| officially supported this for a while but, at least
| historically, I have been unable to use it because it adds
| deterministic validation that errors on certain standard JSON
| Schema elements that we used. The lack of "official" support
| is the feature that pushed us to use Claude in the first
| place.
|
| It's unclear to me that we will need "modes" for these
| features.
|
| Another example: I used to think that I couldn't live without
| Claude Code "plan mode". Then I used Codex and asked it to
| write a markdown file with a todo list. A bit more typing, but
| it works well, and it's nice to be able to edit the plan
| directly in the editor.
|
| Agree or Disagree?
| Karrot_Kream wrote:
| Before Claude Code shipped with plan mode, the workflow for
| using most coding agents was to have it create a `PLAN.md`
| and update/execute that plan. Planning mode was just a
| first class version of what users were already doing.
| kami23 wrote:
| Claude Code keeps coming out with a lot of really nice
| tools that others haven't started to emulate from what I've
| seen.
|
| My favorite one is going through the plan interactively. It
| turns it into a multiple choice / option TUI and the last
| choose is always reprompt that section of the plan.
|
| I had switch back to codex recently and not being able to
| do my planning solely in the CLI feels like the early
| 1900s.
|
| To trigger the interactive mode, do something like:
|
| Plan a fix for:
|
| <Problem statement>
|
| Please walk me through any options or questions you might
| have interactively.
| tempaccount420 wrote:
| > Give it a JSON schema, inject a '{', and sometimes do a
| bit of custom parsing on the response
|
| I would hope that this is not what OpenAI/Anthropic do
| under the hood, because otherwise, what if one of the
| strings needs a lot of \escapes? Is it also supposed to
| never write actual newlines in strings? It's awkward.
|
| The ideal solution would be to have some special tokens
| like [object_start] [object_end] and [string_start]
| [string_end].
| koakuma-chan wrote:
| I don't think the tool input schema thing does that
| inference-time trick. I think it just dumps the JSON schema
| into the context, and tells the model to conform to that
| schema.
| alex_duf wrote:
| It's not 100% successful; I've had responses that didn't
| match my schema.
|
| I think the new feature limits which tokens can be
| output, which brings a guarantee, whereas the tools are only
| a suggestion.
| miki123211 wrote:
| So, so much this.
|
| Structured outputs are the most underappreciated LLM feature.
| If you're building anything except a chatbot, it's definitely
| worth familiarizing yourself with them.
|
| They're not that easy to use well, and there aren't many
| resources on the internet explaining how to get the most out
| of them.
| maleldil wrote:
| In Python, they're very easy to use. Define your schema with
| Pydantic and pass the class to your client calls. There are
| some details to know (e.g. field order can affect performance),
| but it's very easy overall. Other languages probably have
| something similar.
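|
| e.g. with the OpenAI SDK (sketch; schema invented):
|
|   from openai import OpenAI
|   from pydantic import BaseModel
|
|   class Task(BaseModel):
|       description: str
|       done: bool
|
|   class TaskList(BaseModel):
|       tasks: list[Task]
|
|   client = OpenAI()
|   resp = client.chat.completions.parse(
|       model="gpt-4o-mini",
|       messages=[{"role": "user",
|                  "content": "Extract tasks: buy milk (done)..."}],
|       response_format=TaskList,  # the Pydantic class itself
|   )
|   tasks = resp.choices[0].message.parsed  # a TaskList instance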
| swyx wrote:
| and they've done super well without it. makes you really
| question if this is really that core.
| andai wrote:
| It's nice but I don't know how necessary it is.
|
| You could get this working very consistently with GPT-4 in mid
| 2023. The version before June, iirc. No JSON output, no tool
| calling fine tuning... just half a page of instructions and
| some string matching code. (Built a little AI code editing tool
| along these lines.)
|
| With the tool calling RL and structured outputs, I think the
| main benefit is peace of mind. You know you're going down the
| happy path, so there's one less thing to worry about.
|
| Reliability is the final frontier!
| macNchz wrote:
| Having used structured outputs pretty extensively for a while
| now, my impression is that the newer models take less of a
| quality hit while conforming to a specific schema. Just
| giving instructions and output examples totally worked, but it
| came at a considerable cost to output quality; that effect
| seems to have diminished over time as models have been more
| explicitly trained to produce structured output.
| mkagenius wrote:
| I always wondered how they achieved this. Is it just retries
| while generating tokens, where as soon as they find a mismatch
| they retry? Or is the model itself trained extremely well in
| this version of 4.5?
| Kuinox wrote:
| The inference doesn't return a single token, but a probability
| for every token. You just select a token that is allowed
| according to the compiled grammar.
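|
| A toy version of that selection step (real implementations
| precompute token masks per grammar state):
|
|   import numpy as np
|
|   def constrained_sample(logits, allowed_ids):
|       # Disallow every token the grammar forbids here, then
|       # sample from whatever probability mass remains.
|       masked = np.full_like(logits, -np.inf)
|       masked[allowed_ids] = logits[allowed_ids]
|       probs = np.exp(masked - masked.max())
|       probs /= probs.sum()
|       return np.random.choice(len(logits), p=probs)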
| mkagenius wrote:
| Hmm, wouldn't it sacrifice a better answer in some cases (not
| sure how many though)?
|
| I'd be surprised if they hadn't specifically trained for
| structured "correct" output for this, in addition to picking
| the next token following the structure.
| Kuinox wrote:
| The "better answer" wouldnt had respected the schema in
| this case.
| tdfirth wrote:
| In my experience (I've put hundreds of billions of tokens
| through structured outputs over the last 18 months), I
| think the answer is yes, but only in edge cases.
|
| It generally happens when the grammar is highly
| constrained, for example if a boolean is expected next.
|
| If the model assigns a low probability to both true and
| false coming next, then the sampling strategy will pick
| whichever one happens to score highest. Most tokens have
| very similar probabilities close to 0 most of the time, and
| if you're picking between two of these then the result will
| often feel random.
|
| It's always the result of a bad prompt, though. If you
| improve the prompt so that the model understands the task
| better, there will be a clear difference in the scores the
| tokens get, and the result will seem less random.
| miki123211 wrote:
| It's not just the prompt that matters, it's also field
| order (and a bunch of other things).
|
| Imagine you're asking your model to give you a list of
| tasks mentioned in a meeting, along with a boolean
| indicating whether the task is done. If you put the
| boolean first, the model must decide both what the task
| is and whether it is done at the same time. If you put
| the task description first, the model can separate that
| work into two distinct steps.
|
| There are more tricks like this. It's really worth
| thinking about which calculations you delegate to the
| model and which you do in code, and how you integrate the
| two.
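|
| In Pydantic terms (illustrative):
|
|   from pydantic import BaseModel
|
|   # Harder: the model must commit to `done` before it has
|   # written out what the task even is.
|   class TaskBoolFirst(BaseModel):
|       done: bool
|       description: str
|
|   # Easier: emitting `description` first acts as a small
|   # reasoning step before the `done` judgment.
|   class TaskDescFirst(BaseModel):
|       description: str
|       done: bool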
| mirekrusin wrote:
| Sampling is already constrained with temperature, top_k,
| top_p, top_a, typical_p, min_p, entropy_penalty, smoothing,
| etc.; filtering tokens to grammar-valid ones is just one more
| such filter. It does make sense and can be used for producing
| programming-language output as well: what's the point in
| bothering to generate output that's known up front to be
| invalid? Better to filter it out and allow only valid
| completions.
| mmoskal wrote:
| Grammars work best when aligned with the prompt. That is, if
| your prompt gives you the right format of answer 80% of the
| time, the grammar will take you to 100%. If it gives you
| the right answer 1% of the time, the grammar will give you
| syntactically correct garbage.
| simonw wrote:
| They're using the same trick OpenAI have been using for a
| while: they compile a grammar and then have that running as
| part of token inference, such that only tokens that fit the
| grammar can be selected as the next token.
|
| This trick has also been in llama.cpp for a couple of years:
| https://til.simonwillison.net/llms/llama-cpp-python-grammars
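|
| With llama-cpp-python it looks roughly like this (model path
| is a placeholder):
|
|   from llama_cpp import Llama, LlamaGrammar
|
|   # Only tokens that keep the output inside this grammar are
|   # eligible at each decoding step.
|   grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')
|
|   llm = Llama(model_path="./model.gguf")
|   out = llm("Is the sky blue? Answer yes or no: ",
|             grammar=grammar)
|   print(out["choices"][0]["text"])  # "yes" or "no"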
| huevosabio wrote:
| Yeah, and now there are mature OSS solutions like Outlines and
| XGrammar, which makes it even weirder that Anthropic is only
| supporting this now.
| minimaxir wrote:
| More info on Claude's grammar compiling:
| https://docs.claude.com/en/docs/build-with-claude/structured...
| causal wrote:
| I reaaaaally wish we could provide an EBNF grammar like
| llama.cpp does. JSON Schema covers far fewer use cases for me.
| psadri wrote:
| What are some examples that you can't express in json
| schema?
| causal wrote:
| Anything not JSON
| jawiggins wrote:
| How sure are you that OpenAI is using that?
|
| I would have suspected it too, but I've been struggling with
| OpenAI returning syntactically invalid JSON when provided
| with a simple pydantic class (a list of strings), which
| shouldn't be possible unless they have a glaring error in
| their grammar.
| gradys wrote:
| You might be using JSON mode, which doesn't guarantee a
| schema will be followed, or structured outputs not in
| strict mode. It is possible to get the property that the
| response is either a valid instance of the schema or an
| error (e.g. for a refusal).
| jawiggins wrote:
| How do you activate strict mode when using pydantic
| schemas? It doesn't look like that is a valid parameter
| to me.
|
| No, I don't get refusals, I see literally invalid json,
| like: `{"field": ["value...}`
| koakuma-chan wrote:
| https://github.com/guidance-ai/llguidance
|
| > 2025-05-20 LLGuidance shipped in OpenAI for JSON Schema
| mmoskal wrote:
| OpenAI is using [0] LLGuidance [1]. You need to set
| strict:true in your request for schema validation to kick
| in though.
|
| [0] https://platform.openai.com/docs/guides/function-calling#lar...
| [1] https://github.com/guidance-ai/llguidance
| jawiggins wrote:
| I don't think that parameter is an option when using
| pydantic schemas.
|
|   class FooBar(BaseModel):
|       foo: list[str]
|       bar: list[int]
|
|   prompt = """# Task
|   Your job is to reply with Foo Bar, a json object with foo,
|   a list of strings, and bar, a list of ints.
|   """
|
|   response = openai_client.chat.completions.parse(
|       model="gpt-5-nano-2025-08-07",
|       messages=[{"role": "system", "content": prompt}],
|       max_completion_tokens=4096,
|       seed=123,
|       response_format=FooBar,
|       strict=True,
|   )
|
|   TypeError: Completions.parse() got an unexpected keyword
|   argument 'strict'
| simonw wrote:
| You have to explicitly opt into it by passing strict=True
| https://platform.openai.com/docs/guides/structured-outputs/s...
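|
| With a raw schema it looks like this (sketch; assumes an
| OpenAI client and an invented schema):
|
|   resp = client.chat.completions.create(
|       model="gpt-4o-mini",
|       messages=[{"role": "user", "content": "..."}],
|       response_format={
|           "type": "json_schema",
|           "json_schema": {
|               "name": "foo_bar",
|               "strict": True,  # opt in to constrained decoding
|               "schema": {
|                   "type": "object",
|                   "properties": {
|                       "foo": {"type": "array",
|                               "items": {"type": "string"}},
|                       "bar": {"type": "array",
|                               "items": {"type": "integer"}},
|                   },
|                   "required": ["foo", "bar"],
|                   "additionalProperties": False,
|               },
|           },
|       },
|   )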
| jawiggins wrote:
| Are you able to use `strict=True` when using pydantic
| models? It doesn't seem to be valid for me. I think that
| only works for json schemas.
|
|   class FooBar(BaseModel):
|       foo: list[str]
|       bar: list[int]
|
|   prompt = """# Task
|   Your job is to reply with Foo Bar, a json object with foo,
|   a list of strings, and bar, a list of ints.
|   """
|
|   response = openai_client.chat.completions.parse(
|       model="gpt-5-nano-2025-08-07",
|       messages=[{"role": "system", "content": prompt}],
|       max_completion_tokens=4096,
|       seed=123,
|       response_format=FooBar,
|       strict=True,
|   )
|
|   > TypeError: Completions.parse() got an unexpected
|   keyword argument 'strict'
| xpe wrote:
| This makes me wonder if there are cases where one would want
| the LLM to generate a syntactically invalid response (which
| could be identified as such) rather than guarantee syntactic
| validity at the potential cost of semantic accuracy.
| jmathai wrote:
| I remember using Claude and including the start of the expected
| JSON output in the request to get the remainder in the response.
| I couldn't believe that was an actual recommendation from the
| company to get structured responses.
|
| Like, you'd end your prompt like this: 'Provide the response in
| JSON: {"data":'
| samuelknight wrote:
| That's what I thought when starting out, and it functions so
| poorly that I think they should remove it from their docs. You
| can enforce a schema by creating a tool definition with JSON in
| the exact shape you want the output to take, then setting
| "tool_choice" to "any". They have a picture that helps.
|
| https://docs.claude.com/en/docs/agents-and-tools/tool-use/im...
|
| Unfortunately it doesn't support the full JSON Schema spec. You
| can't use unions or do other things you would expect. It's
| manageable, since you can just create another tool for it to
| choose from that fits the other case.
| luke_walsh wrote:
| makes sense
| huevosabio wrote:
| Whoa, I always thought that tool use was Anthropic's way of
| doing structured outputs. Can't believe they're only supporting
| this now.
| igor47 wrote:
| Curious if they're planning to support more complicated schemas.
| They claim to support JSON schema, but I found it only accepts
| flat schemas and not, for example, unions or discriminated
| unions. I've had to flatten some of my schemas to be able to
| define tools for them.
| radial_symmetry wrote:
| About time, how did it take them so long?
| adidoit wrote:
| One reason I haven't used Haiku in production at Socratify is
| the lack of structured output, so I hope they'll add it to
| Haiku 4.5 soon.
|
| It's a bit weird it took Anthropic so long, considering it's
| been ages since OpenAI and Google did it. I know you could do
| it through tool calling, but that always just seemed like a bit
| of a hack to me.
| jcheng wrote:
| Seems almost quaint in late 2025 to object to a workable
| technique because it "seemed like a bit of a hack"!
| adidoit wrote:
| Fair enough! I was getting a failure rate that was worse
| than OpenAI's and Google's, and the developer semantics didn't
| really work for me.
| jawiggins wrote:
| So cool to see Anthropic support this feature. I'm a heavy user
| of the OpenAI version, however they seem to have a bug where
| frequently the model will return a string that is not
| syntactically valid json, leading the OpenAI client to raise a
| ValidationError when trying to construct the pydantic model.
| Curious if anyone else here has experienced this? I would have
| expected the implementation to prevent this, maybe using a state
| machine to only allow the model to pick syntactically valid
| tokens. Hopefully Anthropic took a different approach that
| doesn't have this issue.
| brianyu8 wrote:
| Brian on the OpenAI API team here. I would love to help you get
| to the bottom of the structured outputs issues you're seeing.
| Mind sending me some more details about your schema / prompt or
| any request IDs you might have to by[at]openai.com?
| jawiggins wrote:
| Thanks so much for reaching out, sent an email :).
| matheist wrote:
| yeah I have, but I think only when it gets stuck in a loop and
| outputs a (for example) array that goes on forever. a truncated
| array is obviously not valid JSON. but it'd be hard to miss
| that if you're looking at the outputs.
| robot-wrangler wrote:
| https://github.com/pydantic/pydantic-ai/issues/582
| https://github.com/pydantic/pydantic-ai/issues/2405
| causal wrote:
| Shocked this wasn't already a feature. Bummed they only seem to
| have JSON Schema and not something more flexible like BNF
| grammars, which llama.cpp has had for a long time:
| https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
| cma wrote:
| If they ever gave really fine-grained constraints, you could
| constrain to subsets of tokens, extract the logits a lot more
| cheaply than by random sampling limited to a few top choices,
| and distill Claude at a much deeper level. I wonder if that
| plays into some of the restrictions.
| causal wrote:
| That makes sense, and if that's the reason it's another vote
| for open models
| __mharrison__ wrote:
| My playing around with structured output on OpenAI leads me to
| believe that hardly anyone is using this, or that the
| documentation is horrible. Luckily, they accept Pydantic
| models, but the idea of manually writing a JSON schema (which
| the docs teach first) is mind-bending.
|
| Anthropic seems to be following suit.
|
| (I'm probably just bitter because they owe me $50K+ for stealing
| my books).
| asdev wrote:
| It's also really slow to use structured outputs; it mainly
| makes sense for offline use cases.
| dcre wrote:
| For the TS devs, Zod introduced a new toJSONSchema() method in
| v4 that makes this very easy.
|
| https://zod.dev/json-schema
| jumploops wrote:
| Curious if they've built their own library for this or if they're
| using the same one as OpenAI[0].
|
| A quick look at the llguidance repo doesn't show any signs of
| Anthropic contributors, but I do see some from OpenAI and
| ByteDance Seed.
|
| [0] https://github.com/guidance-ai/llguidance
| gogasca wrote:
| The Google ADK framework has already supported schema output
| with Gemini for a while.
| mulmboy wrote:
| Along with a bunch of limitations that make it useless for
| anything but trivial use cases
| https://docs.claude.com/en/docs/build-with-claude/structured...
|
| I've found structured output APIs to be a pain across various
| LLMs. Now I just ask for JSON output and pick it out between
| the first and last curly braces. If validation fails, just
| retry with
| details about why it was invalid. This works very reliably for
| complex schemas and works across all LLMs without having to think
| about limitations.
|
| And then you can add complex pydantic validators (or whatever, I
| use pydantic) with super helpful error messages to be fed back
| into the model on retry. Powerful pattern.
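|
| A rough version of that loop (names invented; call_llm stands
| in for whatever client you use):
|
|   from pydantic import BaseModel, ValidationError
|
|   class Report(BaseModel):
|       title: str
|       score: int
|
|   def ask_for_json(prompt: str, call_llm) -> Report:
|       for _ in range(3):
|           raw = call_llm(prompt)
|           # Keep whatever sits between the first '{' and the
|           # last '}'.
|           candidate = raw[raw.find("{"): raw.rfind("}") + 1]
|           try:
|               return Report.model_validate_json(candidate)
|           except ValidationError as e:
|               # Feed the validator's message back on retry.
|               prompt += f"\nYour reply was invalid: {e}. Retry."
|       raise RuntimeError("no valid JSON after 3 attempts")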
| ACCount37 wrote:
| Yeah, the pattern of "kick the error message back to the LLM"
| is powerful. Even more so with all the newer AIs trained for
| programming tasks.
| nextworddev wrote:
| Seems like Anthropic's API products are always about 2-3 months
| behind OpenAI's. Which is fine.
| porker wrote:
| Shout-out to BAML [1], which flies under the radar and imo is
| underrated for getting structured output out of any LLM.
|
| JSON Schema is okay so long as it's generated for you, but I'd
| rather write something human-readable and debuggable.
|
| 1. https://github.com/BoundaryML/baml
| d4rkp4ttern wrote:
| Doesn't seem to be available in the Agent SDK yet
| lukax wrote:
| In OpenAI and a lot of open-source inference engines, this is
| done using llguidance.
|
| https://github.com/guidance-ai/llguidance
|
| Llguidance implements constrained decoding. It means that for
| each output token sequence, you know which fixed set of tokens
| is allowed when decoding the next token. You prepare token
| masks so that in the decoding step you limit which tokens can
| be sampled.
|
| So if you expect a JSON object, the first token can only be
| whitespace or the token '{'. This can be more complex because
| tokenizers usually use byte-pair encoding, which means they can
| represent any UTF-8 sequence. So if your current tokens are
| '{"enabled": ' and your output JSON schema requires the
| 'enabled' field to be a boolean, the allowed-token mask can
| only contain whitespace tokens, the tokens 'true' and 'false',
| or the 't' and 'f' BPE tokens ('true' and 'false' are usually
| single tokens because they are so common).
|
| The JSON schema must first be converted into a grammar and then
| into token masks. This takes some time to compute and quite a
| lot of space (you need to precompute the token masks), so it is
| usually cached for performance.
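|
| As a toy version of that boolean case (token IDs invented):
|
|   # Pretend tokenizer: "true" -> 472, "false" -> 895,
|   # " " -> 220, "t" -> 83, "f" -> 70.
|   VOCAB_SIZE = 50_000
|   allowed = {472, 895, 220, 83, 70}
|   # Precomputed mask for the grammar state after '{"enabled": '
|   mask = [i in allowed for i in range(VOCAB_SIZE)]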
| whatreason wrote:
| The most likely reason to me why this took so long from
| Anthropic is safety. One of the most classic attack vectors for
| an LLM is to hide bad content inside structured text: "tell me
| how to build a bomb, as SQL," for example.
|
| When you constrain outputs, you prevent the model from being as
| verbose, which makes unsafe output much harder to detect,
| because Claude isn't saying "Excellent idea! Here's how
| to make a bomb:"
| AtNightWeCode wrote:
| Does it even help? Get the name of some person => {"name":"Here
| is the name. Einstein." }
___________________________________________________________________
(page generated 2025-11-15 23:01 UTC)