[HN Gopher] Structured outputs on the Claude Developer Platform
___________________________________________________________________
Structured outputs on the Claude Developer Platform
Author : adocomplete
Score : 170 points
Date : 2025-11-14 19:04 UTC (1 day ago)
(HTM) web link (www.claude.com)
(TXT) w3m dump (www.claude.com)
| barefootford wrote:
| I switched from structured outputs on the OpenAI APIs to
| unstructured output on Claude (Haiku 4.5) and haven't had any
| issues (yet). But
| guarantees are always nice.
| jascha_eng wrote:
| I feel like this is so core to any LLM automation that it's
| crazy Anthropic is only adding it now.
|
| I built a customized deep-research tool internally earlier this
| year that is made up of multiple "agentic" steps, each focusing
| on specific information to find. The outputs of those steps are
| always JSON, which then becomes the input for the next step.
| Sure, you can work your way around failures by doing retries,
| but it's just one less thing to think about if you can guarantee
| that the random LLM output adheres at least to some sort of
| structure.
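|
| (Roughly the shape of that pipeline; names invented, call_llm
| stands in for whatever client you use:)
|
|   import json
|
|   def step(prompt: str, payload: dict, call_llm) -> dict:
|       # Each step receives the previous step's JSON as input
|       # and must itself reply with JSON for the next step.
|       raw = call_llm(prompt + "\n" + json.dumps(payload))
|       return json.loads(raw)  # this is where the retries lived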
| sails wrote:
| Agree, it feels so fundamental. Any idea why? Gemini has also
| had it for a long time
| crazylogger wrote:
| The way you got structured output with Claude prior to this
| was via tool use.
|
| IMO this was the more elegant design if you think about it:
| _tool calling is really just structured output and structured
| output is tool calling_. The "do not provide multiple ways
| of doing the same thing" philosophy.
| veonik wrote:
| I have had fairly bad luck specifying the JSON Schema for my
| structured outputs with Gemini. It seems like describing the
| schema with natural language descriptions works much better,
| though I do admit to needing that retry hack at times. Do you
| have any tips on getting the most out of a schema definition?
| BoorishBears wrote:
| For one, always have a top-level object.
|
| But also, Gemini supports constrained generation, which can't
| fail to match the schema, so why not use that instead of
| prompting?
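|
| Roughly, with the google-genai SDK (sketch; model name and
| schema are just examples):
|
|   from google import genai
|   from pydantic import BaseModel
|
|   class Person(BaseModel):
|       name: str
|       age: int
|
|   client = genai.Client()
|   resp = client.models.generate_content(
|       model="gemini-2.0-flash",
|       contents="Ada Lovelace, 36, mathematician.",
|       config={
|           "response_mime_type": "application/json",
|           # Decoding is constrained to this schema.
|           "response_schema": Person,
|       },
|   )
|   person = resp.parsed  # a Person instance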
| astrange wrote:
| Constrained generation makes models somewhat less
| intelligent. Although it shouldn't be an issue in thinking
| mode, since it can prepare an unconstrained response and
| then fix it up.
| BoorishBears wrote:
| I mean, that's too reductionist if you're being exact, and
| not a worry if you're not.
|
| Even asking for JSON (without constrained sampling)
| sometimes degrades output, and the name and order of keys
| can also affect performance, or even act as a form of
| structured thinking.
|
| At the end of the day current models have enough problems
| with generalization that they should establish a baseline
| and move from there.
| Der_Einzige wrote:
| Not true, and citation needed. Whatever you cite, there are
| competing papers claiming that structured and constrained
| generation does zero harm to output diversity/creativity
| (within a schema).
| simonw wrote:
| Prior to this it was possible to get the same effect by
| defining a tool with the schema that you wanted and then
| telling the Anthropic API to always use that tool.
|
| I implemented structured outputs for Claude that way here:
| https://github.com/simonw/llm-anthropic/blob/500d277e9b4bec6...
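|
| Sketch of the idea (schema invented for illustration):
|
|   import anthropic
|
|   client = anthropic.Anthropic()
|
|   # The tool's input schema is the output shape we want.
|   tool = {
|       "name": "record_person",
|       "description": "Record a person's details.",
|       "input_schema": {
|           "type": "object",
|           "properties": {
|               "name": {"type": "string"},
|               "age": {"type": "integer"},
|           },
|           "required": ["name", "age"],
|       },
|   }
|
|   msg = client.messages.create(
|       model="claude-sonnet-4-5",
|       max_tokens=1024,
|       tools=[tool],
|       # Force this tool, so its arguments become our output.
|       tool_choice={"type": "tool", "name": "record_person"},
|       messages=[{"role": "user", "content": "Ada Lovelace, 36."}],
|   )
|   data = next(b.input for b in msg.content if b.type == "tool_use")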
| fnordsensei wrote:
| Same, but it's a PITA when you also want to support tool
| calling at the same time. Had to do a double call: call and
| check if it will use tools. If not, call again and force the
| use of the (now injected) return schema tool.
| mparis wrote:
| We've been running structured outputs via Claude on Bedrock
| in production for a year now and it works great. Give it a
| JSON schema, inject a '{', and sometimes do a bit of custom
| parsing on the response. GG
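|
| Roughly (sketch; assumes the anthropic SDK and a made-up
| prompt):
|
|   import json
|   import anthropic
|
|   client = anthropic.Anthropic()
|   resp = client.messages.create(
|       model="claude-sonnet-4-5",
|       max_tokens=512,
|       messages=[
|           {"role": "user", "content":
|            'Describe this page as JSON with "title" and "tags".'},
|           # Prefill the assistant turn so the reply continues
|           # as JSON instead of starting with prose.
|           {"role": "assistant", "content": "{"},
|       ],
|   )
|   # Re-attach the injected '{' before parsing.
|   data = json.loads("{" + resp.content[0].text)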
|
| Nice to see them support it officially; however, OpenAI has
| officially supported this for a while but, at least
| historically, I have been unable to use it because it adds
| deterministic validation that errors on certain standard JSON
| Schema elements that we used. The lack of "official" support
| is the feature that pushed us to use Claude in the first
| place.
|
| It's unclear to me that we will need "modes" for these
| features.
|
| Another example: I used to think that I couldn't live without
| Claude Code "plan mode". Then I used Codex and asked it to
| write a markdown file with a todo list. A bit more typing, but
| it works well, and it's nice to be able to edit the plan
| directly in the editor.
|
| Agree or Disagree?
| Karrot_Kream wrote:
| Before Claude Code shipped with plan mode, the workflow for
| using most coding agents was to have it create a `PLAN.md`
| and update/execute that plan. Planning mode was just a
| first class version of what users were already doing.
| kami23 wrote:
| Claude Code keeps coming out with a lot of really nice
| tools that others haven't started to emulate from what I've
| seen.
|
| My favorite one is going through the plan interactively. It
| turns it into a multiple choice / option TUI and the last
| choose is always reprompt that section of the plan.
|
| I had switch back to codex recently and not being able to
| do my planning solely in the CLI feels like the early
| 1900s.
|
| To trigger the interactive mode, do something like:
|
| Plan a fix for:
|
| <Problem statement>
|
| Please walk me through any options or questions you might
| have interactively.
| tempaccount420 wrote:
| > Give it a JSON schema, inject a '{', and sometimes do a
| bit of custom parsing on the response
|
| I would hope that this is not what OpenAI/Anthropic do
| under the hood, because otherwise, what if one of the
| strings needs a lot of \escapes? Is it also supposed to
| never write actual newlines in strings? It's awkward.
|
| The ideal solution would be to have some special tokens
| like [object_start] [object_end] and [string_start]
| [string_end].
| koakuma-chan wrote:
| I don't think the tool input schema thing does that
| inference-time trick. I think it just dumps the JSON schema
| into the context, and tells the model to conform to that
| schema.
| alex_duf wrote:
| It's not 100% successful; I've had responses that didn't
| match my schema.
|
| I think the new feature limits which tokens can be
| output, which brings a guarantee, whereas the tools are only
| a suggestion.
| miki123211 wrote:
| So, so much this.
|
| Structured outputs are the most underappreciated LLM feature.
| If you're building anything except a chatbot, it's definitely
| worth familiarizing yourself with them.
|
| They're not that easy to use well, and there aren't many
| resources on the internet explaining how to get the most out
| of them.
| maleldil wrote:
| In Python, they're very easy to use. Define your schema with
| Pydantic and pass the class to your client calls. There are
| some details to know (e.g. field order can affect performance),
| but it's very easy overall. Other languages probably have
| something similar.
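|
| e.g. with the OpenAI SDK (sketch; schema invented):
|
|   from openai import OpenAI
|   from pydantic import BaseModel
|
|   class Task(BaseModel):
|       description: str
|       done: bool
|
|   class TaskList(BaseModel):
|       tasks: list[Task]
|
|   client = OpenAI()
|   resp = client.chat.completions.parse(
|       model="gpt-4o-mini",
|       messages=[{"role": "user",
|                  "content": "Extract tasks: buy milk (done)..."}],
|       response_format=TaskList,  # the Pydantic class itself
|   )
|   tasks = resp.choices[0].message.parsed  # a TaskList instance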
| swyx wrote:
| and they've done super well without it. makes you really
| question if this is really that core.
| andai wrote:
| It's nice but I don't know how necessary it is.
|
| You could get this working very consistently with GPT-4 in mid
| 2023. The version before June, iirc. No JSON output, no tool
| calling fine tuning... just half a page of instructions and
| some string matching code. (Built a little AI code editing tool
| along these lines.)
|
| With the tool calling RL and structured outputs, I think the
| main benefit is peace of mind. You know you're going down the
| happy path, so there's one less thing to worry about.
|
| Reliability is the final frontier!
| macNchz wrote:
| Having used structured outputs pretty extensively for a while
| now, my impression is that the newer models take less of a
| quality hit while conforming to a specific schema. Just
| giving instructions and output examples totally worked, but it
| came at a considerable cost to output quality; that effect
| seems to have diminished over time as models have been more
| explicitly trained to produce structured output.
| mkagenius wrote:
| I always wondered how they achieved this. Is it just retries
| while generating tokens, where as soon as they find a mismatch
| they retry? Or is the model itself trained extremely well in
| this version of 4.5?
| Kuinox wrote:
| The inference doesn't return a single token, but a probability
| for every token. You just select a token that is allowed
| according to the compiled grammar.
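|
| A toy version of that selection step (real implementations
| precompute token masks per grammar state):
|
|   import numpy as np
|
|   def constrained_sample(logits, allowed_ids):
|       # Disallow every token the grammar forbids here, then
|       # sample from whatever probability mass remains.
|       masked = np.full_like(logits, -np.inf)
|       masked[allowed_ids] = logits[allowed_ids]
|       probs = np.exp(masked - masked.max())
|       probs /= probs.sum()
|       return np.random.choice(len(logits), p=probs)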
| mkagenius wrote:
| Hmm, wouldn't it sacrifice a better answer in some cases (not
| sure how many though)?
|
| I'd be surprised if they hadn't specifically trained for
| structured "correct" output for this, in addition to picking
| the next token following the structure.
| Kuinox wrote:
| The "better answer" wouldnt had respected the schema in
| this case.
| tdfirth wrote:
| In my experience (I've put hundreds of billions of tokens
| through structured outputs over the last 18 months), I
| think the answer is yes, but only in edge cases.
|
| It generally happens when the grammar is highly
| constrained, for example if a boolean is expected next.
|
| If the model assigns a low probability to both true and
| false coming next, then the sampling strategy will pick
| whichever one happens to score highest. Most tokens have
| very similar probabilities close to 0 most of the time, and
| if you're picking between two of these then the result will
| often feel random.
|
| It's always the result of a bad prompt, though. If you
| improve the prompt so that the model understands the task
| better, there will be a clear difference in the scores the
| tokens get, and the result will seem less random.
| miki123211 wrote:
| It's not just the prompt that matters, it's also field
| order (and a bunch of other things).
|
| Imagine you're asking your model to give you a list of
| tasks mentioned in a meeting, along with a boolean
| indicating whether the task is done. If you put the
| boolean first, the model must decide both what the task
| is and whether it is done at the same time. If you put
| the task description first, the model can separate that
| work into two distinct steps.
|
| There are more tricks like this. It's really worth
| thinking about which calculations you delegate to the
| model and which you do in code, and how you integrate the
| two.
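|
| In Pydantic terms (illustrative):
|
|   from pydantic import BaseModel
|
|   # Harder: the model must commit to `done` before it has
|   # written out what the task even is.
|   class TaskBoolFirst(BaseModel):
|       done: bool
|       description: str
|
|   # Easier: emitting `description` first acts as a small
|   # reasoning step before the `done` judgment.
|   class TaskDescFirst(BaseModel):
|       description: str
|       done: bool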
| mirekrusin wrote:
| Sampling is already constrained with temperature, top_k,
| top_p, top_a, typical_p, min_p, entropy_penalty, smoothing,
| etc.; filtering tokens to grammar-valid ones is just one more
| such filter. It does make sense and can be used for producing
| programming-language output as well: what's the point in
| bothering to generate output that's known up front to be
| invalid? Better to filter it out and allow only valid
| completions.
| mmoskal wrote:
| Grammars work best when aligned with the prompt. That is, if
| your prompt gives you the right format of answer 80% of the
| time, the grammar will take you to 100%. If it gives you
| the right answer 1% of the time, the grammar will give you
| syntactically correct garbage.
| simonw wrote:
| They're using the same trick OpenAI have been using for a
| while: they compile a grammar and then have that running as
| part of token inference, such that only tokens that fit the
| grammar can be selected as the next token.
|
| This trick has also been in llama.cpp for a couple of years:
| https://til.simonwillison.net/llms/llama-cpp-python-grammars
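|
| With llama-cpp-python it looks roughly like this (model path
| is a placeholder):
|
|   from llama_cpp import Llama, LlamaGrammar
|
|   # Only tokens that keep the output inside this grammar are
|   # eligible at each decoding step.
|   grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')
|
|   llm = Llama(model_path="./model.gguf")
|   out = llm("Is the sky blue? Answer yes or no: ",
|             grammar=grammar)
|   print(out["choices"][0]["text"])  # "yes" or "no"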
| huevosabio wrote:
| Yeah, and now there are mature OSS solutions like Outlines and
| XGrammar, which makes it even weirder that Anthropic is only
| supporting this now.
| minimaxir wrote:
| More info on Claude's grammar compiling:
| https://docs.claude.com/en/docs/build-with-claude/structured...
| causal wrote:
| I reaaaaally wish we could provide an EBNF grammar like
| llama.cpp does. JSON Schema covers far fewer use cases for me.
| psadri wrote:
| What are some examples that you can't express in json
| schema?
| causal wrote:
| Anything not JSON
| jawiggins wrote:
| How sure are you that OpenAI is using that?
|
| I would have suspected it too, but I've been struggling with
| OpenAI returning syntactically invalid JSON when provided
| with a simple pydantic class (a list of strings), which
| shouldn't be possible unless they have a glaring error in
| their grammar.
| gradys wrote:
| You might be using JSON mode, which doesn't guarantee a
| schema will be followed, or structured outputs not in
| strict mode. It is possible to get the property that the
| response is either a valid instance of the schema or an
| error (e.g. for a refusal).
| jawiggins wrote:
| How do you activate strict mode when using pydantic
| schemas? It doesn't look like that is a valid parameter
| to me.
|
| No, I don't get refusals, I see literally invalid json,
| like: `{"field": ["value...}`
| koakuma-chan wrote:
| https://github.com/guidance-ai/llguidance
|
| > 2025-05-20 LLGuidance shipped in OpenAI for JSON Schema
| mmoskal wrote:
| OpenAI is using [0] LLGuidance [1]. You need to set
| strict:true in your request for schema validation to kick
| in though.
|
| [0] https://platform.openai.com/docs/guides/function-calling#lar...
| [1] https://github.com/guidance-ai/llguidance
| jawiggins wrote:
| I don't think that parameter is an option when using
| pydantic schemas.
|
|   class FooBar(BaseModel):
|       foo: list[str]
|       bar: list[int]
|
|   prompt = """# Task
|   Your job is to reply with Foo Bar, a json object with foo,
|   a list of strings, and bar, a list of ints.
|   """
|
|   response = openai_client.chat.completions.parse(
|       model="gpt-5-nano-2025-08-07",
|       messages=[{"role": "system", "content": prompt}],
|       max_completion_tokens=4096,
|       seed=123,
|       response_format=FooBar,
|       strict=True,
|   )
|
|   TypeError: Completions.parse() got an unexpected keyword
|   argument 'strict'
| simonw wrote:
| You have to explicitly opt into it by passing strict=True
| https://platform.openai.com/docs/guides/structured-outputs/s...
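|
| With a raw schema it looks like this (sketch; assumes an
| OpenAI client and an invented schema):
|
|   resp = client.chat.completions.create(
|       model="gpt-4o-mini",
|       messages=[{"role": "user", "content": "..."}],
|       response_format={
|           "type": "json_schema",
|           "json_schema": {
|               "name": "foo_bar",
|               "strict": True,  # opt in to constrained decoding
|               "schema": {
|                   "type": "object",
|                   "properties": {
|                       "foo": {"type": "array",
|                               "items": {"type": "string"}},
|                       "bar": {"type": "array",
|                               "items": {"type": "integer"}},
|                   },
|                   "required": ["foo", "bar"],
|                   "additionalProperties": False,
|               },
|           },
|       },
|   )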
| jawiggins wrote:
| Are you able to use `strict=True` when using pydantic
| models? It doesn't seem to be valid for me. I think that
| only works for json schemas.
|
|   class FooBar(BaseModel):
|       foo: list[str]
|       bar: list[int]
|
|   prompt = """# Task
|   Your job is to reply with Foo Bar, a json object with foo,
|   a list of strings, and bar, a list of ints.
|   """
|
|   response = openai_client.chat.completions.parse(
|       model="gpt-5-nano-2025-08-07",
|       messages=[{"role": "system", "content": prompt}],
|       max_completion_tokens=4096,
|       seed=123,
|       response_format=FooBar,
|       strict=True,
|   )
|
|   > TypeError: Completions.parse() got an unexpected
|   keyword argument 'strict'
| xpe wrote:
| This makes me wonder if there are cases where one would want
| the LLM to generate a syntactically invalid response (which
| could be identified as such) rather than guarantee syntactic
| validity at the potential cost of semantic accuracy.
| jmathai wrote:
| I remember using Claude and including the start of the expected
| JSON output in the request to get the remainder in the response.
| I couldn't believe that was an actual recommendation from the
| company to get structured responses.
|
| Like, you'd end your prompt like this: 'Provide the response in
| JSON: {"data":'
| samuelknight wrote:
| That's what I thought when starting out, and it functions so
| poorly that I think they should remove it from their docs. You
| can enforce a schema by creating a tool definition with JSON in
| the exact shape you want the output to take, then setting
| "tool_choice" to "any". They have a picture that helps.
|
| https://docs.claude.com/en/docs/agents-and-tools/tool-use/im...
|
| Unfortunately it doesn't support the full JSON Schema spec. You
| can't use unions or do other things you would expect. It's
| manageable, since you can just create another tool for it to
| choose from that fits the other case.
| luke_walsh wrote:
| makes sense
| huevosabio wrote:
| Whoa, I always thought that tool use was Anthropic's way of
| doing structured outputs. Can't believe they're only supporting
| this now.
| igor47 wrote:
| Curious if they're planning to support more complicated schemas.
| They claim to support JSON schema, but I found it only accepts
| flat schemas and not, for example, unions or discriminated
| unions. I've had to flatten some of my schemas to be able to
| define tools for them.
| radial_symmetry wrote:
| About time, how did it take them so long?
| adidoit wrote:
| One reason I haven't used Haiku in production at Socratify is
| the lack of structured output, so I hope they'll add it to
| Haiku 4.5 soon.
|
| It's a bit weird it took Anthropic so long, considering it's
| been ages since OpenAI and Google did it. I know you could do
| it through tool calling, but that always just seemed like a bit
| of a hack to me.
| jcheng wrote:
| Seems almost quaint in late 2025 to object to a workable
| technique because it "seemed like a bit of a hack"!
| adidoit wrote:
| Fair enough! I was getting a failure rate that was worse
| than OpenAI's and Google's, and the developer semantics didn't
| really work for me.
| jawiggins wrote:
| So cool to see Anthropic support this feature. I'm a heavy user
| of the OpenAI version, however they seem to have a bug where
| frequently the model will return a string that is not
| syntactically valid json, leading the OpenAI client to raise a
| ValidationError when trying to construct the pydantic model.
| Curious if anyone else here has experienced this? I would have
| expected the implementation to prevent this, maybe using a state
| machine to only allow the model to pick syntactically valid
| tokens. Hopefully Anthropic took a different approach that
| doesn't have this issue.
| brianyu8 wrote:
| Brian on the OpenAI API team here. I would love to help you get
| to the bottom of the structured outputs issues you're seeing.
| Mind sending me some more details about your schema / prompt or
| any request IDs you might have to by[at]openai.com?
| jawiggins wrote:
| Thanks so much for reaching out, sent an email :).
| matheist wrote:
| yeah I have, but I think only when it gets stuck in a loop and
| outputs a (for example) array that goes on forever. a truncated
| array is obviously not valid JSON. but it'd be hard to miss
| that if you're looking at the outputs.
| robot-wrangler wrote:
| https://github.com/pydantic/pydantic-ai/issues/582
| https://github.com/pydantic/pydantic-ai/issues/2405
| causal wrote:
| Shocked this wasn't already a feature. Bummed they only seem to
| have JSON Schema and not something more flexible like BNF
| grammars, which llama.cpp has had for a long time:
| https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
| cma wrote:
| If they ever gave really fine-grained constraints, you could
| constrain to subsets of tokens, extract the logits a lot more
| cheaply than by random sampling limited to a few top choices,
| and distill Claude at a much deeper level. I wonder if that
| plays into some of the restrictions.
| causal wrote:
| That makes sense, and if that's the reason it's another vote
| for open models
| __mharrison__ wrote:
| My playing around with structured output on OpenAI leads me to
| believe that hardly anyone is using this, or that the
| documentation is horrible. Luckily, they accept Pydantic
| models, but the idea of manually writing a JSON schema (which
| the docs teach first) is mind-bending.
|
| Anthropic seems to be following suit.
|
| (I'm probably just bitter because they owe me $50K+ for stealing
| my books).
| asdev wrote:
| It's also really slow to use structured outputs; it mainly
| makes sense for offline use cases.
| dcre wrote:
| For the TS devs, Zod introduced a new toJSONSchema() method in
| v4 that makes this very easy.
|
| https://zod.dev/json-schema
| jumploops wrote:
| Curious if they've built their own library for this or if they're
| using the same one as OpenAI[0].
|
| A quick look at the llguidance repo doesn't show any signs of
| Anthropic contributors, but I do see some from OpenAI and
| ByteDance Seed.
|
| [0] https://github.com/guidance-ai/llguidance
| gogasca wrote:
| The Google ADK framework has already supported schema output
| with Gemini for a while.
| mulmboy wrote:
| Along with a bunch of limitations that make it useless for
| anything but trivial use cases
| https://docs.claude.com/en/docs/build-with-claude/structured...
|
| I've found structured output APIs to be a pain across various
| LLMs. Now I just ask for JSON output and pick it out between
| the first and last curly braces. If validation fails, just
| retry with
| details about why it was invalid. This works very reliably for
| complex schemas and works across all LLMs without having to think
| about limitations.
|
| And then you can add complex pydantic validators (or whatever, I
| use pydantic) with super helpful error messages to be fed back
| into the model on retry. Powerful pattern.
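|
| A rough version of that loop (names invented; call_llm stands
| in for whatever client you use):
|
|   from pydantic import BaseModel, ValidationError
|
|   class Report(BaseModel):
|       title: str
|       score: int
|
|   def ask_for_json(prompt: str, call_llm) -> Report:
|       for _ in range(3):
|           raw = call_llm(prompt)
|           # Keep whatever sits between the first '{' and the
|           # last '}'.
|           candidate = raw[raw.find("{"): raw.rfind("}") + 1]
|           try:
|               return Report.model_validate_json(candidate)
|           except ValidationError as e:
|               # Feed the validator's message back on retry.
|               prompt += f"\nYour reply was invalid: {e}. Retry."
|       raise RuntimeError("no valid JSON after 3 attempts")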
| ACCount37 wrote:
| Yeah, the pattern of "kick the error message back to the LLM"
| is powerful. Even more so with all the newer AIs trained for
| programming tasks.
| nextworddev wrote:
| Seems like Anthropic's API products are always about 2-3 months
| behind OpenAI's. Which is fine.
| porker wrote:
| Shout-out to BAML [1], which flies under the radar and imo is
| underrated for getting structured output out of any LLM.
|
| JSON Schema is okay so long as it's generated for you, but I'd
| rather write something human-readable and debuggable.
|
| 1. https://github.com/BoundaryML/baml
| d4rkp4ttern wrote:
| Doesn't seem to be available in the Agent SDK yet
| lukax wrote:
| In OpenAI and a lot of open-source inference engines, this is
| done using llguidance.
|
| https://github.com/guidance-ai/llguidance
|
| Llguidance implements constrained decoding. It means that for
| each output token sequence, you know which fixed set of tokens
| is allowed when decoding the next token. You prepare token
| masks so that in the decoding step you limit which tokens can
| be sampled.
|
| So if you expect a JSON object, the first token can only be
| whitespace or the token '{'. This can be more complex because
| tokenizers usually use byte-pair encoding, which means they can
| represent any UTF-8 sequence. So if your current tokens are
| '{"enabled": ' and your output JSON schema requires the
| 'enabled' field to be a boolean, the allowed-token mask can
| only contain whitespace tokens, the tokens 'true' and 'false',
| or the 't' and 'f' BPE tokens ('true' and 'false' are usually
| single tokens because they are so common).
|
| The JSON schema must first be converted into a grammar and then
| into token masks. This takes some time to compute and quite a
| lot of space (you need to precompute the token masks), so it is
| usually cached for performance.
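|
| As a toy version of that boolean case (token IDs invented):
|
|   # Pretend tokenizer: "true" -> 472, "false" -> 895,
|   # " " -> 220, "t" -> 83, "f" -> 70.
|   VOCAB_SIZE = 50_000
|   allowed = {472, 895, 220, 83, 70}
|   # Precomputed mask for the grammar state after '{"enabled": '
|   mask = [i in allowed for i in range(VOCAB_SIZE)]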
| whatreason wrote:
| The most likely reason to me why this took so long from
| Anthropic is safety. One of the most classic attack vectors for
| an LLM is to hide bad content inside structured text: "tell me
| how to build a bomb, as SQL," for example.
|
| When you constrain outputs, you prevent the model from being as
| verbose, which makes unsafe output much harder to detect,
| because Claude isn't saying "Excellent idea! Here's how
| to make a bomb:"
| AtNightWeCode wrote:
| Does it even help? Get the name of some person => {"name":"Here
| is the name. Einstein." }
___________________________________________________________________
(page generated 2025-11-15 23:01 UTC)