This is a great writeup! There was a period where reliable structured output was a significant differentiator and was the 'secret sauce' behind some companies success. A NL->SQL company I am familiar with comes to mind. Nice to see this both public and supported by a growing ecosystem of libraries.
One statement surprised me was that the author thinks "models over time will just be able to output JSON perfectly without the need for constraining over time."
I'm not sure how this conclusion was reached. "Perfectly" is a bar that probabilistic sampling cannot meet.
Thank you! Maybe not "perfect" but near-perfect is something we can expect. Models like the Osmosis structure which just structure data inspired some of that thinking (https://ollama.com/Osmosis/Osmosis-Structure-0.6B). Historically, JSON generation has been a latent capability of a model rather than a trained one, but that seems to be changing. gpt-oss was particularly trained for this type of behavior and so the token probabilities are heavily skewed to conform to JSON. Will be interesting to see the next batch of models!
You're spot on about the "perfect" JSON bar being unreachable for now. The only consistently reliable method I've seen in the wild is some form of constrained decoding or grammar enforcement—bit brittle, but practical. Sampling will always be fuzzy unless the architecture fundamentally shifts. Anyone claiming zero-validity issues is probably glossing over a ton of downstream QA work.
We’ve had a lot of success implementing schema-aligned parsing in BAML, a DSL that we’ve built to simplify this problem.
We actually don’t like constrained generation as approach - among other issues it limits your ability to use reasoning - and instead the technique we’re using is algorithm-driven error-tolerant output parsing.
One statement surprised me was that the author thinks "models over time will just be able to output JSON perfectly without the need for constraining over time."
I'm not sure how this conclusion was reached. "Perfectly" is a bar that probabilistic sampling cannot meet.