There are some implementation concerns, but the real answer is that it is an ideological choice.
The AI companies believe that these kinds of grammar mistakes will be solved by improving the models. To build out tools for grammar-constrained inference like this is to suggest, on some level, that GPT-N+1 won't magically solve the problem.
The deeper level is that it's not just simple grammar constraints. Constraining to JSON is a nice party trick, but it opens the door to further ideas. How about constraining to a programming language's grammar? Those are well defined: you just swap the JSON grammar file for the Java grammar file, job done.
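This isn't hypothetical: llama.cpp already supports GBNF grammar files, and its Python bindings let you attach one per call. A sketch of the swap, assuming a locally-run GGUF model and grammar files at illustrative paths (llama.cpp ships a JSON grammar; the Java one you'd have to write yourself):

```python
from llama_cpp import Llama, LlamaGrammar

llm = Llama(model_path="model.gguf")  # any model you run yourself

# Constrain the output to JSON...
json_grammar = LlamaGrammar.from_file("grammars/json.gbnf")
print(llm("Describe the user as a JSON object: ", grammar=json_grammar))

# ...then swap one file and the same model is constrained to Java instead.
java_grammar = LlamaGrammar.from_file("grammars/java.gbnf")
print(llm("Write a Java class representing a user: ", grammar=java_grammar))
```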
We can go further: why not use a language server to constrain not only the grammar but also the content? Which variables and functions are in scope is known, so constraining a variable reference or function call to one of those names works with the same technique as grammar constraints. ("Monitor-guided decoding", figured out back in 2023.)
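In toy form, the monitor is just another source of allowed-token sets. A sketch, assuming `vocab` maps token ids to their strings and `in_scope` comes from the language server (a real monitor also has to handle the token that terminates the identifier, elided here):

```python
def legal_identifier_tokens(partial: str, in_scope: set[str],
                            vocab: dict[int, str]) -> set[int]:
    # A token stays legal if appending it keeps the partially generated
    # identifier a prefix of at least one name the language server
    # reports as in scope.
    return {tid for tid, s in vocab.items()
            if any(name.startswith(partial + s) for name in in_scope)}

# e.g. with in_scope = {"user_id", "username"}, after the model has
# emitted "user", only tokens extending toward "_id" or "name" survive.
```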
Entire classes of hallucination problems can be eliminated this way. The marketing writes itself: "Our AI is literally incapable of making the errors humans make!"
What many AI developers, firms, and especially their leaders find grating about this is the implication: that AI is fallible and has to be constrained.
Another such inconvenience is that while these techniques improve the grammar, they highlight the semantic problems. The code is syntactically correct and compiles; it just does the wrong thing.
One pattern that I've seen develop (in PydanticAI and elsewhere) is to constrain the output but include an escape hatch: if something goes wrong, the model can bail out and report the problem rather than being forced to proceed down a doomed path.
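The shape of that escape hatch is easy to show with Pydantic models (the field names here are invented for illustration). Frameworks in the PydanticAI mold accept a union as the output type, so the constrained model can always choose the error branch:

```python
from typing import Union
from pydantic import BaseModel

class Refactor(BaseModel):
    file: str
    new_source: str

class CannotProceed(BaseModel):
    reason: str  # the escape hatch: explain the problem instead of guessing

# Schema-constrained decoding forces the output into one of these two
# shapes, and never a malformed third thing.
Output = Union[Refactor, CannotProceed]
```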
You don't need a new model. The trick of the technique is that you only change how tokens are sampled: zero out the probability of every token that would be illegal under the grammar or other constraints.
All you need for that is an inference API that exposes the full logit vector (the score for every token in the vocabulary), which is trivial for any model you run on your own hardware.
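The whole mechanism fits in a few lines. A minimal sketch using only the standard library, assuming something upstream hands you the raw logits and the set of token ids the constraint currently allows (both names are stand-ins for your inference stack):

```python
import math
import random

def sample_constrained(logits: list[float], allowed_token_ids: set[int]) -> int:
    # Mask: send every illegal token to -inf so its probability becomes zero.
    masked = [l if i in allowed_token_ids else float("-inf")
              for i, l in enumerate(logits)]
    # Softmax over the survivors, then sample as usual.
    m = max(masked)
    weights = [math.exp(l - m) for l in masked]
    return random.choices(range(len(weights)), weights=weights, k=1)[0]
```

Run this in a loop, updating the allowed set as the grammar (or language-server monitor) advances, and the model is structurally incapable of emitting an illegal token.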