There are some implementation concerns, but the real answer is that it is an ideological choice.
The AI companies believe that these kinds of grammar mistakes will be solved by improving the models. To build out tools for grammar-constrained inference like this is to suggest, on some level, that GPT-N+1 won't magically solve the problem.
The deeper level is that it's not just simple grammar constraints. Constraining to JSON is a nice party trick, but it opens the door to further ideas. How about constraining to a programming language's grammar? Those are well defined: you just swap the JSON grammar file for the Java grammar file, job done.
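This isn't hypothetical: llama.cpp already supports GBNF grammar files, and its Python bindings let you attach one per call. A sketch of the swap, assuming a locally-run GGUF model and grammar files at illustrative paths (llama.cpp ships a JSON grammar; the Java one you'd have to write yourself):

```python
from llama_cpp import Llama, LlamaGrammar

llm = Llama(model_path="model.gguf")  # any model you run yourself

# Constrain the output to JSON...
json_grammar = LlamaGrammar.from_file("grammars/json.gbnf")
print(llm("Describe the user as a JSON object: ", grammar=json_grammar))

# ...then swap one file and the same model is constrained to Java instead.
java_grammar = LlamaGrammar.from_file("grammars/java.gbnf")
print(llm("Write a Java class representing a user: ", grammar=java_grammar))
```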
We can go further: why not use a language server to constrain not only the grammar but also the content? Which variables and functions are in scope is known, so constraining a variable reference or function call to one of those names works with the same technique as grammar constraints. ("Monitor-guided decoding", figured out back in 2023.)
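In toy form, the monitor is just another source of allowed-token sets. A sketch, assuming `vocab` maps token ids to their strings and `in_scope` comes from the language server (a real monitor also has to handle the token that terminates the identifier, elided here):

```python
def legal_identifier_tokens(partial: str, in_scope: set[str],
                            vocab: dict[int, str]) -> set[int]:
    # A token stays legal if appending it keeps the partially generated
    # identifier a prefix of at least one name the language server
    # reports as in scope.
    return {tid for tid, s in vocab.items()
            if any(name.startswith(partial + s) for name in in_scope)}

# e.g. with in_scope = {"user_id", "username"}, after the model has
# emitted "user", only tokens extending toward "_id" or "name" survive.
```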
Entire classes of hallucination problems can be eliminated this way. The marketing writes itself: "Our AI is literally incapable of making the errors humans make!"
What many AI developers, firms, and especially their leaders find grating about this is the implication: that AI is fallible and has to be constrained.
Another such inconvenience is that while these techniques improve the grammar, they highlight the semantic problems. The code is syntactically correct and compiles; it just does the wrong thing.
One pattern that I've seen develop (in PydanticAI and elsewhere) is to constrain the output but include an escape hatch: if something goes wrong, the model can bail out and report the problem rather than being forced to proceed down a doomed path.
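The shape of that escape hatch is easy to show with Pydantic models (the field names here are invented for illustration). Frameworks in the PydanticAI mold accept a union as the output type, so the constrained model can always choose the error branch:

```python
from typing import Union
from pydantic import BaseModel

class Refactor(BaseModel):
    file: str
    new_source: str

class CannotProceed(BaseModel):
    reason: str  # the escape hatch: explain the problem instead of guessing

# Schema-constrained decoding forces the output into one of these two
# shapes, and never a malformed third thing.
Output = Union[Refactor, CannotProceed]
```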
You don't need a new model. The trick of the technique is that you only change how tokens are sampled: zero out the probability of every token that would be illegal under the grammar or other constraints.
All you need for that is an inference API that exposes the full logit vector (the score for every token in the vocabulary), which is trivial for any model you run on your own hardware.
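The whole mechanism fits in a few lines. A minimal sketch using only the standard library, assuming something upstream hands you the raw logits and the set of token ids the constraint currently allows (both names are stand-ins for your inference stack):

```python
import math
import random

def sample_constrained(logits: list[float], allowed_token_ids: set[int]) -> int:
    # Mask: send every illegal token to -inf so its probability becomes zero.
    masked = [l if i in allowed_token_ids else float("-inf")
              for i, l in enumerate(logits)]
    # Softmax over the survivors, then sample as usual.
    m = max(masked)
    weights = [math.exp(l - m) for l in masked]
    return random.choices(range(len(weights)), weights=weights, k=1)[0]
```

Run this in a loop, updating the allowed set as the grammar (or language-server monitor) advances, and the model is structurally incapable of emitting an illegal token.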