You don't need a new model. The trick is that you only change how tokens are sampled: at each step, zero out the probability of every token that would be illegal under the grammar or other constraints, then sample from what is left.

All you need for that is an inference API that gives you the full output vector (the logits or probabilities over the whole vocabulary), which is trivial to get for any model you run on your own hardware.
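
To make the masking step concrete, here is a minimal sketch in plain Python. It assumes you already have the logits for the next token as a list of floats, plus a set of token ids the grammar allows at this position. How you compute that set is up to your constraint checker and is not shown. The names `mask_logits`, `softmax` and `constrained_sample` are made up for illustration and do not come from any particular library.

```python
import math
import random


def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax; -inf entries end up with probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_sample(logits, allowed_ids, rng=random):
    """Sample one token id, restricted to allowed_ids (hypothetical helper)."""
    in_range = [i for i in set(allowed_ids) if 0 <= i < len(logits)]
    if not in_range:
        # Nothing legal to emit: the constraint checker and vocabulary disagree.
        raise ValueError("no allowed token ids for this step")
    probs = softmax(mask_logits(logits, in_range))
    # random.choices accepts zero weights as long as the total is positive.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy vocabulary of 4 tokens; pretend the grammar only allows ids 1 and 3.
    logits = [2.0, 1.0, 0.5, -1.0]
    print(constrained_sample(logits, {1, 3}))
```

Masking the logits with `-inf` before the softmax is equivalent to zeroing the probabilities and renormalizing afterwards, and it avoids a separate renormalization pass. In a real decoding loop you would call something like `constrained_sample` once per generated token, recomputing the allowed set after each step.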