I think you might be over-simplifying. This (and llama.cpp's grammar-based sampling, which this is moving towards[1]) doesn't say "no, not like that, give me another token". It excludes impossible tokens at each step, but otherwise samples like normal.
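To make that concrete, here's a rough sketch of the idea in plain Python. It's not the actual code from outlines or llama.cpp; `sample_constrained` and the `allowed` set are names I made up, and in a real implementation the allowed set would come from the grammar or regex state at each step.

```python
import math
import random


def sample_constrained(logits, allowed, rng=random):
    """Sample one token id from softmax(logits), restricted to `allowed`.

    Hypothetical helper for illustration: `allowed` is the set of token ids
    that the grammar permits at this step.
    """
    if not allowed:
        raise ValueError("grammar allows no tokens at this step")
    # Mask impossible tokens with -inf so their probability is exactly zero.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    # Subtract the max before exponentiating, for numerical stability.
    peak = max(masked)
    weights = [math.exp(x - peak) for x in masked]
    # From here on it's ordinary sampling; choices() normalises the weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, 3.0]  # made-up scores for a 4-token vocabulary
    print(sample_constrained(logits, allowed={1, 2}))
```

Note there's no retry loop: the disallowed tokens are never candidates in the first place, and the relative odds among the allowed ones are whatever the model gave them.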

Is this a revolutionary trick? Not really, since llama.cpp, guidance, and probably others have already done it. But it's a good trick, and hopefully one of many to justify the valuation :).

[1]: https://github.com/normal-computing/outlines/pull/178