This is hilarious and also a great idea. I don't see any reason why you can't have it play a few million games against itself and other engines and see where that takes you. Less efficient than AlphaZero, probably, but by how much?
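It should be similarly efficient. AlphaZero used 5,000 first-generation TPUs to generate self-play games and a much smaller number of TPUs (64 second-generation ones) to train the model on the self-play results. The 55% rule comes from its predecessor, AlphaGo Zero: whenever a newly trained model won at least 55% of its evaluation games against the current best, it became the new model. AlphaZero dropped that gate and updates a single network continually.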
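For what it's worth, a 55% promotion gate of that kind can be sketched in a few lines of Python. This is only an illustration: `should_promote` and `simulate_match` are made-up names, the match is simulated with a fixed win probability instead of real games, and draws are ignored.

```python
import random


def should_promote(wins: int, games: int, threshold: float = 0.55) -> bool:
    """Return True if the candidate's win rate reaches the threshold."""
    if games <= 0:
        raise ValueError("games must be positive")
    return wins / games >= threshold


def simulate_match(candidate_win_prob: float, games: int = 400, seed: int = 0) -> int:
    """Stand-in for real evaluation games: count simulated candidate wins.

    Assumption: each game is an independent coin flip with a fixed win
    probability. A real evaluator would play candidate vs. current best.
    """
    rng = random.Random(seed)
    return sum(rng.random() < candidate_win_prob for _ in range(games))


if __name__ == "__main__":
    wins = simulate_match(candidate_win_prob=0.60)
    print(f"candidate won {wins}/400, promote: {should_promote(wins, 400)}")
```

The threshold only decides whether the candidate replaces the current best; it says nothing about how the candidate was trained.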
You're right, "efficient" should be substituted with "possible". We're certainly not claiming that this is a smart way to do it, just that you can.
Still, I think there's a chance it could work well. Each move could be prefixed with the final outcome of the game, which is similar to how AlphaZero (and, I believe, MuZero) uses the final game outcome as the training target for every position in that game.
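A minimal sketch of the outcome-prefix idea, assuming each game is stored as a list of SAN moves plus a result string. The `[1-0]` token format and the name `tag_moves` are my own illustrative choices, not anything taken from AlphaZero or MuZero.

```python
from typing import Iterable

# Results as they appear in PGN files.
VALID_RESULTS = {"1-0", "0-1", "1/2-1/2"}


def tag_moves(moves: Iterable[str], result: str) -> str:
    """Prefix every move with the game's final result.

    The output is one line of training text, so the model always sees
    which outcome the move it is about to predict eventually led to.
    """
    if result not in VALID_RESULTS:
        raise ValueError(f"unknown result: {result!r}")
    return " ".join(f"[{result}] {move}" for move in moves)


if __name__ == "__main__":
    print(tag_moves(["e4", "e5", "Qh5"], "1-0"))
```

At generation time you would presumably prompt with the result token for the side you want to win and let the model fill in the move, though whether that actually steers play is exactly the thing that would need testing.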