But they don't mention AlphaGo Zero, which is on another level from AlphaGo because it isn't held back by human game sample data.
Now the article mentions Leela Zero,
and that one is an open-source version of AlphaZero where a community effort was used to do the same as the AlphaGo Zero described in the paper.
From the status page (https://zero.sjeng.org/) I see that this took multiple years but was declared finished in 2021.
The status page says it played 21 million games in training, while Wikipedia says that for AlphaGo Zero it was 5 million.
So you'd think it's better, but actually it might just mean that they are not using the same techniques after all, and so it is maybe worse.
So the question for me is still whether this technique would actually be applicable to AlphaGo Zero.
But for me, this article is more a win for AI, because it means that better techniques have been found to find blind spots in models, which can from now on be used in training to eventually get rid of those.
While it is possible that two training runs yield different weaknesses, the original paper is precise enough (and the authors were active enough on message boards) that it could be reproduced. Some tweaks were added along the way. However, the fact that KataGo and ELF OpenGo, which have different tweaks but are also based on AlphaZero, are also vulnerable implies that AlphaGo Zero and AlphaZero very likely share the weakness.
The fundamental issue is that they all do self-play, and as a result those games follow the same statistical distribution. Weird games happen rarely, while the adversarial network forces a weird game every time.
On the other hand, I am not sure whether MuZero would be vulnerable. I would guess so, but its learned dynamics model may capture something extra about the state of the board.
Either way, it seems clear that future self-play training should include adversarial models, instead of pure self-play.
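A minimal sketch of what that could look like, with placeholder names of my own (no real training API is implied): instead of every training game being model-vs-model, some fraction is played against an adversary that hunts for the model's blind spots.

```python
import random

# Toy sketch: mixing adversarial games into a self-play pipeline.
# `play_game` just records who played; in a real system it would
# return a full game record for the replay buffer.

def play_game(black, white):
    return (black, white)  # placeholder for an actual played game

def generate_training_games(model, adversary, n_games,
                            adv_fraction=0.2, seed=0):
    """Self-play games, with a fraction played against an adversary."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        if rng.random() < adv_fraction:
            # The adversary searches for positions the model handles
            # badly (e.g. large cyclic groups), so those rare board
            # shapes appear far more often than under pure self-play.
            games.append(play_game(model, adversary))
        else:
            games.append(play_game(model, model))
    return games

games = generate_training_games("model", "adversary", 1000)
print(sum(1 for _, w in games if w == "adversary"))  # roughly 200
```

The point is only the sampling mix: as long as some games per generation are forced into the weird part of the distribution, the network gets gradient signal on positions pure self-play would almost never produce.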
(They also all share another weakness, including MuZero, which is that their MCTS algorithm doesn't allow the neural network to gain board information from its evaluation of possible futures; only a win rate is propagated back. Meanwhile a human sometimes realizes, while navigating a possible sequence of moves, that one particular early move could turn the tide.)
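That limitation is visible in the shape of the standard backup step of AlphaZero-style MCTS; a stripped-down sketch (my own names, not any paper's code):

```python
# Stripped-down backup step of AlphaZero-style MCTS. The only thing
# that flows from a leaf back toward the root is a scalar value
# (a win-rate estimate); whatever the network "noticed" about the
# board while evaluating the leaf is discarded.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.value_sum = 0.0  # accumulated scalar value estimates

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def backup(leaf, value):
    """Propagate a scalar value from leaf to root, flipping sign per ply."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += value
        value = -value  # switch to the other player's perspective
        node = node.parent
```

A human, by contrast, can come back from reading out a line with new structural insight about the position itself, not just a number.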
My understanding is that the weakness comes from the fact that convolutional neural networks are used, and from how information propagates through their layers.
It's been known from the beginning that this kind of position (large cyclic groups) is a weak spot of all bots based on those CNNs.
What was not clear at all was whether it's possible to force those positions.
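The information-propagation point can be made concrete with a back-of-the-envelope calculation: a stack of stride-1 3x3 convolutions (the building block of these nets' residual towers) widens each cell's view of the board by only one point per direction per layer, so judging a group that wraps around much of a 19x19 board requires information to be pieced together across many layers.

```python
# Receptive field of a stack of stride-1 3x3 convolutions: each layer
# adds (kernel - 1) points to the width a single output cell can "see".

def receptive_field(n_layers, kernel=3):
    return 1 + n_layers * (kernel - 1)

for depth in (4, 9, 20):
    print(depth, "layers ->", receptive_field(depth), "points wide")
# It takes 9 layers before one cell can even see across a 19x19 board,
# and seeing a group is not the same as aggregating its liberty status.
```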
Definitely true! Despite that, it is probably possible to make the network detect this pattern, but only if it appears during training. The thing is that this becomes a vicious circle: the network fails to learn this because its architecture is not conducive to it, so it doesn't play it right with the limited self-play visit count, so the right line doesn't appear in self-play, so it fails to learn it.
KataGo adds a few situations to training that didn't appear in self-play, "blind spots" that a shallow search depth fails to recognize as good moves, so that it still learns from positions it would not generate on its own.
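A hypothetical sketch of that idea (the position names and the fraction are illustrative, not KataGo's actual configuration): start a small fraction of training games from curated positions instead of the empty board.

```python
import random

# Hypothetical sketch of seeding self-play with curated "blind spot"
# positions, in the spirit of KataGo's approach. The position names
# and seeded fraction are made up for illustration.

CURATED_POSITIONS = ["large-cyclic-group", "mirror-go", "ladder"]

def starting_position(rng, seeded_fraction=0.05):
    """Usually the empty board; occasionally a curated blind spot."""
    if rng.random() < seeded_fraction:
        return rng.choice(CURATED_POSITIONS)
    return "empty-board"

rng = random.Random(0)
starts = [starting_position(rng) for _ in range(10000)]
print(sum(s != "empty-board" for s in starts))  # roughly 500
```

Even a small seeded fraction is enough to break the vicious circle, because the network now sees the pattern's consequences played out instead of never encountering it at all.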
It is a little bit like learning to tie your shoelaces. Brains are not great at guessing the qualities of knots among all possible ties, but if you see it done once, you're set for life.