But they don't mention AlphaGo Zero, which is on another level from AlphaGo because it isn't held back by human game sample data.
Now the article mentions Leela Zero,
and that one is an open-source version of AlphaZero where a community effort was used to do the same as the AlphaGo Zero described in the paper.
From the status page (https://zero.sjeng.org/) I see that this took multiple years but was declared finished in 2021.
The status page says it played 21 million games in training, while Wikipedia says that for AlphaGo Zero it was 5 million.
So you'd think it's better, but actually it might just mean that they are not using the same techniques after all, and so it is maybe worse.
So the question for me is still whether this technique would actually be applicable to AlphaGo Zero.
But for me, this article is more a win for AI, because it means that better techniques have been found to find blind spots in models, which can from now on be used in training to eventually get rid of those.
While it is possible that two training runs yield different weaknesses, the original paper is precise enough (and the authors were active enough on message boards) that it could be reproduced. Some tweaks were added along the way. However, the fact that KataGo and ELF OpenGo, which have different tweaks but are also based on AlphaZero, are also vulnerable implies that AlphaGo Zero and AlphaZero very likely share the weakness.
The fundamental issue is that they all do self-play, and as a result those games follow the same statistical distribution. Weird games happen rarely, while the adversarial network forces a weird game every time.
On the other hand, I am not sure whether MuZero would be vulnerable. I would guess so, but its learned dynamics model may capture something extra about the state of the board.
Either way, it seems clear that future self-play training should include adversarial models, instead of pure self-play.
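A minimal sketch of what that could look like, with placeholder names of my own (no real training API is implied): instead of every training game being model-vs-model, some fraction is played against an adversary that hunts for the model's blind spots.

```python
import random

# Toy sketch: mixing adversarial games into a self-play pipeline.
# `play_game` just records who played; in a real system it would
# return a full game record for the replay buffer.

def play_game(black, white):
    return (black, white)  # placeholder for an actual played game

def generate_training_games(model, adversary, n_games,
                            adv_fraction=0.2, seed=0):
    """Self-play games, with a fraction played against an adversary."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        if rng.random() < adv_fraction:
            # The adversary searches for positions the model handles
            # badly (e.g. large cyclic groups), so those rare board
            # shapes appear far more often than under pure self-play.
            games.append(play_game(model, adversary))
        else:
            games.append(play_game(model, model))
    return games

games = generate_training_games("model", "adversary", 1000)
print(sum(1 for _, w in games if w == "adversary"))  # roughly 200
```

The point is only the sampling mix: as long as some games per generation are forced into the weird part of the distribution, the network gets gradient signal on positions pure self-play would almost never produce.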
(They also all share another weakness, including MuZero, which is that their MCTS algorithm doesn't allow the neural network to gain board information from its evaluation of possible futures; only a win rate is propagated back. Meanwhile a human sometimes realizes, while navigating a possible sequence of moves, that one particular early move could turn the tide.)
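That limitation is visible in the shape of the standard backup step of AlphaZero-style MCTS; a stripped-down sketch (my own names, not any paper's code):

```python
# Stripped-down backup step of AlphaZero-style MCTS. The only thing
# that flows from a leaf back toward the root is a scalar value
# (a win-rate estimate); whatever the network "noticed" about the
# board while evaluating the leaf is discarded.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.value_sum = 0.0  # accumulated scalar value estimates

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def backup(leaf, value):
    """Propagate a scalar value from leaf to root, flipping sign per ply."""
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += value
        value = -value  # switch to the other player's perspective
        node = node.parent
```

A human, by contrast, can come back from reading out a line with new structural insight about the position itself, not just a number.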
My understanding is that the weakness comes from the fact that convolutional neural networks are used, and from how information propagates through their layers.
It's been known from the beginning that this kind of position (large cyclic groups) is a weak spot of all bots based on those CNNs.
What was not clear at all was whether it's possible to force those positions.
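The information-propagation point can be made concrete with a back-of-the-envelope calculation: a stack of stride-1 3x3 convolutions (the building block of these nets' residual towers) widens each cell's view of the board by only one point per direction per layer, so judging a group that wraps around much of a 19x19 board requires information to be pieced together across many layers.

```python
# Receptive field of a stack of stride-1 3x3 convolutions: each layer
# adds (kernel - 1) points to the width a single output cell can "see".

def receptive_field(n_layers, kernel=3):
    return 1 + n_layers * (kernel - 1)

for depth in (4, 9, 20):
    print(depth, "layers ->", receptive_field(depth), "points wide")
# It takes 9 layers before one cell can even see across a 19x19 board,
# and seeing a group is not the same as aggregating its liberty status.
```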
Definitely true! Despite that, it is probably possible to make the network detect this pattern, but only if it appears during training. The thing is that this becomes a vicious circle: the network fails to learn this because its architecture is not conducive to it, so it doesn't play it right with the limited self-play visit count, so the right line doesn't appear in self-play, so it fails to learn it.
KataGo adds a few situations to training that didn't appear in self-play, "blind spots" that a shallow search depth fails to recognize as good moves, so that it still learns from positions it would not generate on its own.
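A hypothetical sketch of that idea (the position names and the fraction are illustrative, not KataGo's actual configuration): start a small fraction of training games from curated positions instead of the empty board.

```python
import random

# Hypothetical sketch of seeding self-play with curated "blind spot"
# positions, in the spirit of KataGo's approach. The position names
# and seeded fraction are made up for illustration.

CURATED_POSITIONS = ["large-cyclic-group", "mirror-go", "ladder"]

def starting_position(rng, seeded_fraction=0.05):
    """Usually the empty board; occasionally a curated blind spot."""
    if rng.random() < seeded_fraction:
        return rng.choice(CURATED_POSITIONS)
    return "empty-board"

rng = random.Random(0)
starts = [starting_position(rng) for _ in range(10000)]
print(sum(s != "empty-board" for s in starts))  # roughly 500
```

Even a small seeded fraction is enough to break the vicious circle, because the network now sees the pattern's consequences played out instead of never encountering it at all.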
It is a little bit like learning to tie your shoelaces. Brains are not great at guessing the qualities of knots among all possible ties, but if you see it done once, you're set for life.