
Congrats! I see by comparing the author lists of v1 and v2 of this paper[1] that a Go player[2] was consulted before making claims about defeating a Go AI. This is indeed a nontrivial exploit that passes muster with a Go player who was skeptical last time around[3]!

[1] https://arxiv.org/abs/2211.00241v2

[2] https://agagd.usgo.org/player/13822/ (6dan amateur strength is very strong, roughly 2300 Elo in chess)

[3] https://news.ycombinator.com/item?id=33449600#33450069



Context: v1 of the paper exploited a misconfiguration in how the Go AI was set up to play under an esoteric ruleset (the vulnerability they found was documented in the KataGo docs; it took me a few minutes to find. KataGo still played much better Go than their adversarial program, and no Go player would dispute that). My analysis at the time: https://twitter.com/AurSaraf/status/1588184384116514818

Now they have fixed the misconfiguration so their training could find actual weaknesses, and they found a very big one, one that even weak amateur Go players like me can look at and say "oh my God, AI, what are you doing, defend your group!". And it even generalizes to another popular Go AI, which is great!

This is a great result, very impressive, great work by the team.


Appendix H in the paper does a good job of explaining why this cyclic topology thing makes sense -

"David Wu’s hypothesis is that information propagates through the neural network in a way analagous to going around the cycle, but it cannot tell when it has reached a point it has "seen" before. This leads to it counting each liberty multiple times and judging such groups to be very safe regardless of the actual situation."

(David Wu is the author of KataGo.)
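The double-counting idea is easy to see in a toy model. The sketch below is purely illustrative, not KataGo's actual architecture: it compares an exact liberty count (a flood fill with a visited set) against a naive propagation that, like the hypothesized network behavior, has no notion of "seen before". The ring group and its liberty sets are made-up example data.

```python
# Hypothetical illustration of the Appendix H failure mode: on a cyclic
# group, local message passing without a visited set inflates the
# liberty estimate, while an exact union of liberties does not.

def exact_liberties(group, liberties_of):
    """True liberty count: the union of each stone's adjacent empty points."""
    libs = set()
    for stone in group:
        libs |= liberties_of[stone]
    return len(libs)

def propagated_liberties(group, neighbors, liberties_of, steps):
    """Naive propagation: each stone repeatedly adds its neighbors'
    running totals, with no way to recognize a point it has seen before.
    On a cycle, the same liberties keep flowing around and get recounted."""
    est = {s: len(liberties_of[s]) for s in group}
    for _ in range(steps):
        est = {s: len(liberties_of[s]) + sum(est[n] for n in neighbors[s])
               for s in group}
    return max(est.values())

# Example: four stones A-B-C-D arranged in a ring, one outside liberty each.
ring = ["A", "B", "C", "D"]
neighbors = {"A": ["B", "D"], "B": ["A", "C"],
             "C": ["B", "D"], "D": ["A", "C"]}
libs = {"A": {"a"}, "B": {"b"}, "C": {"c"}, "D": {"d"}}

print(exact_liberties(ring, libs))                          # stays at 4
print(propagated_liberties(ring, neighbors, libs, steps=4)) # blows up past 4
```

The exact count stays fixed no matter how long you look, while the propagated estimate grows with each step around the cycle, which is exactly the "judging such groups to be very safe regardless of the actual situation" behavior described above.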

I'm a little bit more skeptical now; it seems like what happened was that David explained some known pathologies of Go AIs (based on analysis by Go AI experts), and one of those pathologies (cyclic topology) was soft-coded into an agent and then run through the adversarial training process to harden it, producing the result in the paper. The arrow of causality runs backwards here - I would be far more impressed if the adversarial training process could help elucidate failure modes of AIs. This looks more like expert analysis laundered through adversarial ML woo...


One of the authors here. There was no soft-coding in our adversary; it learned from scratch (random initialization). The only "soft-coding" done (probably closer to hard-coding), which we sometimes refer to as hardening, was on the victim (not the adversary), in order to make sure the victim did not lose to the pass-vulnerability, in turn forcing the adversary to look for an arbitrary new vulnerability. In other words, this only made the problem harder for our adversary algorithm. This is discussed in more detail in section 4 of our paper under "Initialization."

Actually, at the time it was first trained, we were not aware of this pathology. We found out in later discussion that it was partially known in the computer Go community: cases where bots failed in cyclic ("loop") positions had been recorded. However, to our current knowledge, it was not known that it could be consistently targeted, whether by algorithm or by human, without very specific positions or sequences.


Oh wow, that's amazing. Thanks for the clarification. Glad you got involved with the paper!


As a Go player I fully agree. I criticized the v1 exploit a few months ago as an irrelevant rule-glitch exploit, but this one is indeed relevant: a beautiful exploitation of real Go understanding.



