Hacker News

One of the authors here. There was no soft-coding in our adversary; it learned from scratch (random initialization). The only "soft-coding" done (probably closer to hard-coding), which we sometimes refer to as hardening, was applied to the victim (not the adversary), to make sure the victim could no longer lose to the pass vulnerability, which in turn forced the adversary to find a new vulnerability. In other words, this only made the problem harder for our adversary algorithm. This is discussed in more detail in section 4 of our paper under "Initialization."
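To make the hardening idea concrete, here is a toy sketch (my own illustration, not the paper's actual training setup: the one-dimensional "game", the `PASS` constant, and the policy functions are all made up). It shows the general pattern of wrapping a victim policy so a known exploit (passing while behind) is no longer available, forcing any adversary trained from scratch to look elsewhere:

```python
import random

PASS = -1  # made-up pass action for this toy example

def naive_victim(score_lead, legal_moves):
    """Flawed victim: happily passes, even when behind (the known exploit)."""
    return PASS

def harden(policy):
    """Wrap a policy so it only passes while ahead; otherwise play a real move."""
    def hardened(score_lead, legal_moves):
        move = policy(score_lead, legal_moves)
        if move == PASS and score_lead <= 0:
            # Refuse to pass while behind; fall back to some legal move.
            return random.choice(legal_moves)
        return move
    return hardened

victim = harden(naive_victim)
print(victim(-5, [0, 1, 2]) != PASS)  # behind: the hardened victim never passes
print(victim(+3, [0, 1, 2]) == PASS)  # ahead: passing is still allowed
```

Note the patch sits entirely on the victim's side; an adversary attacking `victim` starts from scratch with no knowledge of the old flaw.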

Actually, at the time the adversary was first trained, we were not aware of this pathology. We found out in later discussion that it was partially known in the computer Go community: cases where bots failed in cyclic ("loop") positions had been recorded. However, to our knowledge, it was not known that the weakness could be consistently targeted, whether by algorithm or by human, without highly specific positions or move sequences.



Oh wow, that's amazing. Thanks for the clarification. Glad you got involved with the paper!



