There is a large gap between true statistical significance and a fair coin toss. Bad testing (of whatever flavor: A/B, MAB, etc.) is likely to land somewhere in that gap: most likely worse than proper testing, but also quite likely better than tossing a coin or throwing darts.
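To make that gap concrete, here is a rough simulation sketch. It assumes "bad testing" means simply picking whichever variant got more conversions, with no significance check at all. The conversion rates (10% vs 12%), the sample sizes, and the function names are all made up for illustration.

```python
import random

def pick_winner(rate_a, rate_b, n, rng):
    """Run one sloppy test: n visitors per arm, pick the arm with more conversions."""
    conv_a = sum(rng.random() < rate_a for _ in range(n))
    conv_b = sum(rng.random() < rate_b for _ in range(n))
    if conv_a == conv_b:
        return rng.choice("AB")  # tie: fall back to a coin toss
    return "A" if conv_a > conv_b else "B"

def share_correct(rate_a, rate_b, n, trials=1000, seed=0):
    """Fraction of simulated tests that pick the truly better arm."""
    rng = random.Random(seed)
    best = "B" if rate_b > rate_a else "A"
    hits = sum(pick_winner(rate_a, rate_b, n, rng) == best for _ in range(trials))
    return hits / trials

# Hypothetical rates: B (12%) is truly better than A (10%).
for n in (50, 500, 2000):
    print(n, share_correct(0.10, 0.12, n))
```

Under these assumptions, the small-sample test should pick the better variant more often than a coin toss would, but noticeably less often than the large-sample one, which is the "somewhere in that gap" outcome.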