In my experience, it can't be overstated how important it is to wait for a large sample size before declaring a variation the winner. Nearly all of the A/B tests I run start out looking like a variation is the clear, landslide winner (sometimes showing a 100%+ improvement over the original), only to regress toward the mean as more data comes in. I can't get a clear read on a test until I've shown the variation(s) to tens of thousands of visitors and collected a few thousand conversions.

I've also learned that it's important to run tests on new visitors only, when possible. That means tests need to run longer to reach the appropriate sample size. If you're calling a test after a few hundred conversions, and including both new and returning visitors, you're probably getting skewed results. Again, that's just my experience so far. YMMV.

One thing to consider is that your variations may simply be too subtle to have a significant, positive impact on conversion.
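If you want a rough sense of why "tens of thousands of visitors" is the right order of magnitude, you can run the standard power calculation for comparing two conversion rates. Here's a minimal sketch (the function name, default alpha of 0.05, and default power of 0.8 are my own choices, not anything from the text above):

```python
import math
from statistics import NormalDist

def min_sample_size(base_rate, min_relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed *per variation* to detect a given
    relative lift over the baseline conversion rate, using a two-sided
    two-proportion z-test at significance `alpha` with power `power`."""
    p1 = base_rate
    p2 = base_rate * (1 + min_relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# A 2% baseline rate and a 20% relative lift (2.0% -> 2.4%) already
# requires tens of thousands of visitors in each arm:
print(min_sample_size(0.02, 0.20))
```

Plug in your own baseline rate and the smallest lift you'd care about; for typical single-digit conversion rates the answer lands in the tens of thousands per variation, which matches the experience above. Note this is the sample size for a single look at the data; peeking early (as in the "landslide winner" pattern) inflates the false-positive rate well beyond alpha.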