I've seen that, too. One of my clients redid their marketing site 3x in one year, each time claiming incredible improvements. The incredible improvements turned out to be local hill climbing, while the entire site's performance languished... 3-4 years ago there were a ton of blog posts about how a green button produced incredible sales when compared to a red button. And so everyone switched to green buttons...
By contrast, I've evolved multiple websites through incremental, globally measured optimizations. It's a lot of fun, and it requires you to really understand your users (I've called A/B testing + analytics "a conversation between you and your users"). But, as you point out, it can be tough to get statistically significant data on changes to a small site. That's why I usually focused on big effects (e.g. 25%) rather than on the blog posts about "OMG! +2.76% change in sales!". That's also why I did a lot of "historical testing", under the assumption that week-to-week changes in normalized stats would be swamped by my tests.
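To put rough numbers on why small sites can only chase big effects, here's a back-of-the-envelope sample-size sketch using the standard two-proportion approximation (my illustrative assumptions: 2% baseline conversion, 5% two-sided significance, 80% power):

```python
import math

def visitors_per_arm(p_base, relative_lift, z_alpha=1.96, z_beta=0.8416):
    """Approximate visitors needed in EACH arm of an A/B test to detect
    a given relative lift in conversion rate.
    z_alpha: 5% two-sided significance; z_beta: 80% power."""
    p_new = p_base * (1 + relative_lift)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2)

# A 25% lift on a 2% baseline: roughly 14k visitors per arm.
print(visitors_per_arm(0.02, 0.25))
# A 2.76% lift on the same baseline: on the order of a million per arm.
print(visitors_per_arm(0.02, 0.0276))
```

The gap is the whole story: the sample you need scales with the inverse square of the effect size, so a "+2.76%" result is simply out of reach for a site that can't detect anything smaller than a 25% swing.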
under the assumption that week-to-week changes in normalized stats would be swamped by my tests
This is an enormously problematic assumption, which you can verify by either looking at the week-to-week stats for a period prior to your joining the company, or (for a far more fun demonstration) doing historical testing of the brand of toothpaste you use for the next 6 weeks. Swap from Colgate to $NAME_ANOTHER_BRAND, note the improvement, conclude that website visitors pay an awful lot of attention to the webmaster's toothpaste choices.
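You can watch the assumption fail in a toy simulation. With numbers I've made up for illustration (2% mean conversion, week-to-week drift with a 0.4-point standard deviation, 10k visitors/week, three weeks "before" vs. three weeks "after", and no change to the site at all), a naive before/after significance test fires far more often than the nominal 5%:

```python
import math
import random

def weekly_conversions(mean_rate, week_sd, visitors_per_week, weeks, rng):
    """Total conversions over several weeks when the underlying rate
    drifts from week to week (the site itself never changes)."""
    total = 0
    for _ in range(weeks):
        rate = max(0.001, rng.gauss(mean_rate, week_sd))
        # Normal approximation to Binomial(visitors, rate); fine when n*p >> 10.
        mu = visitors_per_week * rate
        sigma = math.sqrt(visitors_per_week * rate * (1 - rate))
        total += max(0, round(rng.gauss(mu, sigma)))
    return total

def looks_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Naive pooled two-proportion z-test at the 5% level."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return abs(conv_a / n_a - conv_b / n_b) / se > z_crit

rng = random.Random(42)
n = 3 * 10_000  # visitors per three-week window
trials = 400
false_positives = 0
for _ in range(trials):
    before = weekly_conversions(0.02, 0.004, 10_000, 3, rng)
    after = weekly_conversions(0.02, 0.004, 10_000, 3, rng)  # identical site!
    if looks_significant(before, n, after, n):
        false_positives += 1

print(f"'significant' in {false_positives / trials:.0%} of runs with no real change")
```

Under these assumptions the test declares a "significant" difference in a large fraction of runs, because the z-test's error bars account only for sampling noise, not for the week-to-week drift that actually dominates. That drift is exactly what gets attributed to the toothpaste.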
Full disclosure: I work for Qubit who published this white paper.
This kind of "historical testing" (I think people often call it sequential testing?) can be pretty dangerous even for large effects. For example, Christmas might be a really good time to change the colour of all the buttons on your site and see a 50% increase in sales.
Yes. This kind of micro-A/B testing ("red or green buttons?") feels analogous to premature optimization when coding. Don't worry about the tiny 0.0001% improvements you get from using a for-loop over a while-loop; improve the algorithm itself for order-of-magnitude changes. Focus on the big picture.