The "regression to the mean" and "novelty" effect is getting at two different things (both true, both important).
1. Underpowered tests are likely to exaggerate differences, since E(abs(truth - result)) increases as the sample size shrinks.
2. The much bigger problem I've seen a lot: when users see a new layout they aren't accustomed to they often respond better, but when they get used to it, they can begin responding worse than with the old design. Two ways to deal with this are long term testing (let people get used to it) and testing on new users. Or, embrace the novelty effect and just keep changing shit up to keep users guessing - this seems to be FB's solution.
1. Underpowered tests are likely to exaggerate differences, since E(abs(truth - result)) increases as the sample size shrinks.
2. The much bigger problem I've seen a lot: when users see a new layout they aren't accustomed to they often respond better, but when they get used to it, they can begin responding worse than with the old design. Two ways to deal with this are long term testing (let people get used to it) and testing on new users. Or, embrace the novelty effect and just keep changing shit up to keep users guessing - this seems to be FB's solution.