This is a powerful approach when you can quantify your regret. For many startups...

This is a powerful approach when you can quantify your regret. For many startups, however, it's important to understand the tradeoffs involved in moving one metric upward or downward. To take Zynga as an example, they care about virality at least as much as engagement (or perhaps moreso). Adding or removing a friendspam dialog is likely to trade some virality for user experience. What percentages make or break the decision? Sometimes this is a qualitative call.

In environments where you need to look at the impact of your experiments across multiple variables, and make a subjective call about the tradeoffs, it's really important to have statistical confidence in the movement of each variable you're evaluating. This is a key strength of the traditional A/B testing approach.