I agree: formatting details seem to be very context-specific, and this is a lab setup, not a test in the wild. I would put much more trust in A/B test results from the NYT or any other media organization - line length is one of the most obvious things to test!
I tested page widths (which force line length) on my own site, using n=109k visitors (so, quite a bit larger than n=20): http://www.gwern.net/AB%20testing#max-width-redux A wide, but not the widest, version performed best.