Hacker News

This is a well-written article and the concepts are explained clearly; thanks for sharing. I'd just like to add a caveat.

When the author says "if a researcher runs 400 experiments on the same train-test splits", then depending on what they mean by 'test' set, that researcher is doing it wrong. In pretty much all the machine learning literature I've come across, it's drilled into you that you never look at your held-out test set until the very end. Hyperparameter optimisation and/or model selection happens on the training set, and only once you've tuned your hyperparameters and selected your best model do you run it on the test set to see how it performs.

Once you've run the model on the test set, you can't go back and tweak it, because doing so introduces bias: you no longer have any data your model has never seen before.

To avoid overfitting, you can use cross-validation to effectively re-use your training set by creating multiple training/validation splits. (As an aside, I find it frustrating how liberally different sources switch between 'validation set' and 'test set'; it's really confusing.)
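To make the idea concrete, here's a minimal sketch of k-fold splitting in plain Python (no particular library; the fold sizes and seed are arbitrary choices for illustration). The point is that every sample serves as validation data in exactly one fold, and the held-out test set never appears in any of these splits:

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation.

    Each sample lands in exactly one validation fold, so every point is
    used for both training and validation -- but never both at once
    within the same split. The final held-out test set is kept entirely
    outside this procedure.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // k
    for i in range(k):
        # Last fold absorbs any remainder so all samples are covered.
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        val = idx[i * fold_size:end]
        val_set = set(val)
        train = [j for j in idx if j not in val_set]
        yield train, val

# Tune hyperparameters using only these folds; touch the test set once,
# at the very end, after model selection is finished.
folds = list(k_fold_splits(100, k=5))
assert all(len(tr) + len(va) == 100 for tr, va in folds)
```

In practice you'd use a library implementation (e.g. scikit-learn's KFold), but the contract is the same: the splits partition the training data only.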



OP here: I probably should have made it clearer that reusing the same train-test splits is verboten.

Your second and third paragraphs are also exactly correct. I attempted to make those points in the post, but you've done so more effectively here.


Yeah, I just thought it was worth hammering home, especially because some literature uses "test set" to mean "validation set", which can really throw off a beginner like myself.



