> Imagine a process where the only criteria are technical soundness and novelty, and as long as minimal standards are met, it's a "go". Call it the "ArXiv + quality check" model.
One possible issue is that researchers usually need to justify their research to somebody who's not in their field. Conferences are one way to do this. So are citation counts. Both are highly imperfect, but outsiders typically want some signal that doesn't require being an expert in a person's chosen field. The "Arxiv + quality check" model doesn't seem to provide this.
> I suspect the bigger problem that CS has is a large percentage of poor-quality work that couldn't be replicated.
As a sort of ML researcher for several years, I agree.
As a fellow ML researcher, I want to add that the lack of code accompanying publications makes the problem worse. $BIGGROUP gets a paper published whose core contribution is a library, and yet six months after the conference they still haven't released the code, effectively claiming credit for something unverifiable.
I guess this may differ depending on your specific field, but in NLP it has really changed for the better over the last few years.
I don't have data, but from subjective experience, 5-6 years ago most papers at major NLP conferences didn't have an associated code repository. Now the overwhelming majority do.
There are still many other problems; a big one, for example, is the reporting of spurious improvements that can vanish if you get a less lucky random seed. But at least including code is now common practice.
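To make the seed issue concrete, here is a minimal sketch of reporting results over several seeds instead of a single run; `train_and_eval` is a hypothetical stand-in for a full training pipeline, not anyone's actual setup:

```python
# Minimal sketch: train_and_eval() is a hypothetical stand-in for a full
# training + evaluation run; replace it with your actual pipeline.
import random
import statistics

def train_and_eval(seed: int) -> float:
    """Pretend dev-set F1 with some seed-dependent noise."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0, 0.01)

seeds = [0, 1, 2, 3, 4]
scores = [train_and_eval(s) for s in seeds]
print(f"F1 over {len(seeds)} seeds: "
      f"{statistics.mean(scores):.3f} +/- {statistics.stdev(scores):.3f}")
# If a claimed improvement over the baseline is smaller than this spread,
# it may just be a lucky seed.
```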
Back when I did a stint at something NLP-ish for my master's, one of the problems seemed to be that, apart from the lack of code, the data was also often non-public and specific to the study. That made it impossible to compare different algorithms even on the basis of the results reported in the publications themselves, because the testing methodology was all over the place and the datasets used to evaluate the various algorithms might all have been different. You couldn't really make much of the reported results even if you believed the authors reported honestly and had their methodology more or less straight.
I suppose the situation regarding common datasets might vary between subfields and NLP tasks, so maybe I just saw a weird corner of it.
Of course the code was also nowhere to be seen.
Availability of code would of course be even more important, both for replicability and for general verifiability, and also because it would let you run the comparison on any number of datasets yourself.
Glad to hear that code availability has been improving.
> There are still many other problems; a big one, for example, is the reporting of spurious improvements that can vanish if you get a less lucky random seed.
Considering that a lot of NLP is at least somewhat based on machine learning, don't people do cross-validation or something?
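Something like this is what I mean; a toy scikit-learn sketch, where the texts, labels, and classifier are made-up placeholders rather than anything from a real paper:

```python
# Toy k-fold cross-validation sketch with scikit-learn; data and model
# are placeholders, only the pattern matters.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible film", "loved it", "hated it",
         "not bad at all", "worst plot ever", "brilliant acting", "dull and boring"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
scores = cross_val_score(clf, texts, labels, cv=4)  # 4-fold stratified CV
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```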
> You do a paper showing that problem X can be solved slightly better by downloading and training on a billion tweets.
That's true. Sometimes you might try to tweak the algorithm itself rather than the data, though, or experiment with different kinds of preprocessing, and in those cases it would be helpful to be able to run the experiments on shared datasets.
My limited experience is from around the time deep learning was only about to become a big thing, so things may well be different now. Maybe nowadays you just throw more tweets and GPUs at the problem.