Netflix Prize 10% barrier broken

aswanson · on June 26, 2009

...and I eat crow: http://news.ycombinator.com/item?id=393667

Haskell wins: http://news.ycombinator.com/item?id=394528

larryfreeman · on June 27, 2009

Pretty exciting. I've been working on the prize in my spare time. Got to #162 (0.8838 RMSE).

The algorithms involved and the solutions from past years have been very interesting. I highly recommend that people check out the forum posts on http://netflixprize.com

The winning submission was made by a combination of the top 3 teams. In my opinion, if they win it, it's well deserved! It will be very interesting to see Pragmatic Theory's method published.

Teams BellKor and BigChaos published their methods when they won the 2008 Progress Prize.

crocowhile · on June 27, 2009

I thought one of the caveats of the prize was that the winner was not supposed to disclose the algorithm they used, isn't it?

scscsc · on June 27, 2009

On the contrary, they are supposed to publish it.

pageman · on June 30, 2009

Would you have the URL for the methods?

SapphireSun · on June 27, 2009

Wow! I wonder, though, how they account for overfitting of the data. Is it a real solution or a statistical anomaly? I ask because it seems that the progress over the last year(s) has been small increments within the 9-10% range.

Eliezer · on June 27, 2009

I'm also worried about this. I believe that Netflix has a separate data set, not used in the previous reports of mean squared error, which validates the $ prize. I also believe that the teams have been using the Netflix-reported squared errors from standard test data to combine their estimates. If so, in a month we're going to learn that the Prize has not actually been won yet.
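The worry above (tuning against a publicly reported score, then being judged on a hidden set) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names, the set sizes, and the idea that all "models" are equally good are hypothetical choices for illustration, not a description of the actual Netflix data or of any team's method.

```python
import math
import random

def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def simulate(num_models=200, quiz_size=500, test_size=500, seed=1):
    """Pick the model with the best public ("quiz") score and report
    its score on a hidden test set.

    Assumption: every model has the same true error (std 1.0), so any
    difference in quiz score is pure sampling noise.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_models):
        quiz = rmse([rng.gauss(0, 1) for _ in range(quiz_size)])
        test = rmse([rng.gauss(0, 1) for _ in range(test_size)])
        results.append((quiz, test))
    # min() compares the quiz score first, i.e. selection uses only the public score
    best_quiz, its_test = min(results)
    return best_quiz, its_test

best_quiz, its_test = simulate()
print(f"best quiz RMSE: {best_quiz:.4f}, same model on hidden test: {its_test:.4f}")
```

The selected quiz score looks better than the true error of 1.0 purely through selection, while the hidden-test score of the same model is an independent draw. The sizes here are deliberately small, and the effect shrinks as the scoring sets grow.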

paraschopra · on June 27, 2009

But I wonder whether the effect of an unknown dataset slowly seeps into the algorithms as they are tweaked, especially when 100s of algorithms are competing to fit that data.

It might be akin to the fact that if you correlate more than 25-30 sets of 25-30 random numbers, you would find at least one statistically significant correlation.
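The random-numbers claim above is easy to check with a quick simulation. The following is a minimal sketch, assuming 30 sets of 30 Gaussian random numbers and a critical value of roughly |r| > 0.361 for p < 0.05 (two-tailed, 28 degrees of freedom, taken from standard tables); the function names are hypothetical.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def count_spurious(num_sets=30, set_size=30, critical_r=0.361, seed=0):
    """Count pairs of independent random sets whose correlation exceeds
    the (assumed) significance threshold."""
    rng = random.Random(seed)
    sets = [[rng.gauss(0, 1) for _ in range(set_size)] for _ in range(num_sets)]
    pairs = list(itertools.combinations(sets, 2))
    hits = sum(1 for a, b in pairs if abs(pearson(a, b)) > critical_r)
    return hits, len(pairs)

hits, total = count_spurious()
print(f"{hits} of {total} pairs look 'significant' by chance")
```

With 30 sets there are 435 pairs, so at a 5% false-positive rate one would expect around 20 spurious "significant" correlations, which is the multiple-comparisons effect being described.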

bravura · on June 27, 2009

They are not granted access to the data set, so it would be statistically unlikely to achieve 10% RMSE improvement on the test set solely by chance.
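For readers unfamiliar with the metric, here is a short sketch of how RMSE and the percentage improvement over a baseline are computed. The baseline of 0.9514 is an assumption: it is the commonly cited RMSE of Netflix's own Cinematch system on the public leaderboard set and does not come from this thread. The 0.8838 figure is the score quoted earlier in the thread, and the function names are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE improvement over a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Assumed baseline (Cinematch, not stated in this thread)
BASELINE = 0.9514

print(f"RMSE on toy data: {rmse([3.5, 4.0, 2.0], [4, 4, 1]):.4f}")
print(f"0.8838 vs baseline: {improvement_pct(BASELINE, 0.8838):.2f}% improvement")
print(f"RMSE needed for 10%: {BASELINE * 0.9:.4f}")
```

Because the improvement is measured relative to the baseline, each additional tenth of a percent requires a lower absolute RMSE, which is part of why progress in the 9-10% range came in such small increments.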

davidalln · on June 26, 2009

And so it begins. It will be interesting to see what other teams will do now with such a tight deadline.

froo · on June 26, 2009

Congratulations to the team BellKor's Pragmatic Chaos. Now if they could only spend some of their time researching how to make a page that doesn't look like it was coded by an 11-year-old MySpace user.

An example from the source

  <br>
  <br>
  &nbsp;&nbsp;&nbsp;&nbsp;<big><big><big><big>&nbsp;&nbsp;&nbsp;</big></big></big></big><big><big><big><big>&nbsp;&nbsp;</big></big></big></big><big><big><big><big>&nbsp;

  </big></big></big></big><br>
  &nbsp;&nbsp;&nbsp; <br>

redorb · on June 26, 2009

I dare say, if you can do what they are doing ~ your not to worried about perfect html or design.

smanek · on June 26, 2009

or grammar, apparently ;-)

("your not to" => "you're not too")

johnnybgoode · on June 27, 2009

I had exactly the same thought, word for word, when I read the comment, but I wasn't, um, inspired enough to post it and take the downvotes. ;)

froo · on June 26, 2009

No, you're right. I can't do what they're doing. However, my talents lie elsewhere so I submit to you that they can't do what I'm doing, so the point really is moot. It would be like complaining that a plumber can't cook as well as a chef.

That being said, HTML is a form of code (even if it is only markup), as it is a structured language, and their solution is no doubt based on code, so there really is no excuse.