lol what, I was looking at this list of people who do this, in fact a lot of them ...

superfx · on Feb 18, 2018

I do think, however, that protein folding is very much understudied in the ML community, relative to, say, the big three of vision, NLP, and speech. The lack of standardized data sets and benchmarks, not to mention the need for domain knowledge, has made it difficult to get into the field.

mathperson · on Feb 26, 2018

At the risk of offending the NLP/vision/speech folks, I just think those tasks are 'easier' in a variety of ways.

matt4077 · on Feb 18, 2018

CASP is a pretty nice dataset, and so is all of the PDB.

cing · on Feb 19, 2018

The PDB represents the best we have, but I wouldn't call it a great dataset for learning. The 150,000 known structures are a drop in the ocean when it comes to the space of possible sequences/structures.