I do think however that protein folding is very much understudied in the ML community, relative to say the big three of vision, NLP, and speech. The lack of standardized data sets and benchmarks, not to mention the need for domain knowledge, have made it difficult to get into the field
The PDB represents the best we have, but I wouldn't call it a great dataset for learning. The 150,000 known structures are a drop in the ocean when it comes to the space of possible sequences/structures.