Yeah, but that kills the implied hope of building a better model for cheaper. Like this, you'll always have a ceiling of being a bit worse than the OpenAI models.
The logic doesn't exactly hold. It's like saying that a student is limited by their teachers. It's certainly possible that a bad teacher will hold the student back, but ultimately a student can lag behind or improve on the teacher with only a little extra stimulus.
They would probably need some source of truth other than an existing model, but it isn't clear how much additional data is needed.
Don't forget that this model probably has far fewer params than o1 or even 4o. This is compression/distillation, which frees up a lot of compute to build models much more powerful than o1. At least it allows further scaling compute-wise (if not in the amount of non-synthetic source material available for training).
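To make the distillation point concrete, here's a minimal sketch of the standard technique: a small student is trained to match the softened output distribution of a larger frozen teacher. The model sizes, temperature, and random tokens are purely illustrative placeholders, not details from any actual setup.

```python
# Minimal knowledge-distillation sketch (illustrative sizes, not real ones).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, big, small = 1000, 512, 64  # hypothetical dimensions

teacher = nn.Sequential(nn.Embedding(vocab, big), nn.Linear(big, vocab))
student = nn.Sequential(nn.Embedding(vocab, small), nn.Linear(small, vocab))

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

tokens = torch.randint(0, vocab, (32,))  # stand-in for real training data

with torch.no_grad():                    # teacher is frozen
    teacher_logits = teacher(tokens)

student_logits = student(tokens)

# KL divergence between softened distributions; the T^2 factor keeps
# gradient magnitudes comparable across different temperatures.
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T

opt.zero_grad()
loss.backward()
opt.step()
```

The student here has an order of magnitude fewer parameters but is pushed toward the teacher's full probability distribution rather than hard labels, which is why distillation can recover much of the teacher's behavior at a fraction of the inference cost.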