
Yeah, but that kills the implied hope of building a better model for cheaper. Like this you'll always have a ceiling of being a bit worse than the OpenAI models.


The logic doesn't exactly hold; it's like saying a student is limited by their teachers. It is certainly possible that a bad teacher will hold the student back, but ultimately a student can match or even surpass the teacher with only a little extra stimulus.

They would probably need some source of truth other than an existing model, but it isn't clear how much additional data that requires.
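
Concretely, here's a minimal sketch of what that extra signal could look like (hypothetical PyTorch, with plain cross-entropy on real labels standing in for the "other source of truth"; alpha and the exact loss mix are assumptions, not anything OpenAI or DeepSeek have published):

    import torch
    import torch.nn.functional as F

    def mixed_loss(student_logits, teacher_logits, true_labels, alpha=0.9):
        """Blend imitation of the teacher with a little ground truth.

        alpha near 1.0 means the student mostly imitates the teacher;
        the remaining weight comes from an independent source of truth,
        which is what could let the student eventually surpass the teacher.
        """
        # KL divergence between student and teacher output distributions
        imitation = F.kl_div(
            F.log_softmax(student_logits, dim=-1),
            F.softmax(teacher_logits, dim=-1),
            reduction="batchmean",
        )
        # Standard cross-entropy against real labels (the extra signal)
        ground_truth = F.cross_entropy(student_logits, true_labels)
        return alpha * imitation + (1 - alpha) * ground_truth

The point is just that the ceiling argument only holds when alpha is exactly 1; any independent term in the loss gives the student a gradient the teacher can't provide.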


Isn't DeepSeek a bit better, not worse?


Don't forget that this model probably has far fewer parameters than o1 or even 4o. This is compression/distillation, which frees up a lot of compute that can then go toward building models more powerful than o1. At least this allows further scaling compute-wise (if not in the amount of non-synthetic source material available for training).
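
To make the compression point concrete, here's a toy sketch (hypothetical sizes, standard PyTorch; the real models are transformers, not MLPs). The student below has roughly 16x fewer parameters than the teacher it imitates, and the loss is the temperature-softened KL from Hinton et al.'s distillation paper:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical toy sizes, just to make the compression point concrete.
    teacher = nn.Sequential(nn.Linear(512, 4096), nn.ReLU(), nn.Linear(4096, 100))
    student = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 100))

    def count_params(model):
        return sum(p.numel() for p in model.parameters())

    print(count_params(teacher))  # ~2.5M parameters
    print(count_params(student))  # ~157K parameters

    def distill_step(x, T=2.0):
        """One distillation step: the small student matches the big
        teacher's temperature-softened output distribution."""
        with torch.no_grad():
            teacher_logits = teacher(x)
        student_logits = student(x)
        # A higher temperature T exposes the relative probabilities the
        # teacher assigns to wrong answers; T*T rescales the gradients.
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)

Serving the small student then costs a fraction of the teacher's inference compute, which is where the freed-up resources come from.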



