
The limit is high-quality data, not compute.


RL doesn't need that much static data; it needs a lot of "good" tasks/challenges and computation.
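
A minimal sketch of that idea, assuming a toy arithmetic task; the generator and verifier below are stand-ins for a real RL environment, but they show how verifiable training signal can be minted from compute alone, with no static corpus:

    import random

    # Tasks are generated procedurally, so there is no dataset to exhaust;
    # the verifier supplies the reward signal an RL loop would optimize.
    def make_task():
        a, b = random.randint(1, 99), random.randint(1, 99)
        return f"What is {a} + {b}?", a + b

    def reward(answer: str, target: int) -> float:
        # Verifiable reward: exact match against the known solution.
        return 1.0 if answer.strip() == str(target) else 0.0

    prompt, target = make_task()
    print(reward(str(target), target))  # 1.0 for a correct answer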


Right, and LLMs will not be able to generate their own high-quality training data.

There are no perpetual motion machines.


> LLMs will not be able to generate their own high-quality training data.

Humans certainly did. We did not inherit our physics and poetry books from some aliens.


Humans and LLMs are different things.

LLMs cannot reason, though many people seem to believe that they can.


I can't prove that we did, but I don't know that we /didn't/.


LLMs are not humans, nowhere near.


Have you read about this specific model we're talking about?

My understanding is that the whole point of R1 is that it was surprisingly effective to train on synthetic data AND to reinforce on the output rather than the whole chain of thought. That approach doesn't require as much human-curated data, and it's a big part of where the efficiency gain came from.
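
As a rough illustration of outcome-only scoring, assuming an <answer> tag convention for the final answer (an assumption for illustration, not necessarily R1's exact format):

    import re

    # Score only the final answer; the chain-of-thought tokens before it
    # are never graded directly.
    def outcome_reward(completion: str, target: str) -> float:
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match is None:
            return 0.0  # malformed output earns nothing
        return 1.0 if match.group(1).strip() == target else 0.0

    sample = "Let me think step by step... 12 * 12 = 144. <answer>144</answer>"
    print(outcome_reward(sample, "144"))  # 1.0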


They already do. All the current leading-edge models are heavily trained on synthetic data. It's called textbook learning.
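
A sketch of what such a generate-then-filter pipeline can look like; teacher_generate and passes_filter are hypothetical stand-ins for a call to a strong existing model and for real quality filters (classifiers, verifiers, dedup):

    # Hypothetical teacher call: a real pipeline would query an existing model.
    def teacher_generate(topic: str) -> str:
        return f"Lesson on {topic}: definition, worked example, exercises..."

    # Trivial stand-in for quality filtering.
    def passes_filter(text: str) -> bool:
        return len(text) > 20

    corpus = []
    for topic in ["sorting algorithms", "Newton's laws"]:
        sample = teacher_generate(topic)
        if passes_filter(sample):
            corpus.append(sample)  # becomes training data for the next model

    print(len(corpus))  # 2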


> The limit is high-quality data

If, as some companies claim, these models truly possess emergent reasoning, their ability to handle imperfect data should serve as proof of that capability.



