
The limit is high-quality data, not compute.


RL doesn't need that much static data; it needs a lot of "good" tasks/challenges and computation.
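
A minimal sketch of that idea, assuming a toy arithmetic task; the generator and verifier below are stand-ins for a real RL environment, but they show how verifiable training signal can be minted from compute alone, with no static corpus:

    import random

    # Tasks are generated procedurally, so there is no dataset to exhaust;
    # the verifier supplies the reward signal an RL loop would optimize.
    def make_task():
        a, b = random.randint(1, 99), random.randint(1, 99)
        return f"What is {a} + {b}?", a + b

    def reward(answer: str, target: int) -> float:
        # Verifiable reward: exact match against the known solution.
        return 1.0 if answer.strip() == str(target) else 0.0

    prompt, target = make_task()
    print(reward(str(target), target))  # 1.0 for a correct answer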


Right, and LLMs will not be able to generate their own high-quality training data.

There are no perpetual motion machines.


> LLMs will not be able to generate their own high-quality training data.

Humans certainly did. We did not inherit our physics and poetry books from some aliens.


Humans and LLMs are different things.

LLMs cannot reason, though many people seem to believe that they can.


I can't prove that we did, but I don't know that we /didn't/.


LLMs are not humans, nowhere near.


Have you read about this specific model we're talking about?

My understanding is that the whole point of R1 is that it was surprisingly effective to train on synthetic data AND to reinforce on the output rather than the whole chain of thought. That approach doesn't require as much human-curated data, and it's a big part of where the efficiency gain came from.
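
As a rough illustration of outcome-only scoring, assuming an <answer> tag convention for the final answer (an assumption for illustration, not necessarily R1's exact format):

    import re

    # Score only the final answer; the chain-of-thought tokens before it
    # are never graded directly.
    def outcome_reward(completion: str, target: str) -> float:
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match is None:
            return 0.0  # malformed output earns nothing
        return 1.0 if match.group(1).strip() == target else 0.0

    sample = "Let me think step by step... 12 * 12 = 144. <answer>144</answer>"
    print(outcome_reward(sample, "144"))  # 1.0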


They already do. All the current leading-edge models are heavily trained on synthetic data. It's called textbook learning.
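
A sketch of what such a generate-then-filter pipeline can look like; teacher_generate and passes_filter are hypothetical stand-ins for a call to a strong existing model and for real quality filters (classifiers, verifiers, dedup):

    # Hypothetical teacher call: a real pipeline would query an existing model.
    def teacher_generate(topic: str) -> str:
        return f"Lesson on {topic}: definition, worked example, exercises..."

    # Trivial stand-in for quality filtering.
    def passes_filter(text: str) -> bool:
        return len(text) > 20

    corpus = []
    for topic in ["sorting algorithms", "Newton's laws"]:
        sample = teacher_generate(topic)
        if passes_filter(sample):
            corpus.append(sample)  # becomes training data for the next model

    print(len(corpus))  # 2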


> The limit is high-quality data

If, as some companies claim, these models truly possess emergent reasoning, their ability to handle imperfect data should serve as proof of that capability.



