vicentwu's comments

RL doesn't need that much static data; it needs a lot of "good" tasks/challenges and a lot of computation.
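
A toy sketch of what I mean (an epsilon-greedy bandit; the task, names, and numbers are all invented for illustration): the "data" is produced by interacting with the task, so what you really need is a good task and lots of rollouts.

    import random

    # Toy multi-armed bandit: a "task" plus interaction, not a static dataset.
    ARM_PROBS = [0.2, 0.5, 0.8]            # unknown to the agent

    def pull(arm):
        return 1.0 if random.random() < ARM_PROBS[arm] else 0.0

    counts = [0] * len(ARM_PROBS)
    values = [0.0] * len(ARM_PROBS)        # running estimate of each arm's payoff

    for step in range(10_000):             # "computation": lots of rollouts
        if random.random() < 0.1:          # explore occasionally
            arm = random.randrange(len(ARM_PROBS))
        else:                              # otherwise exploit the best estimate
            arm = max(range(len(ARM_PROBS)), key=lambda a: values[a])
        r = pull(arm)                      # training signal generated on the fly
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]

    print(values)                          # converges toward ARM_PROBS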


"Note on "tuned": OpenAI shared they trained the o3 we tested on 75% of the Public Training set. They have not shared more details. We have not yet tested the ARC-untrained model to understand how much of the performance is due to ARC-AGI data."

I'd really like to see the number of training pairs needed to achieve this score. If it only takes a few pairs, say 100, that would be amazing!


75% of 400 is 300 :)


Trained with 300 raw pairs directly from the ARC training set, without any data augmentation such as generating many more pairs with some kind of ARC generator? That's amazing.


Wow, are you AGI?


Off topic, but I think that, in the long term, inference should be done along with some kind of training.


Past efforts led to today's products. We need to wait to see the real impact on the ability to ship.


It's amazing!


Fascinating. Language is a type of action that evolved for information exchange: it maps latent "video", "audio", and "thoughts" into "sentences" and vice versa.


Cool! The real-time feedback will have enormous ramifications for art-creation workflows.


Exactly! Everything will change.

And this is without taking into account WebGPU and other advances in adjacent fields.


Asking models to do math is an effective way to measure their capabilities, especially in reasoning and abstraction, which are quite important for problem solving.


You don't need reasoning or abstraction to do basic calculation. ChatGPT will, however, happily give you some decent answers about not-too-hard math that requires reasoning. It just won't operate on digits.

Those are completely different ideas.
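
One quick way to see the digit problem, assuming you have the tiktoken package installed (the encoding name is just the one I picked):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for s in ["12345", "12345 + 67890"]:
        ids = enc.encode(s)
        print(s, "->", [enc.decode([i]) for i in ids])
    # Numbers come out as multi-digit chunks, so the model never manipulates
    # individual digits the way column arithmetic does.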


The chain of thought works much like the "System 2" introduced in *Thinking, Fast and Slow*: slower, more deliberative, and more logical.
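
In prompt form, the contrast looks roughly like this (the wording is mine; the question is Kahneman's classic bat-and-ball example):

    # The same question, asked two ways.
    QUESTION = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?")

    fast_prompt = QUESTION + "\nAnswer:"                     # "System 1": blurt it out
    slow_prompt = QUESTION + "\nLet's think step by step."   # "System 2": deliberate first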


I think the last paragraph makes a lot of sense. It does seem that some kind of reasoning capability emerges as LLMs get bigger, which makes them quite useful and blew a lot of people's minds at first. But I think that, fundamentally, the training goal of LLMs--guessing what the next word should be--pushes the model toward being a plausible-nonsense generator, and the reasoning capability emerges because it helps the model make stuff up. We should therefore be cautious about the results these LLMs generate. They may sound reasonable, but making up the next word is their real top priority.
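
Concretely, the training goal I mean is next-token cross-entropy; here is a toy single-step version (vocabulary and probabilities invented):

    import math

    # Toy next-word prediction step. The loss rewards assigning high
    # probability to whatever word actually came next -- truth never enters.
    vocab = ["the", "sky", "is", "blue", "green"]
    probs = [0.05, 0.05, 0.05, 0.60, 0.25]    # model's guess after "the sky is"

    target = vocab.index("blue")              # the word that actually followed
    loss = -math.log(probs[target])           # cross-entropy for this one step
    print(loss)                               # ~0.51; smaller when the guess "wins"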

