
It's an illusion. For anything it "knows", you can persuade it to claim exactly the opposite. It just happened to land on the correct answer first because it saw it more often in the training data. Despite appearing 100% confident about everything, it actually has 0% confidence in anything it says, although it insists on some things a bit longer than on others.
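The "appearing 100% confident" point can be made concrete: a model's apparent confidence is just a softmax over output logits, and a high probability says nothing about truth. A minimal sketch (the logit values below are made up for illustration, not from any real model):

```python
import math

# Hypothetical next-token logits for the prompt "The sky is ..."
logits = {"blue": 4.0, "green": 1.0, "red": 0.5}

# Softmax turns logits into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# "blue" dominates with >90% probability, yet nothing in this
# computation encodes whether the claim is actually true.
best = max(probs, key=probs.get)
```

The distribution can look extremely peaked (i.e. "confident") purely because one continuation was more frequent in training.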


> For anything it "knows", you can persuade it to claim exactly the opposite.

Which is actually a novel capability, and it arises because the network does reinforcement learning over its own context window. It's a strength, not a weakness. Humans can do the same thing. ("Assume that X...")

> It just happened to land on the correct answer first because it saw it more often in the training data.

Isn't that just a description of learning?

It's true that the network has no idea what is "true". But it's not like we do either; all we do is learn from correlations. We're just better at it.


Well, you're right that it may learn statistical associations, which we ourselves then generalize into a model when we probe it.

I think the key bit here is that we can influence its associations through the prompt, changing the model that is presented.

I'm way ahead of myself here, but the thought is interesting, and it will likely remain an open question for at least a few months or years.



