
You can explain the high-level concepts, but it's really difficult to say "this group of neurons does this specific thing and that's why this output was produced." OpenAI did make some progress in getting GPT-4 to explain what each neuron in GPT-2 is correlated to, but we can also find what human brain regions are correlated to, and that doesn't necessarily explain the system as a whole or how everything interacts.
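To make "correlated to" concrete, the crude version is something like the sketch below: run text through GPT-2, hook one MLP neuron, and see which tokens make it fire hardest. The layer/neuron indices and the sample text are arbitrary placeholders, not anything from OpenAI's work.

    # Rough sketch (not OpenAI's code): hook one GPT-2 MLP neuron and list
    # the tokens that make it fire hardest. Layer/neuron indices and the
    # text are arbitrary placeholders.
    import torch
    from transformers import GPT2Tokenizer, GPT2Model

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2").eval()

    LAYER, NEURON = 5, 300          # arbitrary choices for illustration
    records = []

    def hook(module, inputs, output):
        # output: (batch, seq, 3072) post-GELU MLP activations for this layer
        records.append(output[0, :, NEURON].detach())

    handle = model.h[LAYER].mlp.act.register_forward_hook(hook)

    text = "The cat sat on the mat because it was warm."
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        model(**ids)
    handle.remove()

    tokens = tok.convert_ids_to_tokens(ids["input_ids"][0])
    for t, a in sorted(zip(tokens, records[0].tolist()), key=lambda x: -x[1])[:5]:
        print(f"{t!r}: {a:.3f}")

That per-token activation profile is the easy part; the hard part is turning thousands of such profiles into an explanation of how the whole system behaves.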


> but it’s really difficult to say “this group of neurons does this specific thing and that’s why this output was produced”,

That's because that's not how brains work.

> though OpenAI did make some progress in getting GPT-4 to explain what each neuron in GPT-2 is correlated to

The work contained novel-to-me, somewhat impressive accomplishments, but this presentation of it was pure hype. They could have done the same thing without GPT-4 involved at all (and, in fact, they basically did… then they plugged it into GPT-4 to get a less-accurate-but-Englishy output instead).


When I mentioned a group of neurons I was talking about LLMs, though some of the same ideas probably apply. And yes, it's probably not as simple as that, which is why we can't understand them.

I think they just used GPT-4 to help automate it at a large scale, which could be important for understanding the whole system, especially for larger models.


> I think they just used GPT-4 to help automate it on a large scale,

No, they used it as a crude description language for Solomonoff–Kolmogorov–Chaitin complexity analysis. They could have used a proper description language and gotten more penetrable results – it would have raised questions about the choice of description language, and perhaps led to further research on the nature of conceptual embeddings. Instead, they used GPT-4 to make the description language "English" (but not really, since GPT-4 doesn't interpret it the same way humans do), and it's unclear how much that has affected the results.

Here's the paper, if you want to read it again: https://openaipublic.blob.core.windows.net/neuron-explainer/... Some excellent ideas, but implemented ridiculously. It's a puff piece for GPT-4, the Universal Hammer. They claim that "language models" can do this explaining, but the paper only really shows that the authors can do explaining (which they're pretty good at, mind: it's still an entertaining read).
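For what it's worth, the pipeline in the paper boils down to explain → simulate → score: GPT-4 writes an English explanation from (token, activation) pairs, a simulator model then predicts activations from that explanation alone, and the explanation is scored by how well the simulated activations correlate with the real ones. A toy paraphrase (the simulate stub stands in for the GPT-4 simulator; the data and names are made up):

    import numpy as np

    def simulate(explanation, tokens):
        # Stand-in for the GPT-4 "simulator": predict an activation per token
        # from the English explanation alone. Toy keyword match here.
        key = explanation.split()[-1]
        return np.array([1.0 if key in t else 0.0 for t in tokens])

    def score(real, simulated):
        # Explanations are scored by the correlation between simulated
        # and real activations.
        return float(np.corrcoef(real, simulated)[0, 1])

    tokens = ["the", "dog", "barked", "at", "the", "dog"]
    real_acts = np.array([0.0, 0.9, 0.1, 0.0, 0.0, 0.8])
    print(score(real_acts, simulate("fires on mentions of dog", tokens)))

Swap the English explanation for a formal description language and the same scoring loop still works, which is the point above: the interesting part is the compression, not GPT-4.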



