
The fundamental nature of the model is that it consumes tokens as input and produces token probabilities as output, but there's nothing inherently "predictive" about it -- that's just perspective hangover from the historical development of how LLMs were trained. It is, fundamentally, I think, a general-purpose thinking machine, operating over the inputs and outputs of tokens.

(With this perspective, I can feel my own brain subtly offering up a panoply of possible responses in a similar way. I can even turn up the temperature on my own brain, making it more likely to decide to say the less-obvious words in response, by having a drink or two.)
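The "temperature" analogy above maps directly onto how sampling from a model's token probabilities works. A minimal sketch (the logits and vocabulary indices here are made-up stand-ins, not from any real model):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token index from raw logits scaled by temperature.

    Higher temperature flattens the distribution, making less-obvious
    tokens more likely (the "drink or two" effect); lower temperature
    sharpens it toward the single most probable token.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

At very low temperature this collapses to greedy decoding (always the top token); at high temperature the choices spread out across the whole vocabulary.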

(Similarly, mimicry in humans is also a very good learning technique for getting started -- kids learning to speak are little parrots, artists just starting out often copy existing works, etc., before going on to develop their own style.)




Non sequitur: "perspective hangover" might be my favorite phrase I've ever read. So much of what we deal with is trying to correct the record on how we used to think about things. But the inertia that old ideas or modes have is monumental to overcome. If you just came up with that, kudos.

Ha, thanks!

We could argue about whether fine tuning is still about predicting a distribution or not, but really I feel like whether or not that word is accurate misses the point of why the description is useful.

I like the phrasing because it distinguishes it from other things the generative model might be doing including:

- Creating and then refining the whole response simultaneously, like diffusion models do.

- Having hidden state, where it first forms an "opinion" and then outputs it, e.g. seq2seq models, where previously output tokens are treated differently from input tokens at an architectural level.

- Having a hierarchical structure where you first decide what you're going to say, and then how you're going to say it, like wikipedia's hilarious description of how "sophisticated" natural language generation systems work (someone should really update this page): https://en.wikipedia.org/w/index.php?title=Natural_language_...
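What the list above contrasts against is plain autoregressive decoding: each sampled token is appended to the context and fed straight back in, with no hidden plan and no global refinement pass. A toy sketch of that loop shape (the bigram table is an invented stand-in for a real model, which would condition on the whole context):

```python
import random

# Toy "model": next-token probabilities conditioned only on the last token.
# A real LLM conditions on the entire context; this just shows the loop.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}

def generate(max_tokens=10):
    """Autoregressive decoding: sample one token at a time, append it
    to the sequence, and feed the sequence back in -- no separate
    "decide what to say, then how to say it" stage."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAMS[tokens[-1]]
        nxt = random.choices(list(dist), weights=list(dist.values()))[0]
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return tokens[1:]
```

The same loop structure holds whether the per-step distribution comes from a bigram table or a transformer; that is the sense in which the model "just" emits token probabilities.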


Welllll I'm not so sure that phrase is well-suited for your intended meaning, then. (Also, tangentially, I think one could argue that thinking models w/ the elided thought prelude satisfy "having hidden state where it first forms an opinion.")


