
Hamiltonian paths and previous work by Donald Knuth is more than likely in the training data.



The specific sequence of tokens that comprises Knuth's problem together with an answer to it is not in the training data. A naive probability distribution based on counting token sequences present in the training data would assign it zero probability. The trained network represents an extremely non-naive approach to estimating the ground-truth distribution (the distribution corresponding to what a human brain might have produced).
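A toy illustration of that naive counting approach (the corpus and "tokenization" here are made up for the sketch): a pure bigram count model assigns zero probability to any sequence containing a token pair it has never seen, no matter how plausible the sequence is.

```python
from collections import Counter

# Made-up toy corpus standing in for "the training data".
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams to build a naive next-token distribution.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def naive_prob(tokens):
    """Probability of a token sequence under pure bigram counting."""
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= bigrams[(prev, nxt)] / unigrams[prev] if unigrams[prev] else 0.0
    return p

print(naive_prob("the cat sat on the mat .".split()))  # -> 0.0625 (all bigrams seen)
print(naive_prob("the rug sat on the cat .".split()))  # -> 0.0 ("rug sat" never seen)
```

A trained network instead generalizes: it can assign nonzero probability to sequences it has never observed, which pure counting cannot.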

Obviously there is some level of memorisation involved. That's why you can even get LLMs to reproduce passages of Harry Potter verbatim.

>the distribution that corresponds to what a human brain might have produced

But the human brain (or any other intelligent brain) does not work by generating a probability distribution over the next word. Even beings that do not have language can think and act intelligently.


You are always making predictions based on the context. That's why illusions can be so effective like these ones: https://illusionoftheyear.com/cat/top-10-finalists/2024/

LLMs also don't work by generating probability distributions over the next word. That framing can't even explain how they generate words, let alone sentences.

That is exactly how they work.

No, a token is not a word.

I mean, it is some text.

How do you get from a piece of text smaller than a word to an entire coherent sentence?
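For what it's worth, a toy sketch of autoregressive decoding shows the mechanics. The subword vocabulary and next-token table below are invented; a real model conditions on the whole generated prefix and samples from a learned distribution rather than following a fixed lookup table.

```python
# Hypothetical subword vocabulary and a hand-made "most likely next
# token" table, standing in for a trained model's predictions.
next_token = {
    "<s>": "The",
    "The": " un",
    " un": "believ",
    "believ": "able",
    "able": " cat",
    " cat": " sat",
    " sat": ".",
    ".": "</s>",
}

def generate(start="<s>", max_len=20):
    """Greedy autoregressive decoding: repeatedly append the most
    likely next token until an end-of-sequence marker appears."""
    tokens, cur = [], start
    for _ in range(max_len):
        cur = next_token[cur]
        if cur == "</s>":
            break
        tokens.append(cur)
    return "".join(tokens)

print(generate())  # -> "The unbelievable cat sat."
```

Sub-word pieces compose into words, and words into a sentence, purely by iterating the next-token step.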

[Citation needed] Neuroscience isn't yet at a point where it can say this with any certainty.

Anyway, it's not a theorem that you can be intelligent only if you fully imitate biological processes, just as flight can be achieved without flapping wings.


>you can be intelligent only if you fully imitate biological processes

It is not that. It is about having an understanding of how it is trained. For example, if it were trained on ideas instead of words, it would be closer to intelligent behavior.

Someone will say that during training it builds ideas and concepts, but that is just a name we give to the internal representation that results from training, not actual ideas and concepts. When it learns the word "car", it does not actually understand it as a concept, only as a word and how it relates to other words. This lets it generate consistent text involving "car", projecting an appearance of intelligence.

It is hard to propose a test for this, because it will become the next target for the AI companies to optimize for, and maybe the next model will pass it.


The latest models are mostly LMMs (large multimodal models). If a model builds an internal representation that integrates all the modalities we deal with (robotics even provides tactile inputs), it becomes harder and harder to see why those representations should be qualitatively different from ours.

It can't, simply because the textual description of a concept is different from the concept itself.

Obviously, a concept (which is an abstraction in more ways than one) is different from a textual representation. But LLMs don't operate on the textual description of a concept when they are doing their thing. A textual description (which is associated with other modalities in the training data) serves as an input format. LLMs perform non-linear transformations of points in their latent space. These transformations and representations are useful not only for generating text but also for controlling robots, for example (see VLAs in robotics).
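A minimal numerical sketch of what "non-linear transformations of points in latent space" means. The dimensions and weights here are random stand-ins, not a real model; actual LLMs use thousands of dimensions and many stacked layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent space: each of 10 vocabulary tokens is a point in R^4.
embedding = rng.standard_normal((10, 4))
W1 = rng.standard_normal((4, 8))
W2 = rng.standard_normal((8, 4))

def transform(token_id):
    """One non-linear transformation of a point in latent space:
    project up, apply a ReLU non-linearity, project back down."""
    h = np.maximum(embedding[token_id] @ W1, 0.0)
    return h @ W2

out = transform(3)
print(out.shape)  # a new point in the same 4-dimensional space
```

The point is that the computation happens on these vectors, not on the text itself; text is just the input and output encoding.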

> don't operate on the textual description of a concept when they are doing their thing.

It could be mapping the text to some other internal representation with connections to mappings from other text/tokens. But that does not stop text from being the ground truth. It has nothing else to go on!

The "hallucination" behavior alone should be enough to reject any claims that these are at least minimally similar to animal intelligence.


> The "hallucination" behavior alone should be enough to reject any claims that these are at least minimally similar to animal intelligence.

Can you elaborate on why you think this is the case?


The internal representations happen to be useful not only for outputting text. What does that mean from your standpoint?

I didn't understand. Can you clarify?

If LLMs' internal representations are essentially one-to-one mappings of input texts with no additional structure, how can those representations be useful for tasks like object manipulation in robotics?

How is transfer learning possible when non-textual training data enhances performance on textual tasks?


I didn't mean it is a one-to-one mapping from tokens. Rather, it might map a corpus of input text to points in some multi-dimensional space (just like the input data of a linear regression), and then simply extend the line further across that space to get the output.

>How is transfer learning possible when non-textual training data enhances performance on textual tasks?

If non-textual training data can be mapped into the same multi-dimensional space (by using it alongside textual data during training, or something like that), then shouldn't it be possible to do what you describe?
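The "extend the line" picture can be sketched literally. The 2-D "latent space" and the points below are invented for illustration; real latent spaces are high-dimensional and the transformations are non-linear.

```python
import numpy as np

# Hypothetical 2-D latent space: each training example is a point.
points = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])

# Fit a least-squares line y = a*x + b through the points ...
x, y = points[:, 0], points[:, 1]
a, b = np.polyfit(x, y, 1)

# ... and "extend the line" to predict a point outside the data.
print(a * 3.0 + b)  # extrapolated output for an unseen input
```

Whether this linear-extrapolation picture captures what deep networks actually do is exactly what the thread is debating.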



