Humans take in a tremendously high bitrate of data via other senses and are able to connect those to the much lower amount of language input such that the language can go much further.
GPT-3 is learning everything it knows about the entire universe just from text.
Imagine we received a 1TB information dump from a civilization that lives in an alternate universe with entirely different physics. How much could we learn just from this information dump?
And from our point of view, it could be absurdly exotic. Maybe their universe doesn't have gravity or electromagnetic radiation. Maybe the life forms in that universe spontaneously merge consciousnesses with other life forms and separate randomly, so whatever writing we have received is in a style that assumes the reader can effortlessly deduce that the author is actually a froth of many consciousnesses. And in the grand spectrum of how weird things could get, this "exotic" universe I have described is really basically identical to our own, because my imagination is limited.
Learning about a whole exotic universe from just an info dump is the task of GPT-3. For instance, tons of our writing takes for granted that solid objects don't pass through each other. I dropped the book. Where is the book? On the floor. Very few bits of GPT-3's training set include statements like "a book is a solid object", "the floor is a solid object", "solid objects don't pass through each other", but it can infer this principle and others like it.
From this point of view, its shortcomings make a lot of sense. Some things GPT fails at are obvious to us having grown up in this universe. I imagine we're going to see an explosion of intelligence once researchers figure out how to feed AI systems large swaths of YouTube and such, because then they will have a much higher bandwidth way to learn about the universe and how things interact, connecting language to physical reality.
This is a fantastically good point. I think things will get even more interesting once the ML tools have access to more than just text, audio and image/video information. They will be able to draw inferences that humans will generally be unaware of. For example, maybe something happens in the infrared range that humans are generally oblivious to, or maybe inferences can be drawn based on how radio waves bounce around an object.
"The universe" according to most human experience misses SO much information and it will be interesting to see what happens once we have agents that can use all this extra stuff in realtime and "see" things we cannot.
As far as I know, all sensory evolution prior to this point has been driven by incremental gains in fitness within a changing environment.
True vision requires motive and an embodied self. I’m ignorant about the state of the art here, but I’m way more terrified of what these things don’t see than interested in what they could show us. It seems to me that the only human motives accessible to machines are extremely superficial and behavior-based.
Knowledge is not some disconnected map of symbols that results in easily measurable behavior, it has a deep and fundamental relation to conscious and unconscious human motivation.
I don’t see any possible way to give a machine that same set of motives without having it go through our same evolutionary and cultural history, and strongly believe most of our true motives are under many protective layers of behavioral feints and tests and require voluntary connection and deep introspection to fractionally expose to our conscious selves, let alone others, let alone a computer.
These models seem to be amazingly good at combining maps of already travelled territory. Trying to use them to create maps for territory that is new to us seems incredibly dangerous.
Am I missing something here, or is it not true that AI models operate purely on bias? What we choose to measure and train the model on seems to predetermine the outcome. It’s not actually empirical, because the model can’t evaluate whether its predictions make sense outside of itself. At some point it’s always dependent on a human saying “success/fail”, which makes it seem more like an incredibly complicated kaleidoscope. Maybe these models can cause humans to see patterns we didn’t see before, but I don’t think they could actually make new discoveries on their own.
I think your point is more interesting, but the problem is bootstrapping knowledge from a tabula rasa. A human isn't born knowing about quantum mechanics, Christoffel symbols, or what pushforward measures are. If there were a method to learn facts from scratch as cheaply as brilliant humans do, it would be amazing. Even counting from the elementary school years, humans still end up spending several orders of magnitude less energy.
Transformers are far more effective than n-gram models or non-contextual word vectors. I imagine there is something that will be to Transformers what Transformers were to word2vec.
Google's Imagen was trained on roughly as many images as a six-year-old would have seen over their lifetime at 24fps, plus a whole lot more text. It can draw a lot better and probably has a better visual vocabulary, but is also way outclassed in many other ways.
Poverty of the stimulus is a real problem, and may mean our starting-point architecture from genetics has a lot of learning built in, rather than being just a bunch of uninitialized weights randomly connected. In many species, a newborn animal can get up and walk right away.
Definitely. I do think video is much more important than images, because video implicitly encodes physics, which is a huge deal.
And, as you say, there are probably some structural/architectural improvements to be made in the neural network as well. The mammalian brain has had a few hundred million years to evolve such a structure.
It also remains unclear how important learning causal influence is. These networks are essentially "locked in" from inception. They can only take the world in. Whereas animals actively probe and influence their world to learn causality.
The mammalian brain has had a few hundred million years to evolve neural plasticity [1], which is the key function missing in AI. The brain’s structure isn’t set in stone but develops over one’s lifetime, and can even carry out major restructuring on a short time scale in some cases of massive brain damage.
Neural plasticity is the algorithm running on top of our neural networks that optimizes their structure as we learn so not only do we get more data, but our brains get better tailored to handle that kind of data. This process continues from birth to death and physical experimentation in youth is a key part of that development, as is social experimentation in social animals.
I think it “remains unclear” only to the ML field. From the perspective of neuroscientists, current neural networks aren’t even superficially at the complexity of axon-dendrite connections with ion channels and threshold potentials, let alone the whole system.
A family member’s doctoral thesis was on the potentiation of signals, and based on my understanding of it, every neuron takes part in the process with its own “memory” of sorts; the potentiation she studied was just one tiny piece of the neural plasticity story. We’d need to turn every component in the hidden layers of a neural network into its own massive NN with its own memory to even begin to approach that kind of complexity.
> our starting point architecture from genetics has a lot of learning built in
I don't doubt that evolution provided us with great priors to help us be fast learners, but there are two more things to consider.
One is scale: the brain is still roughly 10,000x more complex than large language models. We know that smaller models need more training data, so a brain many orders of magnitude larger than GPT-3 naturally learns faster.
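A rough back-of-envelope check on that scale claim (both figures below are loose, commonly cited estimates, not measurements):

```python
# Back-of-envelope scale comparison (both numbers are rough,
# commonly cited estimates, not precise measurements).
gpt3_params = 175e9        # GPT-3 parameter count
brain_synapses = 1.5e15    # human brain synapse estimates range ~1e14-1e15

ratio = brain_synapses / gpt3_params
print(f"brain/GPT-3 ratio: ~{ratio:,.0f}x")  # on the order of the 10,000x above
```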
The second is social embedding: we are not isolated; our environment is made of human beings. Similarly, an AI would need to be trained as part of human society, or even as part of an AI society, but not alone.
> Google's Imagen was trained on about as many images as a 6 year old would have seen over their lifetime at 24fps
The six year old has the advantage of being immersed in a persistent world where images have continuity and don’t jump around randomly. For example infants learn very quickly that most objects stay put even when they aren’t being observed. In contrast a dataset of images on the internet doesn’t really demonstrate how the world works.
Drawing involves taking a mental image and converting it into a sequence of actions that replicate the image on a physical surface. Imagen does not do that. I think the images it generates are more analogous to the image a person creates in their mind before drawing something.
I was too loose with that. There is CLIPDraw and others that operate at the stroke/action level but haven't been trained on as much data. Still impressive at the time:
One of the more interesting things I have seen recently is the combination of different domains in models / datasets. The top network of Stable Diffusion combines text-based descriptions with image-based descriptions, where the model learns to represent either text or images in the same embedding; a picture, or a caption for that picture, lead to similar embeddings.
Effectively, this can broaden the context the network can learn. There are relationships readily apparent to something that has learned from images that might not be apparent to something trained only on text, or vice versa.
It will be interesting to see where that goes. Will it be possible to make a singular multi-domain encoder that can take a wide range of inputs and create an embedding (a "mental model" of the input), and have this one model be usable as the input for a wide variety of tasks? Can something trained on multiple domains learn new concepts faster than a network that is single-domain?
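A toy sketch of that shared-embedding idea, with random linear maps standing in for trained encoders (all shapes and names here are made up for illustration, not taken from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8  # shared embedding dimension
# Hypothetical stand-ins for trained encoders: in a real CLIP-style model
# these would be deep networks, trained so that a caption and its picture
# land near each other in the shared space.
text_encoder = rng.normal(size=(d, 16))   # maps 16-d text features -> d
image_encoder = rng.normal(size=(d, 32))  # maps 32-d image features -> d

def embed(encoder, features):
    v = encoder @ features
    return v / np.linalg.norm(v)  # unit-normalize, CLIP-style

text_vec = embed(text_encoder, rng.normal(size=16))
image_vec = embed(image_encoder, rng.normal(size=32))

# Cosine similarity across modalities: trained models would score a
# matching caption/picture pair near 1.0; these random maps give noise.
print(float(text_vec @ image_vec))
```

The point of the sketch is only the shape of the setup: two different input spaces, one shared output space, and a similarity score that is meaningful after training.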
They haven't even figured out basic math, so not sure what you would expect to find there. They aren't smart enough to generate structure that doesn't already exist.
Depends on the method. Evolutionary methods can absolutely find structure that we missed, and they often go hand in hand with learning. Like AlphaGo move 37.
AlphaGo had a lot of driver code involved to make it tick; it wasn't just a big network deciding what to do. You would need something similar here, and without someone figuring out that driver code you aren't revolutionizing anything with today's neural networks.
Yes, since Go is a very simple game. Making a proper driver for much more complex domains like engineering blueprints is not something we know how to do today.
Edit: Also you are missing the Go engine in that comment, it can't train without a Go engine to train against that evaluates the results of each move. That Go engine is a part of the training algorithm and thus is also a part of the driver code, you would need to produce something similar to train a similar AI for other domains. We don't know how to write similar blueprint engines or text evaluation engines, so we can't expect such AI models to produce similar results.
The hypothesis that you can't learn some things from text alone (you need real-life experience) is intuitive, and I used to think it was true. But there are interesting results from just a few days ago suggesting that text by itself is also enough:
> We test a stronger hypothesis: that the conceptual representations learned by text only models are functionally equivalent (up to a linear transformation) to those learned by models trained on vision tasks. Specifically, we show that the image representations from vision models can be transferred as continuous prompts to frozen LMs by training only a single linear projection.
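Shape-wise, the mechanism in that quote is tiny. Here is a minimal sketch with made-up dimensions (the real models' sizes differ), where the single projection matrix is the only trainable part:

```python
import numpy as np

d_vis, d_lm = 512, 768       # made-up feature sizes for illustration
W = np.zeros((d_lm, d_vis))  # the single linear projection: the ONLY trained part

image_features = np.random.randn(d_vis)  # output of a frozen vision model
visual_token = W @ image_features        # one continuous "prompt token"

# The frozen LM then sees the projected visual token prepended to its
# ordinary text-token embeddings, with none of its own weights updated.
text_tokens = np.random.randn(4, d_lm)
prompt = np.vstack([visual_token, text_tokens])
print(prompt.shape)  # (5, 768)
```

If representations really are equivalent up to a linear transformation, this one matrix is all the "translation" a frozen LM needs.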
The claim isn’t that you can’t learn it from text, but rather that this is why models require so much text to train on - because they’re learning the stuff that humans learn from video.
The key issue is learning effort (such as energy versus time). Congenitally deaf-blind humans, with no accompanying mental disabilities from a shared cause, can learn just fine as children without any video or sound, from comparatively low-bandwidth channels like proprioception and touch.
Another issue is that what we really care about is scientific reasoning, and there, if anything, nature has given us an anti-bias, at least at the level of interfacing with facts. People aren't born biased towards learning metric tensors and Christoffel symbols, but it takes only a few years at a handful of hours a day, using a small number of joules, for many humans to get it (I'm counting from all grade-school prerequisites, versus GPU watts x time). Much fewer for genius children.
I'm testing this argument out, but doesn't this apply to all tasks, not just language? I can learn to paint from scratch in what, like 300 attempts? 1,000 attempts? It takes far more examples to train a guided diffusion model, and I'd struggle to believe that our brains are hardwired for painting.
> Humans take in a tremendously high bitrate of data via other senses and are able to connect those to the much lower amount of language input such that the language can go much further.
They don't. Human bitrates are quite low, all things considered. The eyes, which by far produce the most information, only have a bitrate equivalent to ~2kbps:
The rest of the input nerves don't bring us over 20kbps.
The average image recognition system has access to more data and can tell the difference between a cat and a banana. A human has somewhat more capability than that.
I think the link says a single synapse does 2kbps, not the whole visual cortex. There are about 6 trillion (6x10^12) synapses (3 trillion per hemisphere) in the visual cortex according to https://pubmed.ncbi.nlm.nih.gov/7244322/
If we play "how many ping-pong balls fit in the bus" with that information: the cortex is a 3D structure, so if you assume it is a perfect cube that feeds to something right behind it, you get (10^12)^(2/3) = 10^8 surface channels at 2kbps each, i.e. about 25GB/s. That is less than an order of magnitude off from the estimate you would get for two eyes at 8000x8000 resolution, True Color, 24fps: about 9GB/s.
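The arithmetic of that estimate, written out (all inputs are the rough figures from the comment itself):

```python
# Ping-pong-ball estimate from the comment above, written out.
synapses = 1e12                 # order-of-magnitude synapse count used above
channels = synapses ** (2 / 3)  # ~1e8 "channels" on one face of a cube
kbps_per_channel = 2e3          # ~2 kbps per synapse

cortex_GBps = channels * kbps_per_channel / 8 / 1e9
print(f"cortex estimate: ~{cortex_GBps:.0f} GB/s")  # ~25 GB/s

# Naive video comparison: two eyes, 8000x8000 px, 24-bit color, 24 fps.
video_GBps = 2 * 8000 * 8000 * 3 * 24 / 1e9
print(f"video estimate:  ~{video_GBps:.1f} GB/s")   # ~9.2 GB/s
```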