There exists a theory that a single general-purpose learning algorithm could explain the principles of the brain's operation. This theory assumes that the brain has some initial rough architecture and a small library of simple innate circuits which are prewired at birth, and proposes that all significant mental algorithms can be learned. Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from both architectural and functional perspectives.
From reading the introduction, it sounds like the author is covering similar ground as the book The Master Algorithm[1] by Pedro Domingos[2]. If you find this interesting, you may find his book interesting as well.
AIXI has many known problems. The interesting thing is that it's a mathematically pure and elegant model of machine-learning-based AI, so any problems with AIXI might apply to our current approaches to AI as well.
The biggest issue is that AIXI doesn't believe it exists in the universe it is observing. It thinks it's playing some kind of video game. As a result, it doesn't believe that it can truly die, it doesn't believe it can affect its own brain in any way, and it doesn't care about anything other than maximising some "score" that the "game" provides it.
> it doesn't believe it can affect its own brain in any way
I think this is the biggest problem with AIXI. Whilst we can argue whether an existing AIXI would have properties X, Y, Z, that's kind of irrelevant because AIXI can't exist (it's incomputable).
The real point of AIXI is to help researchers by defining something to use in models or to compare their work against. Unfortunately, as you say, its algorithm is fixed. What we'd really like is for AIXI to be the fixed-point or limit of some iterative/recursive process; then, even though a true AIXI is unrealisable, we could still kick off the process and push it as far as we can.
In fact, AIXI is actually the limiting case of AIXI(t, l), which is an AIXI-like algorithm whose Turing machine models halt after t steps or using l tape cells (whichever comes first). AIXI is the limit as t and l approach infinity. This isn't particularly helpful though, as it just tells AI researchers that making bigger, faster computers will help, which we already knew.
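To make the t and l bounds concrete, here's a toy sketch (not real AIXItl: the "programs" here are just bitstrings read as repeating patterns, chosen so that evaluation trivially halts, whereas the real thing enumerates arbitrary programs and cuts them off after t steps or l tape cells):

```python
from itertools import product

def bounded_mixture_predict(history, l):
    """Toy length-bounded Solomonoff-style mixture (NOT real AIXItl).

    Each bitstring of length <= l is read as a repeating pattern; every
    model consistent with the observed history contributes 2^-length of
    prior weight to the prediction of the next bit.
    """
    weight_one = total = 0.0
    for length in range(1, l + 1):
        for bits in product("01", repeat=length):
            pattern = "".join(bits)
            stream = pattern * (len(history) // length + 2)
            if stream[:len(history)] == history:
                prior = 2.0 ** -length
                total += prior
                if stream[len(history)] == "1":
                    weight_one += prior
    return weight_one / total if total else 0.5

print(bounded_mixture_predict("010101", 4))  # -> 0.0 (predicts next bit "0")
```

Pushing l (and, in the real construction, t) towards infinity recovers the full mixture, which is exactly the "bigger, faster computers will help" observation.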
In that sense, Schmidhuber's Goedel Machine architecture is potentially more fruitful. Last I saw, there was no real implementation yet (e.g. http://people.idsia.ch/~juergen/selfreflection.pdf ). The Powerplay architecture is more easily realisable, as it optimises greedily rather than optimally, and uses regression tests rather than formal verification.
In fact, we can also make another architecture in between these two, which optimises greedily but uses verification instead of regression tests. I've implemented this in Coq at http://chriswarbo.net/essays/powerplay
Also, AIXI has only one layer of hierarchy to its model, and has no latent parameters for its causal structures (Turing machines), nor any allowance for compactness or uncountability of learned spaces.
It's a really incredibly bare-bones, skeletal model of a learning agent that makes very rigid, unrealistic modeling assumptions and then adds on computational infeasibility.
I don't know what you mean by hierarchy, latent parameters, or compactness. It models with Turing machines, which are very general and can simulate other Turing machines that are unboundedly large.
The biggest issue is that an agent can't model the universe as merely a set of inputs, outputs, and score. But there may not be a better way, it's quite a difficult problem.
Yes I'm aware, I just don't see how they are relevant to AIXI, an AI with infinite computing power that can model anything with a universal Turing machine.
Statistics jargon will sadly not make AIXI work. Its problem is very deep.
>Statistics jargon will sadly not make AIXI work. Its problem is very deep.
It's not that deep. It's, at worst, dealing with the Curse of Dimensionality because it has no notion of randomness: when the inputs it receives are noisy, the prior probability of Turing machines with the noise bits pre-encoded drops exponentially with the length of the noise. That's why compactness and other assumptions about the real line tend to help in real-world statistics: they make it easy to notice noise and operate with imperfect precision.
Philosophy jargon about "self-awareness" isn't necessarily going to yield more insights than it always has (i.e. the Chinese Room, i.e. very little).
Any realistic approximation of AIXI would use probabilistic Turing machines that either output probabilities instead of exact predictions, or perhaps use probabilities internally as well.
However, for true AIXI with infinite computing power, that doesn't really matter. Randomness can be represented as stored random seed data, and isn't treated any differently from any other unknown variable.
I never used philosophy jargon or even the word "self-awareness". I stated my issues with AIXI in plain English and explained them. It has literally nothing to do with the Chinese Room.
>Any realistic approximation of AIXI would use probabilistic Turing machines.
That would be a fresh model of AI, rather than an AIXI approximation. You should probably look into that idea.
>However, for true AIXI with infinite computing power, that doesn't really matter. Randomness can be represented as stored random seed data, and isn't treated any differently from any other unknown variable.
Tsk tsk. It's a curse-of-dimensionality issue. If we have N bits of optimally-compressed random seed plus M bits of structure (and yeah, AIT has ways to separate a string X into its structure and random bits iff you have the normal Halting Oracle), then the prior probability of that particular machine is 2^{-(N+M)}. The noisier the input dataset, the larger N grows. In normal learning, we want M to be constant (which we can usually assume it is: the universe mostly doesn't acquire new causal structure while we're looking at it), which then allows the posterior probability of good hypotheses to rise logarithmically with sample size. If each sample contains noise, then we actually have to split things up:
2^-M for the causal structure, where M is constant and the posterior thus gains information logarithmically; and 2^-N for the random seed, where the ground-truth random seed actually grows linearly in length with each sample we observe (because of the bits of entropy Nature used up to make that sample), so the prior probability of each random seed drops exponentially as the number of samples we anticipate seeing grows.
So while this is all very informal, I'd have to say that a noisy Solomonoff induction actually suffers from a Curse of Dimensionality because it assumes everything is discrete, while more typical machine-learning models based on continuous distributions can learn well in the face of noise.
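To put toy numbers on the argument above (the 100 bits of structure and 3 bits of noise per sample are invented purely for illustration):

```python
def log2_prior_of_true_machine(samples, structure_bits=100, noise_bits_per_sample=3):
    """log2 of the 2^-(M+N) prior on the machine encoding the true
    causal structure (M bits, constant) plus the exact noise seed
    (N bits, growing linearly with the number of noisy samples)."""
    n = noise_bits_per_sample * samples
    return -(structure_bits + n)

for k in (10, 100, 1000):
    print(k, log2_prior_of_true_machine(k))
# The log-prior falls linearly in the sample count, i.e. the prior
# mass on the exact-noise machine vanishes exponentially.
```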
Do people generally read these things before upvoting? Legitimately curious.
It lists some possible goals to achieve more general human-like intelligence beyond the fancy function approximation we get with deep supervised learning, covering, as stated in the abstract, both architectural and functional perspectives. In general I find the language fairly wishy-washy and the writing often awkward, but it is a nice summary of relevant thoughts and concepts. Beyond the abstract, here is a bit of summary and my thoughts.
For architectural aspects, it lists:
1) Unsupervised - Agrees with LeCun, Bengio, etc. But I'm not sure it's fair to conclude this yet; maybe it should be reinforcement? Our brains are prewired to do some things.
3) Sparse and Distributed - again plausible and empirically seen in deep learning. One reason ReLU neurons are nice is that they lead to sparser distributed representations.
4) Objectiveless - a metaphysical statement having to do with the Chinese room argument? This seems to mean not optimizing an objective function with gradient descent, and instead "Clearly, the learning algorithm should have a goal, which might be defined very broadly such as the theory of curiosity, creativity and beauty described by J. Schmidhuber". Seems vague and not clear.
5) Scalable - Again not the best choice of words; it seems to argue for parallelism as well as a "hierarchical structure allowing for separate parallel local and global updates of synapses, scalability and unsupervised learning at the lower levels with more goal-oriented fine-tuning in higher regions." I am disappointed there was no discussion of memristors or neuromorphic computing here.
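The sparsity claim in (3) is easy to demonstrate: ReLU maps every negative pre-activation to exactly zero, so roughly half of a centred random input becomes exactly zero, whereas a saturating nonlinearity like the logistic sigmoid never outputs an exact zero. A quick sketch:

```python
import math
import random

random.seed(0)  # deterministic toy data
pre = [random.gauss(0.0, 1.0) for _ in range(10_000)]

relu = [max(0.0, x) for x in pre]                 # ReLU: negatives -> exactly 0
sigm = [1.0 / (1.0 + math.exp(-x)) for x in pre]  # sigmoid: never exactly 0

relu_zeros = sum(1 for x in relu if x == 0.0) / len(relu)
sigm_zeros = sum(1 for x in sigm if x == 0.0) / len(sigm)
print(relu_zeros, sigm_zeros)  # roughly 0.5 vs exactly 0.0
```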
For functional aspects, it lists:
1) Compression - sure, pattern matching is in a sense compression so this seems fairly obvious.
2) Prediction - "Whereas the smoothness prior may be considered as a type of spatial coherence, the assumption that the world is mostly predictable corresponds to temporal or more generally spatiotemporal coherence. This is probably the most important ingredient of a general-purpose learning procedure." Again, reasonable enough.
3) Understanding - basically equivalent to predicting?
4) Sensorimotor - not clear? Similar to human eye movement?
5) Spatiotemporal Invariance - "one needs to inject additional context" - having constant concepts of things?
6) Context update/pattern completion - "The last functional component postulated by this paper is a continuous (in theory) loop between bottom-up predictions and top-down context." Constant cycling between prediction and world-state update; pretty clear.
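On (1) and (2), the standard way to make "prediction is compression" concrete is codelengths: a predictor that assigns probability p to the bit that actually occurs pays -log2 p bits. A minimal sketch with a Laplace-smoothed bit predictor (the 90/10 string is just an invented example):

```python
from math import log2

def code_length_bits(bits):
    """Codelength of a bit sequence under an adaptive Laplace-smoothed
    frequency predictor: each bit costs -log2 P(bit | counts so far).
    Better prediction == shorter code, which is the compression view
    of learning in points (1) and (2)."""
    counts = [1, 1]  # Laplace smoothing: start each symbol at count 1
    total = 0.0
    for b in bits:
        total += -log2(counts[b] / (counts[0] + counts[1]))
        counts[b] += 1
    return total

biased = [1] * 90 + [0] * 10
print(code_length_bits(biased))  # ~51 bits, vs 100 for a uniform coder
```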
> Do people generally read these things before upvoting? Legitimately curious.
I don't. And that's because HN doesn't have separate "upvote" and "save" features... upvoting is saving (or, saving is upvoting, however you want to look at it). So I save (upvote) anything that meets the bar of "the title is interesting enough for me to think I might want to read this eventually".
A save feature would be nice perhaps. Although maybe the philosophy of HN is precisely "upvote if something seems interesting enough to read" rather than (or in addition to) "upvote if you have read this and think it's good".
Agree. While this research merely connects the right concepts together in writing, I believe this guy is actually implementing many of those in the right way. A form of hierarchical Q-learning with SDRs. It's also less limited than Numenta's focus on anomaly detection and comes with a GH repository.
https://www.youtube.com/watch?v=ePLbFFL52-E
http://twistedkeyboardsoftware.com/?p=137
Thanks for the summary. Net: I was afraid of something like this. I.e., IMHO, it won't even start to work.
Here's my first-cut architecture: The core of the intelligence is some concepts and relationships between them. The relationships are based on historical input data from experience since birth.
The concepts are close to nouns in language. The relationships are close but not as close to verbs in language. The relationships are closely related to what we regard as causes. The relationships can be moderated or qualified by, right, adverbs. And, sure, adjectives can modify the nouns.
Then, start like a baby does, with just nouns and then some simple verbs with sentences of just two words.
As for, say, kitty cats: early on the input is not in words or language at all -- actual language as a way to get input data fairly directly to concepts with nouns comes later.
For rational and deductive reasoning, that is just a special case where the causal relationships are taken much more seriously than usual.
For vision, to do at all well, one needs to do at least fairly well with concepts -- that is, vision needs to identify objects for which the intelligence already has, or at least is in the process of learning, concepts -- say, recognizing a 3D object from just its 2D image.
I'm busy with my startup now, but later maybe I'll try to code up what I just described.
Here I'd be using electronic logic; how biological logic does such things, e.g., searches the collection of internal 3D visualizations of concepts that are 3D objects, I don't have a clue.
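For what it's worth, the concept/relationship store described above could start as something as simple as a labelled graph; every name below is invented purely for illustration:

```python
# Toy sketch of the concept/relationship store described above.
# Concepts ~ nouns, relationships ~ verbs, qualifiers ~ adverbs;
# all names here are made up for illustration.
relations = {}  # (subject_concept, verb) -> list of (object_concept, qualifiers)

def learn(subject, verb, obj, qualifiers=()):
    """Record one experienced relationship between two concepts."""
    relations.setdefault((subject, verb), []).append((obj, tuple(qualifiers)))

def recall(subject, verb):
    """Retrieve everything historically associated with (subject, verb)."""
    return relations.get((subject, verb), [])

learn("kitty", "chases", "mouse", ("quickly",))
learn("kitty", "chases", "string")
print(recall("kitty", "chases"))  # [('mouse', ('quickly',)), ('string', ())]
```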
Favorite quote: "An intelligent algorithm (strong AI [66], among other names) should be able to reveal hidden knowledge which might not even be discoverable to humans."
I think the main idea can be summarized pretty simply. The most important next step towards general intelligence is creating a learning algorithm that can solve a sufficiently general class of problems without much tweaking by humans, and it couldn't hurt to list out the properties such an algorithm would have to have.
Most of these ideas have been pioneered and implemented by Jeff Hawkins and his team at Numenta. See his book "On Intelligence" or the open source project at numenta.org.
how very unsurprising. my advice to the author and the rest of the field would be to read a bit more on learning and memory in humans. AI starts with I.
I'm too lazy to read this, but not too lazy to throw in my two cents.
Non-human mammals are amazing considering how many of them are incredibly capable very early. I'm thinking mostly of large prey, for whom the ability to walk and run is crucial. Basically they need to be able to do many things very quickly. It's incredible to see how fast newborns grow in so many species.
Also, I've searched for the word "play" in this article and found no occurrence. To me how young mammals play and more importantly what drives them to do so is the core mystery behind the development of the mammalian brain. I suspect that once this is cracked, a big part of the work will be done.