
>I don't know what you mean by hierarchy, latent parameters, or compactness.

The first two are terms from statistics. The last refers to topological compactness.

Come on, these are vital things to know if trying to build statistical models.

>But there may not be a better way, it's quite a difficult problem.

It's probably not quite so difficult if one actually knows statistics and analysis. Or so I'm guessing.



Yes, I'm aware. I just don't see how they are relevant to AIXI, an AI with infinite computing power that can model anything with a universal Turing machine.

Statistics jargon will sadly not make AIXI work. Its problem is very deep.


>Statistics jargon will sadly not make AIXI work. Its problem is very deep.

It's not that deep. It's, at worst, dealing with the Curse of Dimensionality because it has no notion of randomness: when the inputs it receives are noisy, the prior probability of Turing machines with the noise bits pre-encoded will drop exponentially with the length of the noise. That's why compactness and other assumptions about the real line tend to help in real-world statistics: they make it easy to notice noise and operate with imperfect precision.
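A toy sketch of that exponential penalty (all numbers here are made up, chosen only for illustration): holding M structure bits fixed, the prior 2^-(M+N) of a machine that pre-encodes its noise collapses as the noise length N grows with the sample count.

```python
# Toy illustration with hypothetical numbers: the prior 2^-(M+N) of a Turing
# machine that pre-encodes its noise bits, as the noise length N grows.
M = 100                  # bits of causal structure (assumed constant)
noise_per_sample = 4     # bits of noise per observation (assumed)

for samples in (0, 10, 100, 1000):
    N = noise_per_sample * samples   # noise bits grow linearly with samples
    print(f"{samples:5d} samples -> prior = 2^-{M + N}")
```

The exponent, not the base, is what matters here: each additional noisy sample shaves a fixed number of bits off the hypothesis's prior.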

Philosophy jargon about "self-awareness" isn't necessarily going to yield more insights than it always has (i.e., the Chinese Room: very little).


Any realistic approximation of AIXI would use probabilistic Turing machines that either output probabilities instead of exact predictions, or perhaps use probabilities internally as well.

However, for true AIXI with infinite computing power, that doesn't really matter. Randomness can be represented as stored random-seed data, and isn't treated any differently than any other unknown variable.
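A minimal sketch of the seed-as-data view (using Python's random module purely as a stand-in for a deterministic machine): once the seed is fixed, the "random" output is fully determined, so the seed is just more program bits for the inductor to infer.

```python
import random

# The "randomness = stored seed" view: a deterministic machine plus a
# recorded seed reproduces the "random" sequence exactly, so the seed is
# just another unknown variable to be inferred.
def deterministic_machine(seed, n):
    rng = random.Random(seed)              # fully determined by the seed
    return [rng.randint(0, 1) for _ in range(n)]

observed = deterministic_machine(seed=42, n=16)
replayed = deterministic_machine(seed=42, n=16)
assert observed == replayed                # same seed -> identical output
```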

I never used philosophy jargon or even the word "self-awareness". I stated my issues with AIXI in plain English and explained them. It has literally nothing to do with the Chinese Room.


>Any realistic approximation of AIXI would use probabilistic Turing machines.

That would be a fresh model of AI, rather than an AIXI approximation. You should probably look into that idea.

>However, for true AIXI with infinite computing power, that doesn't really matter. Randomness can be represented as stored random-seed data, and isn't treated any differently than any other unknown variable.

Tsk tsk. It's a curse-of-dimensionality issue. If we have N bits of optimally-compressed random seed plus M bits of structure (and yeah, AIT has ways to separate a string X into its structure and random bits iff you have the normal Halting Oracle), then the prior probability of that particular machine is 2^{-(N+M)}. The noisier the input dataset, the larger N grows. In normal learning, we want M to be constant (which we can usually assume it is: the universe mostly doesn't acquire new causal structure while we're looking at it), which then allows the posterior probability of good hypotheses to rise logarithmically with sample size. If each sample contains noise, then we actually have to split things up:

2^-M for the causal structure, where M is constant, so the posterior gains information logarithmically. 2^-N for the random seed, where the ground-truth random seed actually grows linearly in length with each sample we observe (because of the bits of entropy Nature used up to make that sample); the prior probability of each random seed therefore drops exponentially as the number of samples we anticipate seeing grows.

So while this is all very informal, I'd have to say that a noisy Solomonoff induction actually suffers from a Curse of Dimensionality because it assumes everything is discrete, while more typical machine-learning models based on continuous distributions can learn well in the face of noise.
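To illustrate the contrast (a toy sketch, not a claim about any particular ML system): a continuous Gaussian model recovers the underlying signal from noisy samples without having to encode each noise bit exactly.

```python
import random
import statistics

# Toy sketch: a continuous model learns a signal from noisy samples
# without pre-encoding the noise, unlike a bit-exact discrete hypothesis.
random.seed(0)
true_mu = 1.0
samples = [true_mu + random.gauss(0, 0.5) for _ in range(1000)]

estimate = statistics.fmean(samples)   # maximum-likelihood estimate of mu
print(round(estimate, 2))              # lands near 1.0 despite the noise
```

The estimate's error shrinks like 1/sqrt(n), so more noisy samples help rather than hurt; the noise is absorbed by the likelihood instead of inflating the hypothesis's description length.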





