The author is basically using a linear algebra tool to build an orthogonal basis for a matrix of stock prices. (PCA is closely related to eigenvector decomposition, but via the SVD it works on rectangular matrices too. In fact, unlike many matrix operations, it's very fast on unbalanced rectangular matrices!) Since the resulting components are, by construction, uncorrelated, they can be very useful in building CAPM-balanced stock portfolios.
Using PCA is great in this situation, but people often run into traps when applying these sorts of spectral-decomposition methods to real-world data.
The most obvious trap is trying to interpret what the vectors "represent". Sometimes this is reasonable -- if you did a similar experiment on the stock prices of energy companies, the strongest vector probably really would be closely correlated with the price of oil. But aside from unusual situations like that, interpreting the "meaning" of spectral vectors is a fool's errand.
In a large problem, it's often true that you can figure out what the first handful (say, 3 to 6) of PCA components mean.
The first is usually the mean of the quantities. In practice, it is typical to compute PCA by taking the SVD of the data matrix itself; if you subtract the mean first, then of course it will not show up as the first component. In Matlab this is literally a one-liner on the original data -- you don't even form a covariance matrix.
No, the people who do this don't care if you know what the Karhunen-Loève decomposition is; they just use the one-liner:
[U,S,V] = svd(X)
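To make that concrete, here's a minimal sketch with made-up numbers, where X stands in for the post's T-by-6 matrix of prices (one column per stock):

    % made-up data: 250 "trading days", 6 fake price series
    X = 100 + cumsum(randn(250, 6));
    [U, S, V] = svd(X, 'econ');      % raw data: the first component mostly tracks the mean level
    Xc = X - mean(X, 1);             % subtract the column means...
    [Uc, Sc, Vc] = svd(Xc, 'econ');  % ...and this is classic PCA; columns of Vc are the principal directions
    scores = Uc * Sc;                % principal-component time series ("scores")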
Anyway, after the mean you get the varying components. It's smart to plot these somehow if you want to interpret their meaning.
The post should have plotted the time history of the 6 stocks together with the time history of each PC; then some pattern might have suggested itself. The first PC could be as simple as "GOOG, AMZN, AAPL, AKAM going up, MSFT steady, and FB going down", given the stocks mentioned and their weighting.
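Continuing the sketch above (Xc and scores as defined there; the tickers are just the ones the post mentions, used as labels), that plot is only a few lines:

    tickers = {'GOOG', 'AMZN', 'AAPL', 'AKAM', 'MSFT', 'FB'};
    subplot(2, 1, 1); plot(Xc); legend(tickers); title('Centered price histories');
    subplot(2, 1, 2); plot(scores(:, 1:3)); legend('PC1', 'PC2', 'PC3'); title('First three PC score series');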
The classic example (mentioned elsewhere in the thread) is eigenfaces (see Wikipedia), where PCA is applied to face images and various features like eyes, foreheads, and mouths get emphasized, plus "second-order" features like the edges around the eyes, noses, and mouths. If you try it yourself, what you find is that adding a bit of one of these second-order features to a face (literally adding, as in something like:
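    face + alpha * eigenface_k    % (schematic; eigenface_k is one of the "second-order" components)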
where alpha is a small scalar) will shift the nose left or right, or make the mouth bigger.
People have done the same thing with natural images, and out pop things like 2d wavelets (the Gabor filters, http://en.wikipedia.org/wiki/Gabor_filter). It's somewhat magical, because you went in with no information, and out pops this structure, which also characterizes (surprise!) the human visual cortex.
Other classic examples are in atmosphere/weather analysis, where ENSO ("El Niño") will pop out of an analysis of temperature and pressure fields over the Pacific Ocean.
FYI, Gabor-like filters pop out from doing ICA (i.e. Independent Component Analysis), not PCA. While PCA looks for orthogonal vectors onto which the data's projection has maximal variance (among other properties), ICA, roughly speaking, looks for a set of vectors onto which the data's projection is maximally non-Gaussian -- e.g., has maximal kurtosis (among other properties).
It is the kurtosis-maximization of ICA that tends to produce filters mimicking those found in (early layers of) visual cortex. Hence, the production of such filters by techniques like "sparse coding" and "sparse autoencoders", which explicitly pursue highly-kurtotic representations of the training data. PCA, on the other hand, tends to produce checkerboard (i.e. 2d sinusoidal) filters of various frequencies when trained on "natural image patches".
See: "The 'independent components' of natural scenes are edge filters" by Bell and Sejnowski, 1997.
They used a (linear) "neural network" with gradient-descent training that implemented PCA (kind of an iterative Gram-Schmidt process), and got Gabor-like filters. I think a lot of people have done similar experiments, with varying results.
I hadn't seen that paper before; thanks for the reference. I read through it and saw that they were reweighting the sampled image patches with a Gaussian mask prior to learning, which explains how they got Gabor-like filters. The masking effectively forced the learned filters to have localized receptive fields, whereas locality vs. non-locality is generally one of the (visually) clearest differences between filters learned with ICA and those learned with PCA.
In other words, the Gaussian-modulated part of Gaussian-modulated sinusoids was built into their learning process, rather than appearing as an emergent property. I also chuckled a bit when they described how computing eigenvectors for 4096x4096 matrices was "beyond reasonable computation".
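For what it's worth, the masking they describe amounts to an element-wise reweighting of each sampled patch by a centered Gaussian, something like this (patch size and width are made up):

    [gx, gy] = meshgrid(-7.5:7.5, -7.5:7.5);    % pixel coordinates for a 16x16 patch
    mask = exp(-(gx.^2 + gy.^2) / (2 * 4^2));   % centered Gaussian window, sigma = 4 pixels
    patch = randn(16, 16);                      % stand-in for a sampled image patch
    masked_patch = patch .* mask;               % pixels far from the center are down-weighted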
PCA is a very useful tool in lots of places. But be warned that when you use it on stocks, you'll find correlations, make your investment, and then discover that during a financial crisis all sorts of things that were not previously correlated now are. Thus your analysis falls apart at exactly the moment you would least want it to.
Incidentally, if you take answers to a wide variety of questions that are meant to test intelligence, your score on the first component of a PCA should be fairly well correlated with IQ or your SAT score. The second component should be reasonably well correlated with the difference between your math and verbal scores on the SAT. And people have much less variability on the third component than on the first two.
In financial practice, asset-level PCA isn't as common, especially in systems where covariance estimation is fraught with misspecification errors. Instead, individual securities are first condensed into factors (e.g., for equities, some examples are book/price, momentum, large vs. small cap, etc.).
Note how this dataset is two-dimensional in nature, and PCA yields two vectors. The first gives the direction of greatest variation, and the second gives the direction of greatest variation orthogonal to the first.
FYI, eigenfaces was a ground-breaking technique when it was introduced... almost 25 years ago. It's no longer used in any serious way in practical face recognition applications.
PCA goes far deeper than meets the eye. For instance, it's a well-known phenomenon that too much dimensionality can actually drive a predictor's performance down to chance level, but PCA can mitigate that. It's basically the bread and butter of practical unsupervised learning.
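Concretely, the usual mitigation is to project onto the top few principal directions before fitting the predictor. A sketch with made-up numbers (the choice of k here is arbitrary; in practice you'd cross-validate it):

    F = randn(500, 1000);          % made-up feature matrix: 500 samples, 1000 mostly-noise features
    Fc = F - mean(F, 1);           % center
    [~, ~, V] = svd(Fc, 'econ');   % principal directions in the columns of V
    k = 20;
    F_reduced = Fc * V(:, 1:k);    % 500-by-20 input for the downstream predictor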
"bread and butter of practical unsupervised learning" -- true, although I might have said "exploratory data analysis".
If you can make a vector out of it somehow, it can't hurt to try PCA. Because you don't have to figure out some fancy tailored model, or really (cough, cough) understand much about the data at all. (It sounds like I'm being sarcastic, but I'm serious -- sometimes all you want is a quick look.)
I find that, more often than you'd expect, PCA (or maybe MDS) gets you a majority of the performance of any kind of unsupervised method. If you're really interested in exploring the data and methodologies, then PCA is a poor stopping point... but if you just want something that works, it's surprising how often PCA's tradeoffs are good tradeoffs.
All the obvious caveats apply to that whole line of thought, though.
This expository post lined up the 6 stocks and computed the SVD of the time history of all 6 together. This shows how the 6 stocks correlate.
You can do it another way. Run a sliding window across a single stock, line up all the resulting vectors, and then take the SVD of (err... apply PCA to) that. That is, if you started with a single-stock time history:
x1, x2, x3...
then form:
z1 = [x1 x2 x3]
z2 = [x2 x3 x4]
z3 = [x3 x4 x5]
etc., and use PCA on the z's instead of the x's. (In practice, you'd make the z's much longer.)
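A minimal sketch of that construction, with a made-up single-stock series and window length:

    x = 100 + cumsum(randn(1, 500));   % made-up single-stock price history
    w = 30;                            % window length -- "much longer" than the 3 shown above
    n = length(x) - w + 1;
    Z = zeros(n, w);
    for i = 1:n
        Z(i, :) = x(i : i + w - 1);    % z_i = [x_i x_{i+1} ... x_{i+w-1}]
    end
    Zc = Z - mean(Z, 1);               % center, then PCA via the SVD as before
    [U, S, V] = svd(Zc, 'econ');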