The author is basically using a linear algebra tool to build an orthogonal basis for a matrix of stock prices. (PCA is closely related to eigenvector decomposition, but via the SVD it works on rectangular matrices too. In fact, unlike many matrix operations, it's very fast on unbalanced rectangular matrices!) Since the resulting components are, by construction, uncorrelated, they can be very useful in building CAPM-balanced stock portfolios.
Using PCA is great in this situation, but people often run into traps when applying these sorts of spectral-decomposition methods to real-world data.
The most obvious trap is trying to interpret what the vectors "represent". Sometimes this is reasonable -- if you did a similar experiment on the stock prices of energy companies, the strongest vector probably really would be closely correlated with the price of oil. But aside from unusual situations like that, interpreting the "meaning" of spectral vectors is a fool's errand.
In a large problem, it's often true that you can figure out what the first handful (say, 3 to 6) of PCA components mean.
The first is usually the mean of the quantities. In practice, it is typical to compute PCA by taking the SVD of the data matrix itself; if you subtract the mean first, then of course it will not show up as the first component. In Matlab this is literally a one-liner on the original data -- you don't even form a covariance matrix.
No, the people who do this don't care if you know what the Karhunen-Loève decomposition is; they just use the one-liner:
[U,S,V] = svd(X)
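To make that concrete, here's a minimal sketch with made-up numbers, where X stands in for the post's T-by-6 matrix of prices (one column per stock):

    % made-up data: 250 "trading days", 6 fake price series
    X = 100 + cumsum(randn(250, 6));
    [U, S, V] = svd(X, 'econ');      % raw data: the first component mostly tracks the mean level
    Xc = X - mean(X, 1);             % subtract the column means...
    [Uc, Sc, Vc] = svd(Xc, 'econ');  % ...and this is classic PCA; columns of Vc are the principal directions
    scores = Uc * Sc;                % principal-component time series ("scores")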
Anyway, after the mean you get the varying components. It's smart to plot these somehow if you want to interpret their meaning.
The post should have plotted the time history of the 6 stocks together with the time history of each PC; then some pattern might have suggested itself. The first PC could be as simple as "GOOG, AMZN, AAPL, AKAM going up, MSFT steady, and FB going down", given the stocks mentioned and their weighting.
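Continuing the sketch above (Xc and scores as defined there; the tickers are just the ones the post mentions, used as labels), that plot is only a few lines:

    tickers = {'GOOG', 'AMZN', 'AAPL', 'AKAM', 'MSFT', 'FB'};
    subplot(2, 1, 1); plot(Xc); legend(tickers); title('Centered price histories');
    subplot(2, 1, 2); plot(scores(:, 1:3)); legend('PC1', 'PC2', 'PC3'); title('First three PC score series');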
The classic example (mentioned elsewhere in the thread) is eigenfaces (see Wikipedia), where PCA is applied to face images and various features like eyes, foreheads, and mouths get emphasized, plus "second-order" features like the edges around the eyes, noses, and mouths. If you try it yourself, what you find is that adding a bit of one of these second-order features to a face (literally adding, as in something like:
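    face + alpha * eigenface_k    % (schematic; eigenface_k is one of the "second-order" components)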
where alpha is a small scalar) will shift the nose left or right, or make the mouth bigger.
People have done the same thing with natural images, and out pop things like 2d wavelets (the Gabor filters, http://en.wikipedia.org/wiki/Gabor_filter). It's somewhat magical, because you went in with no information, and out pops this structure, which also characterizes (surprise!) the human visual cortex.
Other classic examples are in atmosphere/weather analysis, where ENSO ("El Niño") will pop out of an analysis of temperature and pressure fields over the Pacific Ocean.
FYI, Gabor-like filters pop out from doing ICA (i.e. Independent Component Analysis), not PCA. While PCA looks for orthogonal vectors onto which the data's projection has maximal variance (among other properties), ICA, roughly speaking, looks for a set of vectors onto which the data's projection is maximally non-Gaussian -- e.g., has maximal kurtosis (among other properties).
It is the kurtosis-maximization of ICA that tends to produce filters mimicking those found in (early layers of) visual cortex. Hence, the production of such filters by techniques like "sparse coding" and "sparse autoencoders", which explicitly pursue highly-kurtotic representations of the training data. PCA, on the other hand, tends to produce checkerboard (i.e. 2d sinusoidal) filters of various frequencies when trained on "natural image patches".
See: "The 'independent components' of natural scenes are edge filters" by Bell and Sejnowski, 1997.
They used a (linear) "neural network" with gradient-descent training that implemented PCA (kind of an iterative Gram-Schmidt process), and got Gabor-like filters. I think a lot of people have done similar experiments, with varying results.
I hadn't seen that paper before; thanks for the reference. I read through it and saw that they were reweighting the sampled image patches with a Gaussian mask prior to learning, which explains how they got Gabor-like filters. The masking effectively forced the learned filters to have localized receptive fields, whereas locality vs. non-locality is generally one of the (visually) clearest differences between filters learned with ICA and those learned with PCA.
In other words, the Gaussian-modulated part of Gaussian-modulated sinusoids was built into their learning process, rather than appearing as an emergent property. I also chuckled a bit when they described how computing eigenvectors for 4096x4096 matrices was "beyond reasonable computation".
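For what it's worth, the masking they describe amounts to an element-wise reweighting of each sampled patch by a centered Gaussian, something like this (patch size and width are made up):

    [gx, gy] = meshgrid(-7.5:7.5, -7.5:7.5);    % pixel coordinates for a 16x16 patch
    mask = exp(-(gx.^2 + gy.^2) / (2 * 4^2));   % centered Gaussian window, sigma = 4 pixels
    patch = randn(16, 16);                      % stand-in for a sampled image patch
    masked_patch = patch .* mask;               % pixels far from the center are down-weighted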
PCA is a very useful tool in lots of places. But be warned that when you use it on stocks, you'll find correlations, make your investment, and then discover that during a financial crisis all sorts of things that were not previously correlated now are. Thus your analysis falls apart at exactly the moment you would least want it to.
Incidentally, if you take answers to a wide variety of questions that are meant to test intelligence, your score on the first component of a PCA should be fairly well correlated with IQ or your SAT score. The second component should be reasonably well correlated with the difference between your math and verbal scores on the SAT. And people have much less variability on the third component than on the first two.
In financial practice, asset-level PCA isn't as common, especially in systems where covariance estimation is fraught with misspecification errors. Instead, individual securities are first condensed into factors (e.g., for equities, some examples are book/price, momentum, large vs. small cap, etc.).
Note how this dataset is two-dimensional in nature, and PCA yields two vectors. The first gives the direction of greatest variation, and the second gives the direction of greatest variation orthogonal to the first.
FYI, eigenfaces was a ground-breaking technique when it was introduced... almost 25 years ago. It's no longer used in any serious way in practical face recognition applications.
PCA goes far deeper than meets the eye. For instance, it's a well-known phenomenon that too much dimensionality can actually drive a predictor's performance down to chance level, but PCA can mitigate that. It's basically the bread and butter of practical unsupervised learning.
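Concretely, the usual mitigation is to project onto the top few principal directions before fitting the predictor. A sketch with made-up numbers (the choice of k here is arbitrary; in practice you'd cross-validate it):

    F = randn(500, 1000);          % made-up feature matrix: 500 samples, 1000 mostly-noise features
    Fc = F - mean(F, 1);           % center
    [~, ~, V] = svd(Fc, 'econ');   % principal directions in the columns of V
    k = 20;
    F_reduced = Fc * V(:, 1:k);    % 500-by-20 input for the downstream predictor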
"bread and butter of practical unsupervised learning" -- true, although I might have said "exploratory data analysis".
If you can make a vector out of it somehow, it can't hurt to try PCA. Because you don't have to figure out some fancy tailored model, or really (cough, cough) understand much about the data at all. (It sounds like I'm being sarcastic, but I'm serious -- sometimes all you want is a quick look.)
I find that, more often than you'd expect, PCA (or maybe MDS) gets you a majority of the performance of any kind of unsupervised method. If you're really interested in exploring the data and methodologies, then PCA is a poor stopping point... but if you just want something that works, it's surprising how often PCA's tradeoffs are good tradeoffs.
All the obvious caveats apply to that whole line of thought, though.
This expository post lined up the 6 stocks and computed the SVD of the time history of all 6 together. This shows how the 6 stocks correlate.
You can do it another way. Run a sliding window across a single stock, line up all the resulting vectors, and then take the SVD of (err... apply PCA to) that. That is, if you started with a single-stock time history:
x1, x2, x3...
then form:
z1 = [x1 x2 x3]
z2 = [x2 x3 x4]
z3 = [x3 x4 x5]
etc., and use PCA on the z's instead of the x's. (In practice, you'd make the z's much longer.)
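A minimal sketch of that construction, with a made-up single-stock series and window length:

    x = 100 + cumsum(randn(1, 500));   % made-up single-stock price history
    w = 30;                            % window length -- "much longer" than the 3 shown above
    n = length(x) - w + 1;
    Z = zeros(n, w);
    for i = 1:n
        Z(i, :) = x(i : i + w - 1);    % z_i = [x_i x_{i+1} ... x_{i+w-1}]
    end
    Zc = Z - mean(Z, 1);               % center, then PCA via the SVD as before
    [U, S, V] = svd(Zc, 'econ');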