Least squares and PCA minimize different loss functions. One is the
sum of squared vertical (y) distances to the line; the other is the sum of squared perpendicular (closest) distances to the line. That is where the differences come from.
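A minimal sketch of the two losses for a candidate line y = a + b*x, assuming NumPy. The helper names `vertical_loss` and `perpendicular_loss` are hypothetical and chosen for illustration. The perpendicular distance from a point to that line is the vertical residual divided by sqrt(1 + b^2), so the two losses weight the same residuals differently depending on the slope.

```python
import numpy as np

def vertical_loss(x, y, a, b):
    """Sum of squared vertical (y-direction) distances to y = a + b*x (the OLS loss)."""
    r = y - (a + b * x)
    return float(np.sum(r ** 2))

def perpendicular_loss(x, y, a, b):
    """Sum of squared perpendicular distances to y = a + b*x (the TLS/PCA loss)."""
    # Distance from (x, y) to the line b*x - y + a = 0 is |y - a - b*x| / sqrt(1 + b^2)
    r = y - (a + b * x)
    return float(np.sum(r ** 2) / (1.0 + b ** 2))

# Small made-up data set, scored against the line y = x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 2.0, 1.0, 3.0])
print(vertical_loss(x, y, 0.0, 1.0), perpendicular_loss(x, y, 0.0, 1.0))  # 2.0 1.0
```

For the same vertical residuals, the perpendicular loss shrinks as the slope grows, so the two objectives are generally minimized by different lines.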
The Pythagorean distance assumes that part of the distance (difference) lies along the x axis and part along the y axis, and that the total distance is orthogonal to the fitted line.
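To make the split into x and y components concrete, here is a small sketch (assuming NumPy; `perpendicular_residual` is a hypothetical helper) that projects a point orthogonally onto the line y = a + b*x and returns the residual vector (dx, dy). Both components are nonzero in general, and dx^2 + dy^2 is the squared perpendicular distance.

```python
import numpy as np

def perpendicular_residual(px, py, a, b):
    """Vector from (px, py) to its closest point on the line y = a + b*x."""
    # Unit direction along the line, and one point on it (its y-intercept)
    d = np.array([1.0, b]) / np.hypot(1.0, b)
    p0 = np.array([0.0, a])
    p = np.array([px, py])
    # Orthogonal projection of p onto the line
    foot = p0 + np.dot(p - p0, d) * d
    return foot - p  # (dx, dy): part of the residual on each axis

dx, dy = perpendicular_residual(1.0, 2.0, 0.0, 1.0)
print(dx, dy)
```

For the point (1, 2) and the line y = x this gives roughly (0.5, -0.5), of length about 0.707, whereas the purely vertical residual to the same line is (0, -1).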
OLS assumes that x is given and that the distance is entirely due to the variance in y (so it is parallel to the y axis). It's not the line that's skewed, it's the space.
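One way to see that the asymmetry lives in the space and not in the data: regress y on x, then x on y, and compare with the principal axis. This sketch assumes NumPy and synthetic data with noise on both coordinates; `ols_slope` and `tls_slope` are hypothetical helper names.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of y regressed on x: all error is attributed to y."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.dot(xc, yc) / np.dot(xc, xc))

def tls_slope(x, y):
    """Slope of the first principal axis of the centered (x, y) cloud."""
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # Rows of vt are right singular vectors; vt[0] is the direction of max variance.
    # Its sign is arbitrary, but the ratio vy / vx is not.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    vx, vy = vt[0]
    return float(vy / vx)

# Synthetic data: a latent t, observed with noise in both x and y
rng = np.random.default_rng(0)
t = rng.normal(size=200)
x = t + 0.5 * rng.normal(size=200)
y = 2.0 * t + 0.5 * rng.normal(size=200)

print("OLS y on x:               ", ols_slope(x, y))
print("OLS x on y, inverted:     ", 1.0 / ols_slope(y, x))
print("TLS (PCA):                ", tls_slope(x, y))
print("TLS axes swapped, inverted:", 1.0 / tls_slope(y, x))
```

The two OLS fits disagree (their slopes multiply to r^2, not 1), while the TLS slope simply inverts when x and y are swapped. For positively correlated data the TLS slope lands between the two OLS slopes.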
They both fit Gaussians, just different ones! OLS fits a 1D Gaussian to the set of errors in the y coordinates only, whereas TLS (total least squares, i.e. PCA) fits a 2D Gaussian to the set of all (x, y) pairs.
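A sketch of both fits, assuming NumPy and synthetic data. The 1D Gaussian negative log-likelihood of the y-residuals, with the variance set to its maximum-likelihood value, is n/2 * (log(2*pi*s2) + 1), which grows with the sum of squared residuals, so the OLS line is its minimizer. The 2D Gaussian fit is just the mean and (1/n) covariance of the points; its major axis through the mean is the TLS line. `residual_nll` and `fit_2d_gaussian` are hypothetical names.

```python
import numpy as np

def residual_nll(x, y, a, b):
    """Negative log-likelihood of y - (a + b*x) under a 1D Gaussian N(0, s2),
    with s2 set to its maximum-likelihood value for these residuals."""
    r = y - (a + b * x)
    s2 = np.mean(r ** 2)
    n = len(r)
    return 0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

def fit_2d_gaussian(x, y):
    """Maximum-likelihood mean and covariance of the (x, y) pairs."""
    pts = np.column_stack([x, y])
    mu = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False, bias=True)  # bias=True divides by n (the MLE)
    return mu, cov

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + 0.3 + rng.normal(scale=0.4, size=100)

# OLS: the (a, b) that maximizes the 1D Gaussian likelihood of the y-errors
b_ols, a_ols = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]

# 2D Gaussian: the major axis (top eigenvector of cov) through mu is the TLS line
mu, cov = fit_2d_gaussian(x, y)
evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
vx, vy = evecs[:, -1]
b_tls = vy / vx
a_tls = mu[1] - b_tls * mu[0]

print("OLS:", a_ols, b_ols, "NLL:", residual_nll(x, y, a_ols, b_ols))
print("TLS:", a_tls, b_tls, "NLL:", residual_nll(x, y, a_tls, b_tls))
```

Scored by the 1D residual likelihood, the OLS line always wins; the TLS line is instead the best description of the 2D point cloud.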
Yes, and if I remember correctly, you get the Gaussian because it's the maximum entropy continuous distribution (the one making the fewest additional assumptions about the shape) for a given variance.
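A quick numerical spot check (not a proof, and only against two other families): with the variance held fixed, the Gaussian has higher differential entropy than a Laplace or a uniform distribution. The closed-form entropies below are standard; the function names are hypothetical.

```python
import math

def gaussian_entropy(var):
    # Differential entropy of N(mu, var), in nats
    return 0.5 * math.log(2 * math.pi * math.e * var)

def laplace_entropy(var):
    # Laplace with scale b has variance 2*b**2 and entropy 1 + ln(2b)
    b = math.sqrt(var / 2)
    return 1 + math.log(2 * b)

def uniform_entropy(var):
    # Uniform on an interval of width w has variance w**2/12 and entropy ln(w)
    w = math.sqrt(12 * var)
    return math.log(w)

for var in (0.5, 1.0, 4.0):
    print(var, gaussian_entropy(var), laplace_entropy(var), uniform_entropy(var))
```

The gaps (about 0.072 nats over the Laplace and 0.18 over the uniform) do not depend on the variance, because all three entropies shift by the same 0.5*ln(var).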
Both of these do, in a way. They just differ in which Gaussian distribution they're fitting to.
And how, I suppose. PCA is effectively moment matching; least squares is maximum likelihood. These correspond to the two ways of minimizing the Kullback-Leibler divergence to or from a Gaussian distribution.
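The "to or from" matters because KL divergence is not symmetric. A minimal sketch using the closed form for two 1D Gaussians (`kl_gauss` is a hypothetical name):

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) in nats."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

p = (0.0, 1.0)  # mean 0, variance 1
q = (1.0, 4.0)  # mean 1, variance 4
print(kl_gauss(*p, *q), kl_gauss(*q, *p))  # roughly 0.443 vs 1.307
```

Swapping the arguments changes the value, so "minimize KL from the data to the Gaussian" and "minimize KL from the Gaussian to the data" are genuinely different objectives.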