It's not necessary for the distributions to be normal for this approach to be relevant. A lot of the (very powerful) Gaussian machinery, e.g. for graphical models, can still be rigorously applied to many non-normal distributions by using the Nonparanormal transform:
http://jmlr.csail.mit.edu/papers/volume10/liu09a/liu09a.pdf
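Roughly, the idea is to Gaussianize each marginal through its empirical CDF before applying Gaussian methods. A minimal 1-D sketch (simplified — the paper uses a Winsorized/truncated estimator for consistency; function name is mine):

```python
import numpy as np
from statistics import NormalDist

def nonparanormal_transform(x):
    """Rank-based Gaussianization of a 1-D sample: empirical CDF
    followed by the standard-normal quantile function."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n
    u = ranks / (n + 1)                        # empirical CDF, kept in (0, 1)
    nd = NormalDist()
    return np.array([nd.inv_cdf(p) for p in u])

rng = np.random.default_rng(0)
skewed = rng.exponential(size=10_000)          # clearly non-normal marginal
z = nonparanormal_transform(skewed)
# z is now approximately standard normal, while preserving the ranks of x
```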
This is especially true if you have multimodal distributions (such as the income datasets in the article). Although it's true that there are simple algebraic properties that allow you to calculate the mean and variance of the total population given the same for sub-populations, it often isn't the case that this will be a good fit to the data. That being said, this is a useful property for parallelizing gaussian fits.
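The algebraic merge is just a couple of lines; a quick sketch (function name is mine, population variances with ddof=0):

```python
import numpy as np

def merge_stats(n1, mean1, var1, n2, mean2, var2):
    """Combine count/mean/variance of two sub-populations into
    count/mean/variance of their union."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # total variance = weighted within-group variance + between-group spread
    var = (n1 * (var1 + (mean1 - mean) ** 2)
           + n2 * (var2 + (mean2 - mean) ** 2)) / n
    return n, mean, var

x = np.random.default_rng(1).normal(5.0, 2.0, size=1_000)
a, b = x[:400], x[400:]
n, m, v = merge_stats(len(a), a.mean(), a.var(), len(b), b.mean(), b.var())
# m and v match x.mean() and x.var() up to floating point,
# which is what lets you compute the fit on shards in parallel
```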
> That being said, this is a useful property for parallelizing gaussian fits.
Might you be able to clarify this sentence?
I have zero idea what might be meant:
> this is a useful property
What is the antecedent of "this", that is,
in this phrase, what does "this" refer
to?
> gaussian fits
What is a gaussian fit? I have no idea.
I'm comfortable with the Lindeberg-Feller
version of the central limit theorem,
the weak and strong laws of large numbers,
martingale theory, the martingale proof
of the strong law of large numbers,
the Radon-Nikodym theorem,
and the fact that sample mean and
variance are sufficient statistics
for the Gaussian distribution,
but, still, I can't even guess what
a gaussian fit is.
> parallelizing
I can guess that what is meant by
"parallelizing" is the computer
software approach of having
one program try to get some work
done faster by
starting several
threads or tasks in
one or several processor
cores, processors, or computers.
Okay. But what is it about
"gaussian fits" that might
commonly call for "parallelizing"?
Fitting a distribution to data is pretty common parlance in my experience, and there is even a wikipedia article with a relevant name [0].
I presume that the parallelisation point was with reference to the point made by the article, that the calculation of means and variances can be parallelised, so large datasets can be dealt with efficiently.
Okay, from the Wikipedia article,
distribution fitting
appears to be what I feared it might
be.
I'd never do anything like that
and would advise others not to,
either.
Why? Because it is not the least
bit clear just what the heck
you get.
Next, likely you should not fit
at all. Instead, if you want to
use some distribution with parameters,
e.g., Gaussian, uniform, exponential,
then just estimate the parameters
and not the distribution.
E.g., if you know that the data
is independent, identically distributed
Gaussian, then take the sample
mean and sample variance
and let those be the
two parameters in the Gaussian
distribution.
In that case, you will know that the
expectation and variance of the
distribution are the same
as in your data, and that's
good.
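In code, that's all there is to it (a minimal sketch with numpy; the true parameters here are my own made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(42)
# i.i.d. Gaussian data with (unknown, in practice) mean 10 and variance 9
data = rng.normal(loc=10.0, scale=3.0, size=100_000)

mu_hat = data.mean()             # sample mean -> location parameter
sigma2_hat = data.var(ddof=1)    # sample variance -> squared scale parameter

# The "fitted" distribution is simply N(mu_hat, sigma2_hat); by construction
# its expectation and variance match the data's first two sample moments.
```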
That sample mean and variance
are sufficient statistics
for the Gaussian is also a biggie.
And look into the situation for the
rest of the exponential family.
See also the famous
Paul R. Halmos,
"The Theory of Unbiased Estimation",
'Annals of Mathematical Statistics',
Volume 17, Number 1, pages 34-43, 1946.
If you want to find the variance of a large
data set, then how much accuracy do you
want? Generally, sample variance
from a few hundred numbers will be okay,
and then you don't need to consider
execution on parallel computer hardware.
R. Hamming once wrote, "The purpose
of computing is insight, not numbers."
Along that line, finding sample mean
and variance of a huge data set
promises little or no more
"insight" than just sample
mean and variance of an appropriately
large sample. Of course, we are
assuming that the data is
independent and identically distributed
so that a good sample is easy to find.
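E.g., a quick check of the "few hundred numbers" point (a sketch; the population and sample sizes are my own made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(7)
population = rng.normal(0.0, 1.0, size=1_000_000)   # "huge" i.i.d. data set

# a few hundred i.i.d. draws already estimate the variance well
sample = rng.choice(population, size=500, replace=False)
# sample.var(ddof=1) lands close to population.var(), with standard error
# roughly sqrt(2/n) * var, i.e. about 6% here -- no parallel hardware needed
```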
I don't know what you inferred from the wiki article, but of course "to fit a gaussian" is to find the parameters describing it, in this case mean and variance.
Look at the "Techniques" section; the first three are:
"Parametric methods, by which the parameters of the distribution are calculated from the data series.[2] The parametric methods are:
method of moments
method of L-moments[3]
Maximum likelihood method[4]"
which, if you work them out, give exactly what you'd expect.
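For the Gaussian specifically, working out the maximum likelihood method really does land on the sample mean and (biased) sample variance; a quick numerical check (sketch, my own numbers):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(2.0, 1.5, size=5_000)

def loglik(mu, var):
    """Gaussian log-likelihood of the sample x at parameters (mu, var)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

mu_mle, var_mle = x.mean(), x.var(ddof=0)   # closed-form Gaussian MLE

# perturbing either parameter away from the sample moments lowers the likelihood
for dmu, dvar in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert loglik(mu_mle, var_mle) >= loglik(mu_mle + dmu, var_mle + dvar)
```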
Amazing!
Uh, might want to reconsider and check that.