
Not my area of expertise, but the quoted "fact" seems at best incompletely stated: surely for it to hold there must be some constraints on the number of points (likely as a function of the diameter)?


It’s just wrong as stated: there is only one point a full diameter away from each point on a high-dimensional sphere, i.e. (1,0,0,0,0, …) maps to (-1,0,0,0,0, …) and nothing else. Just as (1,0) maps to (-1,0) on a unit circle and (1,0,0) maps to (-1,0,0) on a unit sphere.

On a high-dimensional sphere, two random points should generally be close to the square root of 2 times the radius away from each other.
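That sqrt(2) figure is easy to check numerically. A minimal pure-Python sketch (assuming "random point" means uniform on the unit sphere, sampled by normalizing a Gaussian vector; the dimension and sample count are arbitrary):

```python
import math
import random

def random_sphere_point(n, rng):
    # Normalizing a Gaussian vector gives a uniform point on the unit (n-1)-sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

rng = random.Random(0)
n = 10_000  # dimension; just needs to be large
dists = []
for _ in range(20):
    p = random_sphere_point(n, rng)
    q = random_sphere_point(n, rng)
    dists.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))))

mean = sum(dists) / len(dists)
print(mean)  # close to sqrt(2) ~= 1.414, nowhere near the diameter 2
```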


The fact should say that the expected distance between two random points tends to the diameter as the dimension increases. The intuition is that to be close you need to be close in a large number of coordinates and the law of large numbers (though coordinates aren’t independent) suggests that is unlikely. If you fix one point on a sphere (say (1,0,…,0)) then, for a high dimension, most points will not have any extreme values in coordinates and will look like (~0,~0,…,~0) where ~0 means something close to zero. But if we sum the squares of everything apart from the first we get 1 - (~0)^2 ~= 1, so the distance from our fixed point is (1 - ~0)^2 + sum_2^n (0 - ~0)^2 ~= 1 + 1 = 2.


You forgot the square root in the distance formula. Distance = square root of ((X1 - X2) ^ 2 + (Y1 - Y2) ^ 2 + …).

Consider the distance from the origin (0,0,0, …) to a random point on the sphere (~0, ~0, ~0, …). The distance from the origin = square root of ((~0-0)^2 + (~0-0)^2 + (~0-0)^2 + … ), which equals 1 by definition of the unit high-dimensional sphere.

Then plug in 1 instead of 0 in the first coordinate, because we care about (1,0,0,0 …), and you get the correct answer = square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + … ) ~= square root of 2.

Edited to fix typo and add clarity.
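For the curious, here’s a numeric version of the corrected calculation (a pure-Python sketch; sampling the "random point" by normalizing a Gaussian vector is my assumption, not something from the thread):

```python
import math
import random

n = 100_000
rng = random.Random(0)

# Random point on the unit sphere: normalize a Gaussian vector.
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
norm = math.sqrt(sum(x * x for x in v))
p = [x / norm for x in v]

# Every coordinate really is "~0": the largest is on the order of 1/sqrt(n).
print(max(abs(x) for x in p))

# Distance from the fixed pole (1, 0, 0, ...) to p, square root included:
d = math.sqrt((p[0] - 1.0) ** 2 + sum(x * x for x in p[1:]))
print(d)  # close to sqrt(2)
```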


Wow. Can’t believe I missed that.


If the data points are in the space [0,1]^n, and your metric function is:

d(x,y) = 0 if x == y; 1 otherwise

Then all points are distance one apart. It's been proven that, as dimensionality increases, the normal Euclidean distance over uniform point clouds rapidly converges to the same behavior as this equality metric.

The proof relies on the information gained by performing pairwise distance calculations.

In the example distance function I gave, there is zero information gained if you plug in two points that are known to be non-equal.

The information gained from evaluating the Euclidean distance function converges to zero as the dimensionality of the data set increases.

(Note: This does not hold for low dimensional data that's been embedded in a higher dimensional space.)

Edit: Misread your comment. Yes, everything ends up being the same distance apart. More precisely, the ratio of mean distance / stddev distance tends to infinity. The intrinsic dimensionality of the data is monotonic w.r.t. that ratio.
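That mean/stddev ratio is easy to observe experimentally. A rough pure-Python sketch for uniform points in [0,1]^n (the helper name `dist_ratio` and the sample sizes are mine, chosen for illustration):

```python
import math
import random
import statistics

def dist_ratio(n, pairs, rng):
    """Mean / stddev of pairwise distances between uniform random points in [0,1]^n."""
    ds = []
    for _ in range(pairs):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        ds.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))
    return statistics.mean(ds) / statistics.stdev(ds)

rng = random.Random(0)
ratios = {n: dist_ratio(n, 200, rng) for n in (2, 20, 200, 2000)}
for n, r in ratios.items():
    print(n, round(r, 1))
# The ratio grows roughly like sqrt(n): the relative spread of distances vanishes,
# so every point ends up "the same distance" from every other point.
```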


Euclidean distance calculations change with the number of dimensions; for example, in 3 dimensions it is sqrt(a^2 + b^2 + c^2).


Yes, that’s why it’s the square root of 2. Consider the distance from the origin (0,0,0, …) to a random point on the sphere (~0, ~0, ~0, …).

Distance = square root of ((X1 - X2) ^ 2 + (Y1 - Y2) ^2 + …). So D = square root of ((~0-0)^2 + (~0-0)^2 + (~0-0)^2 + … ), which is equal to 1 by definition of the unit high dimensional sphere.

So distance from (1,0,0,0 …) to (~0, ~0, ~0, …) = square root of ((~0-1)^2 + (~0-0)^2 + (~0-0)^2 + … ) ~= square root of 2.


Ahh ok, for some reason I was thinking (1,1,1) would be a valid point in this case.


It should be that almost all points are almost a full diameter away. However, it's still a very striking and unintuitive fact about very high-dimensional spheres.


It is, for VERY large n, as the sibling comments explain.



