Learn Distances | Foundations of High-Dimensional Geometry
Geometry of High-Dimensional Data

Distances

As you move into high-dimensional spaces, the way distances between points behave changes dramatically. In low dimensions, such as the familiar two- or three-dimensional spaces, the distances between randomly chosen points vary widely: some points are close, others much farther away. As the number of dimensions increases, however, the distribution of pairwise distances shifts. Most points end up at a similar distance from one another, and the spread of these distances narrows. This is closely related to the edge effect discussed previously: in high dimensions, points are more likely to lie near the boundary of the space, and the "center" becomes less meaningful. The result is that almost all points are far from each other in an absolute sense, yet the difference between the closest and farthest pairs becomes surprisingly small.

Definition

Distance collapse is the phenomenon where, in high-dimensional spaces, the distances between all pairs of points become nearly equal. This makes it challenging to distinguish between points that are "near" and those that are "far".

Because of distance collapse, distinguishing between "near" and "far" points becomes extremely difficult. When almost every pair of points is separated by roughly the same distance, the notion of proximity loses its usefulness. This has major implications for tasks such as clustering, classification, or any method that relies on comparing distances. In high dimensions, algorithms that depend on finding the nearest neighbor or measuring similarity based on distance can struggle, since the contrast between the closest and farthest points nearly vanishes.
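One way to see this loss of contrast numerically is to measure the relative contrast (d_max − d_min) / d_min between a query point's farthest and nearest neighbors as the dimension grows. The sketch below, assuming NumPy, uses uniform random points in the unit cube; the point counts and dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=500):
    """(d_max - d_min) / d_min between a random query point and its
    farthest vs. nearest neighbor among uniform points in [0, 1]^dim."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

In two dimensions the farthest neighbor is typically many times farther than the nearest one; by a thousand dimensions the two distances differ by only a small fraction, which is exactly why nearest-neighbor rankings become unstable.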

Euclidean Distance (L2 norm)

As dimensionality increases, Euclidean distances between random points concentrate around their mean: the mean distance grows, while the standard deviation grows far more slowly, so the relative spread of distances shrinks (for i.i.d. coordinates, roughly like 1/√d). This vanishing relative spread is what produces distance collapse.
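The concentration can be quantified as the ratio of the standard deviation to the mean of pairwise Euclidean distances. A minimal sketch, assuming NumPy and uniform points in the unit cube (sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def l2_relative_spread(dim, n_pairs=2000):
    """std / mean of Euclidean distances between random point pairs
    drawn uniformly from the unit cube [0, 1]^dim."""
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    d = np.linalg.norm(a - b, axis=1)
    return d.std() / d.mean()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  std/mean={l2_relative_spread(dim):.4f}")
```

The ratio drops steadily with dimension, showing that the distances pile up ever more tightly around their mean even as the mean itself grows.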

Manhattan Distance (L1 norm)

The Manhattan distance also suffers from distance collapse, but the concentration effect is slightly less severe than with Euclidean distance. Still, as dimensions grow, distinguishing "near" from "far" remains problematic.
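The difference between the two norms can be checked empirically by comparing the relative spread (std / mean) of L1 and L2 distances on random point pairs. A sketch under the same uniform-unit-cube assumption, using NumPy:

```python
import numpy as np

rng = np.random.default_rng(7)

def relative_spread(dim, ord, n_pairs=2000):
    """std / mean of pairwise distances under the given Minkowski order."""
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    d = np.linalg.norm(a - b, ord=ord, axis=1)
    return d.std() / d.mean()

for dim in (10, 100, 1000):
    l1 = relative_spread(dim, 1)
    l2 = relative_spread(dim, 2)
    print(f"dim={dim:5d}  L1 std/mean={l1:.4f}  L2 std/mean={l2:.4f}")
```

In this experiment the L1 spread stays somewhat larger than the L2 spread at every dimension, consistent with the claim that Manhattan distance concentrates slightly less severely, yet both ratios still shrink toward zero as the dimension grows.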

Other Metrics

Different metrics may experience distance collapse at different rates, but all common norms (Lp metrics) are affected in high-dimensional spaces. The choice of metric can slightly affect the degree, but not the existence, of the phenomenon.
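To see that the metric changes the rate but not the existence of the collapse, one can compute the same relative spread for several Minkowski orders p at a fixed high dimension. A minimal NumPy sketch; the orders and dimension are arbitrary illustrative choices, and p < 1 is included even though it is not a true norm, since the same formula still exhibits concentration:

```python
import numpy as np

rng = np.random.default_rng(1)

dim, n_pairs = 1000, 2000
a = rng.random((n_pairs, dim))
b = rng.random((n_pairs, dim))

for p in (0.5, 1, 2, 4):
    # Minkowski "distance": (sum_i |a_i - b_i|^p)^(1/p)
    d = np.sum(np.abs(a - b) ** p, axis=1) ** (1.0 / p)
    print(f"p={p:>3}  std/mean={d.std() / d.mean():.4f}")
```

Smaller p gives a somewhat larger relative spread, but every order shown yields a spread far below what the same experiment produces in low dimensions.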


Which of the following best describes the main challenge that distance collapse poses for algorithms relying on nearest neighbors in high-dimensional spaces?



Section 1. Chapter 3

