Learn Distances | Foundations of High-Dimensional Geometry
Geometry of High-Dimensional Data

Distances

As you move into high-dimensional spaces, the way distances between points behave changes dramatically. In low dimensions, such as the familiar two- or three-dimensional spaces, the distances between randomly chosen points vary widely: some points are close, others much farther away. As the number of dimensions increases, however, the distribution of pairwise distances shifts. Most points end up at a similar distance from one another, and the spread of these distances narrows. This is closely related to the edge effect discussed previously: in high dimensions, points are more likely to lie near the boundary of the space, and the "center" becomes less meaningful. The result is that almost all points are far from each other in an absolute sense, yet the difference between the closest and farthest pairs becomes surprisingly small.

Definition

Distance collapse is the phenomenon where, in high-dimensional spaces, the distances between all pairs of points become nearly equal. This makes it challenging to distinguish between points that are "near" and those that are "far".

Because of distance collapse, distinguishing between "near" and "far" points becomes extremely difficult. When almost every pair of points is separated by roughly the same distance, the notion of proximity loses its usefulness. This has major implications for tasks such as clustering, classification, or any method that relies on comparing distances. In high dimensions, algorithms that depend on finding the nearest neighbor or measuring similarity based on distance can struggle, since the contrast between the closest and farthest points nearly vanishes.
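One way to see this loss of contrast numerically is to measure the relative contrast (d_max − d_min) / d_min between a query point's farthest and nearest neighbors as the dimension grows. The sketch below, assuming NumPy, uses uniform random points in the unit cube; the point counts and dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=500):
    """(d_max - d_min) / d_min between a random query point and its
    farthest vs. nearest neighbor among uniform points in [0, 1]^dim."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

In two dimensions the farthest neighbor is typically many times farther than the nearest one; by a thousand dimensions the two distances differ by only a small fraction, which is exactly why nearest-neighbor rankings become unstable.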

Euclidean Distance (L2 norm)

As dimensionality increases, Euclidean distances between random points concentrate around their mean: the mean distance grows, while the standard deviation grows far more slowly, so the relative spread of distances shrinks (for i.i.d. coordinates, roughly like 1/√d). This vanishing relative spread is what produces distance collapse.
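The concentration can be quantified as the ratio of the standard deviation to the mean of pairwise Euclidean distances. A minimal sketch, assuming NumPy and uniform points in the unit cube (sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def l2_relative_spread(dim, n_pairs=2000):
    """std / mean of Euclidean distances between random point pairs
    drawn uniformly from the unit cube [0, 1]^dim."""
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    d = np.linalg.norm(a - b, axis=1)
    return d.std() / d.mean()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  std/mean={l2_relative_spread(dim):.4f}")
```

The ratio drops steadily with dimension, showing that the distances pile up ever more tightly around their mean even as the mean itself grows.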

Manhattan Distance (L1 norm)

The Manhattan distance also suffers from distance collapse, but the concentration effect is slightly less severe than with Euclidean distance. Still, as dimensions grow, distinguishing "near" from "far" remains problematic.
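The difference between the two norms can be checked empirically by comparing the relative spread (std / mean) of L1 and L2 distances on random point pairs. A sketch under the same uniform-unit-cube assumption, using NumPy:

```python
import numpy as np

rng = np.random.default_rng(7)

def relative_spread(dim, ord, n_pairs=2000):
    """std / mean of pairwise distances under the given Minkowski order."""
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    d = np.linalg.norm(a - b, ord=ord, axis=1)
    return d.std() / d.mean()

for dim in (10, 100, 1000):
    l1 = relative_spread(dim, 1)
    l2 = relative_spread(dim, 2)
    print(f"dim={dim:5d}  L1 std/mean={l1:.4f}  L2 std/mean={l2:.4f}")
```

In this experiment the L1 spread stays somewhat larger than the L2 spread at every dimension, consistent with the claim that Manhattan distance concentrates slightly less severely, yet both ratios still shrink toward zero as the dimension grows.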

Other Metrics

Different metrics may experience distance collapse at different rates, but all common norms (Lp metrics) are affected in high-dimensional spaces. The choice of metric can slightly affect the degree, but not the existence, of the phenomenon.
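To see that the metric changes the rate but not the existence of the collapse, one can compute the same relative spread for several Minkowski orders p at a fixed high dimension. A minimal NumPy sketch; the orders and dimension are arbitrary illustrative choices, and p < 1 is included even though it is not a true norm, since the same formula still exhibits concentration:

```python
import numpy as np

rng = np.random.default_rng(1)

dim, n_pairs = 1000, 2000
a = rng.random((n_pairs, dim))
b = rng.random((n_pairs, dim))

for p in (0.5, 1, 2, 4):
    # Minkowski "distance": (sum_i |a_i - b_i|^p)^(1/p)
    d = np.sum(np.abs(a - b) ** p, axis=1) ** (1.0 / p)
    print(f"p={p:>3}  std/mean={d.std() / d.mean():.4f}")
```

Smaller p gives a somewhat larger relative spread, but every order shown yields a spread far below what the same experiment produces in low dimensions.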


Which of the following best describes the main challenge that distance collapse poses for algorithms relying on nearest neighbors in high-dimensional spaces?



Section 1. Chapter 3

