Cluster Analysis with Python

Finding Optimal Number of Clusters Using WSS

In K-means clustering, you must choose the number of clusters, K, before running the algorithm, and this choice largely determines whether the result reveals meaningful patterns in your data. Too few clusters can merge distinct groups and oversimplify the data, while too many can split natural groups into overly specific, less useful clusters. That is why methods that guide your choice of K are important.

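One popular technique for choosing K relies on the within-cluster sum of squares (WSS). WSS adds up, over all clusters, the squared distances between each data point and the centroid of the cluster it is assigned to. In other words, WSS measures how compact the clusters are: lower values mean tighter clusters. Note that WSS generally decreases as K grows (it reaches zero when every point is its own cluster), so simply picking the K with the lowest WSS does not work; instead, you look at how WSS changes as K increases.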
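As a minimal sketch of this definition, assuming NumPy is available, the hypothetical helper `within_cluster_sum_of_squares` below computes WSS = Σₖ Σ_{x in Cₖ} ‖x − μₖ‖², where μₖ is the centroid of cluster Cₖ. The four points are made up so the result can be checked by hand: every point lies 1 unit from its cluster centroid, so WSS = 4 × 1² = 4. In scikit-learn, a fitted `KMeans` model reports this same quantity as its `inertia_` attribute.

```python
import numpy as np


def within_cluster_sum_of_squares(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # centroids[labels] pairs every point with the centroid of its own cluster
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))


# Two small, hand-made clusters (illustrative data only)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1])
# Each centroid is the mean of the points assigned to that cluster
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(within_cluster_sum_of_squares(X, labels, centroids))  # 4.0
```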

[Figure: Different number of clusters]

To use WSS to find the optimal K, you would typically follow these steps:

Run K-means for a range of K values
  • Try K values from 1 up to a reasonable limit like 10 or 15;
Calculate WSS for each K
  • Compute the within-cluster sum of squares (WSS) for every value of K;
Plot WSS as a function of K
  • Create a plot with K values on the x-axis and WSS on the y-axis;
  • This is called the WSS plot or elbow plot;
Find the elbow point
  • Look for a point where the WSS curve bends, forming an elbow;
  • This point suggests the optimal number of clusters.
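The four steps above can be sketched in Python with scikit-learn and Matplotlib. This is a minimal illustration under stated assumptions: the data come from `make_blobs` with four centers (synthetic, not from this course), K runs from 1 to 10, and `n_init=10` and `random_state=42` are arbitrary choices made for reproducibility. scikit-learn stores WSS in the fitted model's `inertia_` attribute.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data generated around 4 centers (an assumption for this illustration)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)

# Step 1: run K-means for a range of K values
k_values = list(range(1, 11))
wss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    # Step 2: inertia_ is scikit-learn's name for WSS
    wss.append(model.inertia_)

# Step 3: plot WSS as a function of K (the elbow plot)
plt.plot(k_values, wss, marker="o")
plt.xlabel("Number of clusters, K")
plt.ylabel("WSS (inertia)")
plt.title("Elbow plot")
plt.xticks(k_values)
plt.show()

# Step 4: inspect the plot and look for the K where the curve bends
```

Because these data were generated around four centers, you would usually expect the bend near K = 4, although overlapping blobs can blur it.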
Note

The elbow point in the WSS plot is the key feature to look for: it marks the value of K after which the decrease in WSS slows down significantly.

This elbow is often treated as a good indicator of the optimal K for the following reasons:

  • It suggests diminishing returns: adding more clusters beyond the elbow does not lead to a substantial improvement in WSS, meaning clusters are not getting significantly more compact;

  • It balances granularity and simplicity: the elbow often captures the essential structure in the data without overfitting, that is, without creating unnecessarily fine-grained clusters.
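To see diminishing returns as numbers rather than only as a shape on the plot, you can compute how much WSS shrinks with each additional cluster. The helper `percent_improvement` below is a hypothetical name, and the WSS values are invented for illustration, not results from real data.

```python
def percent_improvement(wss):
    """Percentage drop in WSS from each K to the next K in the list.

    Assumes wss is ordered by increasing K with a step of 1.
    A zero WSS (no room left to improve) yields 0.0 to avoid division by zero.
    """
    return [
        100.0 * (prev - curr) / prev if prev > 0 else 0.0
        for prev, curr in zip(wss, wss[1:])
    ]


# Hypothetical WSS values for K = 1..6, invented purely for illustration
wss = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0]

# wss[0] corresponds to K = 1, so enumeration starts at 1
for k, gain in enumerate(percent_improvement(wss), start=1):
    print(f"K={k} -> K={k + 1}: WSS drops by {gain:.1f}%")
```

With these made-up values, the improvement falls from 62.5% (K = 2 to K = 3) to 20.0% (K = 3 to K = 4), the kind of sharp slowdown that marks an elbow at K = 3.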

[Figure: Elbow method]

Keep in mind that the elbow method is a heuristic. The elbow point may not always be sharply defined, and other factors might influence your final choice of K. Visual inspection of the resulting clusters and your domain knowledge are valuable supplements to the elbow method.
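One way to follow this advice is to fit K-means for a few candidate values around the elbow and look at the resulting clusters side by side. The sketch below assumes two-dimensional data (so a scatter plot shows everything), uses synthetic `make_blobs` data, and picks the candidates K = 3, 4, 5 purely for illustration; showing cluster sizes in each panel title is an extra sanity check, not part of the elbow method itself.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative synthetic 2-D data; replace with your own feature matrix
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=42)

# Hypothetical candidate K values around the elbow, chosen for illustration
candidates = [3, 4, 5]

fig, axes = plt.subplots(1, len(candidates), figsize=(12, 4))
cluster_sizes = {}
for ax, k in zip(axes, candidates):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Number of points assigned to each cluster, as a quick sanity check
    cluster_sizes[k] = np.bincount(labels, minlength=k)
    # Color each point by its cluster label
    ax.scatter(X[:, 0], X[:, 1], c=labels, s=10, cmap="viridis")
    ax.set_title(f"K = {k}, sizes = {cluster_sizes[k].tolist()}")

plt.tight_layout()
plt.show()
```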

When using the WSS method to choose the number of clusters in K-means, what does the elbow point on the WSS plot typically represent?

Section 3. Chapter 3
