Cluster Analysis

External Evaluation

External evaluation of clustering algorithms assesses the performance of a clustering algorithm by comparing its results to a known set of class labels, or ground truth. In other words, the algorithm's clusters are compared to pre-existing labels created by experts or based on domain knowledge.

Most commonly used external metrics

The Rand Index (RI) measures the similarity between two clusterings or partitions and is often used as an external evaluation metric in clustering. It is the fraction of data point pairs on which the two clusterings agree: pairs placed in the same cluster in both the predicted and true clusterings, plus pairs placed in different clusters in both, divided by the total number of data point pairs.

The Rand Index is calculated as follows:

  • Let n be the total number of data points;
  • Let a be the number of pairs of data points assigned to the same cluster in both the predicted and true clusterings;
  • Let b be the number of pairs of data points assigned to different clusters in both the predicted and true clusterings.

The Rand Index is then given by RI = (a + b) / (n(n - 1) / 2) = 2(a + b) / (n(n - 1)), since the total number of data point pairs is n(n - 1)/2.

The Rand Index can vary between 0 and 1, where 0 indicates that the two clusterings are completely different, and 1 indicates that the two clusterings are identical.
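To make the formula concrete, here is a minimal sketch in Python that computes the Rand Index directly from the definitions of a and b and cross-checks the result with scikit-learn's rand_score. It assumes scikit-learn is installed, and the label lists are toy values invented for illustration, not data from this lesson.

```python
# Rand Index computed by hand from the a/b definition, then checked with scikit-learn.
from itertools import combinations

from sklearn.metrics import rand_score

labels_true = [0, 0, 0, 1, 1, 1]   # ground-truth classes (toy data)
labels_pred = [0, 0, 1, 1, 1, 1]   # clusters found by some algorithm (toy data)

n = len(labels_true)
a = b = 0
for i, j in combinations(range(n), 2):
    same_true = labels_true[i] == labels_true[j]
    same_pred = labels_pred[i] == labels_pred[j]
    if same_true and same_pred:
        a += 1   # pair grouped together in both partitions
    elif not same_true and not same_pred:
        b += 1   # pair separated in both partitions

ri_manual = 2 * (a + b) / (n * (n - 1))   # same as (a + b) divided by the number of pairs
print(ri_manual, rand_score(labels_true, labels_pred))  # both print roughly 0.667
```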


Mutual Information (MI) measures the amount of information shared by the predicted and true clusterings based on the concept of entropy. We will not consider how this metric is calculated, as this is outside the scope of the beginner-level course.

In its normalized form (NMI), which is the version usually reported, Mutual Information varies between 0 and 1, where 0 indicates that the predicted clustering shares no information with the true clustering, and 1 indicates that the predicted clustering is identical to the true clustering. In practice, this metric is much better at detecting bad clustering than the Rand Index.
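As a rough illustration, and again assuming scikit-learn is available, the normalized form of Mutual Information can be computed with normalized_mutual_info_score. The labels are the same toy values as in the previous sketch.

```python
# Normalized Mutual Information between a true labelling and a predicted clustering.
from sklearn.metrics import normalized_mutual_info_score

labels_true = [0, 0, 0, 1, 1, 1]   # toy ground-truth classes
labels_pred = [0, 0, 1, 1, 1, 1]   # toy predicted clusters

nmi = normalized_mutual_info_score(labels_true, labels_pred)
print(nmi)   # 1.0 would mean the two partitions carry exactly the same information
```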


Homogeneity measures, using conditional entropy, the degree to which each cluster contains only data points that belong to a single class or category. As with Mutual Information, we will not go into how this metric is calculated.

A clustering solution is considered highly homogeneous if each cluster contains only data points that belong to a single true class or category.
In other words, homogeneity measures how pure the clusters produced by the algorithm are with respect to the true classes or categories. The homogeneity score ranges from 0 to 1, with 1 indicating perfect homogeneity.

Homogeneity is the best of the metrics considered here: it detects both good and bad clustering equally well.
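A minimal sketch, once more assuming scikit-learn: homogeneity_score returns 1.0 only when every cluster contains points of a single true class. In the toy labels below, one cluster mixes points from two true classes, so the score falls below 1.

```python
# Homogeneity score: 1.0 only when every cluster is pure with respect to the true classes.
from sklearn.metrics import homogeneity_score

labels_true = [0, 0, 0, 1, 1, 1]   # toy ground-truth classes
labels_pred = [0, 0, 1, 1, 1, 1]   # cluster 1 mixes points from both true classes

print(homogeneity_score(labels_true, labels_pred))  # below 1.0: clusters are not pure
```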

Can we use external evaluation metrics if we have no information about the real partitioning of the data into clusters?

Select the correct answer
