Well done! Let's look at the last line charts you built in the previous chapter.

As you can see, only Ward linkage captured the 'downward up to July' trend. The two results clearly differ, so let's measure how much they differ using the Rand index.
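To make the comparison concrete, here is a minimal sketch of the Rand index computed by pair counting: the share of object pairs on which two clusterings agree (both put the pair in the same cluster, or both put it in different clusters). The function name `rand_index` and the two label lists are hypothetical and only illustrate the idea; they are not the labels from the previous chapter.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both labelings must cover the same objects.")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("At least two objects are needed.")

    agreements = 0
    for i, j in pairs:
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when both clusterings treat it alike.
        if same_in_a == same_in_b:
            agreements += 1
    return agreements / len(pairs)


# Hypothetical cluster labels for six objects (illustration only).
ward_labels = [0, 0, 0, 1, 1, 1]
other_labels = [1, 1, 0, 0, 2, 2]

score = rand_index(ward_labels, other_labels)
print(round(score, 3))  # 10 of the 15 pairs agree -> 0.667
```

A value of 1 means the two clusterings are identical up to a renaming of the cluster labels, and lower values mean they disagree on more pairs.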

Clustering is a common data science task: partitioning a set of objects into groups so that the dissimilarity between objects within the same group is as small as possible. Cluster analysis is not itself an algorithm; it is a general task to be solved. Many clustering algorithms exist, but this course focuses on four of them.

The first algorithm is K-Means. It uses centroids to split the points into clusters. In this section, you will learn how to implement the algorithm and how to choose the number of clusters.

The second algorithm is K-Medoids. It works the same way as K-Means but uses medoids as the 'center' points of clusters. In this section, you will learn how the two algorithms differ, see one more way to estimate a suitable number of clusters, and of course implement the algorithm.
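The difference between the two kinds of 'center' can be sketched in a few lines. A centroid is the coordinate-wise mean of a cluster and need not be one of its points, while a medoid is the actual point with the smallest total distance to all the others. The helper names `centroid` and `medoid` and the sample points are assumptions made for this illustration, and Euclidean distance is assumed.

```python
import math


def centroid(points):
    """Coordinate-wise mean of 2-D points; may not be an actual data point."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def medoid(points):
    """The data point with the smallest total Euclidean distance to the rest."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical cluster with one far-away point (illustration only).
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]

print(centroid(cluster))  # (4.0, 0.0), pulled toward the far-away point
print(medoid(cluster))    # (2.0, 0.0), always one of the original points
```

In this example the far-away point drags the centroid outside the dense group, while the medoid stays inside it.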

The third algorithm in this course is Hierarchical Clustering. It can be easily visualized with dendrograms. In this section, you will learn how to implement the algorithm and how it can be tuned to improve clustering quality.

The last algorithm is probably the hardest in terms of math. In this section, you will get a brief introduction to the algorithm, learn why such a mathematically demanding method is worth using, and of course see its implementation.

How Similar are the Results?

Solution