Recall that clustering problems have no single correct answer. For the last task you solved, 5 clusters looks like a reasonable choice.

Let's visualize the result of clustering into 5 groups with a scatter plot of average February versus July temperatures, which are among the coldest and hottest months, respectively.

Clustering is a common data science task: grouping a set of objects so that the dissimilarity between objects within each group is minimal. Cluster analysis itself is not an algorithm but a general task to be solved. Many clustering algorithms exist, but this course focuses on four of them.

The first algorithm is K-Means. It uses centroids to split the points into clusters. In this section, you will learn how to implement the algorithm and how to choose the number of clusters.
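To make the role of centroids concrete, here is a minimal sketch of the K-Means idea written with NumPy only. It is an illustration under stated assumptions, not the implementation used later in the course: the function name `k_means`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def k_means(points, k, n_iter=100, seed=0):
    """Minimal Lloyd-style K-Means sketch: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data (two blobs), not the course dataset.
rng = np.random.default_rng(42)
toy_points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2)),
])
labels, centroids = k_means(toy_points, k=2)
```

The result depends on the random starting centroids, which is one reason practical implementations usually run the algorithm several times and keep the best result.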

The second algorithm is K-Medoids. It works the same way as K-Means but uses medoids, which are actual data points, as the 'center' points of the clusters. In this section, you will learn how the two algorithms differ, see one more way to choose a possible number of clusters, and implement the algorithm.
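The difference from K-Means can be sketched in a few lines: the update step picks an existing point instead of computing a mean. The code below is a simple alternating variant written with NumPy, offered as an assumption-laden illustration rather than the course's implementation; `k_medoids`, its parameters, and the toy data are hypothetical.

```python
import numpy as np


def k_medoids(points, k, n_iter=100, seed=0):
    """Simple alternating K-Medoids sketch: returns (labels, medoid_indices)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Medoids are stored as indices into `points`.
    medoids = rng.choice(len(points), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the new medoid is the member with the smallest
            # total distance to all members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return labels, medoids


# Hypothetical toy data with one far-away point, not the course dataset.
toy_points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
    [30.0, 30.0],
])
labels, medoids = k_medoids(toy_points, k=2)
```

Because every medoid must be one of the observed points, a single extreme value cannot drag a cluster center into empty space the way it can shift a mean.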

The third algorithm in this course is Hierarchical Clustering. Its results can be easily visualized with dendrograms. In this section, you will learn how to implement the algorithm and how it can be tuned to improve clustering quality.
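As a rough preview, agglomerative hierarchical clustering can be sketched with SciPy's `scipy.cluster.hierarchy` module. The toy data and the choice of Ward linkage are assumptions made for illustration only; the course may use a different library or settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data (two blobs), not the course dataset.
rng = np.random.default_rng(0)
toy_points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.5, size=(20, 2)),
])

# Build the merge tree bottom-up. The linkage method ("ward" here) is one
# of the settings that can be changed to tune the clustering.
linkage_matrix = linkage(toy_points, method="ward")

# Cut the tree so that at most 2 flat clusters remain (labels start at 1).
labels = fcluster(linkage_matrix, t=2, criterion="maxclust")
```

Passing `linkage_matrix` to `scipy.cluster.hierarchy.dendrogram` draws the merge tree with matplotlib, which is the dendrogram view mentioned above.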

The last algorithm is probably the hardest in terms of math. In this section, you will get a brief introduction to the algorithm, learn why such a complex algorithm is worth using, and implement it.

February vs July Average Temperatures
