Perform K-means Clustering

Task

Let's check how well the algorithm performs on different types of clusters. We will use three synthetic dataset generators from the sklearn library (make_blobs, make_moons, and make_circles) and apply the K-means algorithm to cluster the generated points. We will plot the results and use these visualizations to judge the quality of the clustering.
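As a quick orientation before clustering, the sketch below generates the three datasets and prints their shapes and true labels. The `random_state=0` argument and the `datasets` dictionary are assumptions added here for reproducibility and convenience; they are not part of the task.

```python
from sklearn.datasets import make_blobs, make_moons, make_circles

# random_state=0 is an assumption added so that repeated runs give the same points
datasets = {
    "blobs": make_blobs(n_samples=500, cluster_std=1, centers=3, random_state=0),
    "moons": make_moons(n_samples=500, random_state=0),
    "circles": make_circles(n_samples=500, random_state=0),
}

for name, (X, y) in datasets.items():
    # X holds the 2D point coordinates, y holds the true cluster label of each point
    print(name, X.shape, sorted(set(y.tolist())))
```

Each generator returns a pair: an array of 2D points and an array with the true cluster label of every point.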

Your task is to use the K-means clustering algorithm to solve 3 different clustering problems, then compare the results and draw conclusions about clustering quality. You have to:

  1. Import the KMeans class from the sklearn.cluster module.
  2. Use the KMeans class to instantiate a model object.
  3. Use the .fit() method to train the model.
  4. Use the .labels_ attribute to extract the fitted cluster labels.
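The four steps above can be sketched on a single dataset as follows. This is a minimal illustration, not the checked solution: `random_state=0` and `n_init=10` are assumptions added so that the run is repeatable.

```python
from sklearn.cluster import KMeans           # step 1: import the class
from sklearn.datasets import make_blobs

# Toy data: 500 points around 3 centers (random_state is an assumption)
X, _ = make_blobs(n_samples=500, cluster_std=1, centers=3, random_state=0)

# step 2: instantiate the model (n_init and random_state are assumptions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# step 3: train the model on the points
kmeans.fit(X)

# step 4: read the cluster index assigned to every point
labels = kmeans.labels_
print(labels[:10])
print(kmeans.cluster_centers_.shape)
```

Because `.fit()` returns the fitted model itself, steps 2 and 3 can also be chained in one expression, which is the form the exercise code uses.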

Solution

from sklearn.datasets import make_blobs, make_moons, make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
import warnings

warnings.filterwarnings('ignore')


def check_clustering_quality(X, y, n_clusters):
    # Left plot: true labels; right plot: labels found by K-means
    fig, axes = plt.subplots(1, 2)
    axes[0].scatter(X[:, 0], X[:, 1], c=y, cmap='tab20b')
    axes[0].set_title('Real clusters')
    kmeans = KMeans(n_clusters=n_clusters).fit(X)
    axes[1].scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='tab20b')
    axes[1].set_title('Clusters with K-means')


# Run the check on the blobs, moons and circles datasets
X, y = make_blobs(n_samples=500, cluster_std=1, centers=3)
check_clustering_quality(X, y, 3)

X, y = make_moons(n_samples=500)
check_clustering_quality(X, y, 2)

X, y = make_circles(n_samples=500)
check_clustering_quality(X, y, 2)

# Display the figures when running outside an interactive environment
plt.show()

Note

In the visualizations, compare how the points are grouped in the real and predicted clusters, not which colors the clusters have. K-means numbers its clusters arbitrarily, so the same group of points can be drawn in different colors in the two plots.
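The same idea can be checked numerically. The sketch below is an optional addition, not part of the task: it uses `adjusted_rand_score` from `sklearn.metrics`, which compares how points are grouped and ignores the label numbers. The small arrays `y_true` and `y_pred` are made up for illustration.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels: the grouping is identical, only the numbering differs
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])

# Comparing label numbers directly is misleading: no position matches
naive_match = (y_true == y_pred).mean()
print(naive_match)  # 0.0

# The adjusted Rand index compares groupings, so the renumbering does not matter
ari = adjusted_rand_score(y_true, y_pred)
print(ari)  # 1.0
```

A score of 1.0 means the two groupings are the same, which corresponds to plots that show the same point groups in different colors.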

Section 2. Chapter 2
from sklearn.datasets import make_blobs, make_moons, make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.___ import ___


# We will use this function to fit KMeans on different datasets and to plot results
def check_clustering_quality(X, y, n_clusters):
    fig, axes = plt.subplots(1, 2)
    axes[0].scatter(X[:, 0], X[:, 1], c=y, cmap='tab20b')
    axes[0].set_title('Real clusters')
    kmeans = ___(n_clusters=n_clusters).___(X)
    axes[1].scatter(X[:, 0], X[:, 1], c=kmeans.___, cmap='tab20b')
    axes[1].set_title('Clusters with K-means')


# Now let's use check_clustering_quality function on blobs, moons and circles datasets
X, y = make_blobs(n_samples=500, cluster_std=1, centers=3)
check_clustering_quality(X, y, 3)

X, y = make_moons(n_samples=500)
check_clustering_quality(X, y, 2)

X, y = make_circles(n_samples=500)
check_clustering_quality(X, y, 2)