Learn: Perform DBSCAN Clustering | Basic Clustering Algorithms
Cluster Analysis

Perform DBSCAN Clustering

Task


As mentioned in the previous chapter, the DBSCAN algorithm classifies points as core, border, or noise. As a result, we can use it to clean outliers from our data. Let's create a DBSCAN model, clean the data, and look at the results.
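To make the three point types concrete, here is a minimal sketch on a hypothetical toy dataset. The points, `eps=0.15`, and `min_samples=3` are illustrative assumptions, not values from this chapter. The sketch relies on scikit-learn's `core_sample_indices_` and `labels_` attributes and treats a border point as one that belongs to a cluster without being a core sample.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: five points along a line plus one far-away point
X_toy = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0], [0.42, 0.0],
    [5.0, 0.0],
])

# In scikit-learn, min_samples counts the point itself
model = DBSCAN(eps=0.15, min_samples=3).fit(X_toy)
labels = model.labels_

# Core points are reported directly by the fitted model
core_mask = np.zeros(len(X_toy), dtype=bool)
core_mask[model.core_sample_indices_] = True

# Noise points carry the label -1
noise_mask = labels == -1

# Border points are in a cluster but are not core points
border_mask = ~core_mask & ~noise_mask

core_idx = np.flatnonzero(core_mask)
border_idx = np.flatnonzero(border_mask)
noise_idx = np.flatnonzero(noise_mask)

print("core:", core_idx)
print("border:", border_idx)
print("noise:", noise_idx)
```

In this toy setup the three middle points have enough neighbors to be core points, the two ends of the line are border points, and the far-away point is noise.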

Your task is to train a DBSCAN model on the circles dataset, detect the noise points, and remove them. Then look at the visualization and compare the data before and after cleaning. You have to:

  1. Import the DBSCAN class from the sklearn.cluster module.
  2. Create a DBSCAN instance and call its .fit() method on X.
  3. Use the .labels_ attribute of the fitted model to color the points by cluster.
  4. Specify clustering.labels_==-1 to detect noise: DBSCAN assigns the label -1 to noise points.
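Steps 3 and 4 can be tried on their own before plotting anything. The sketch below fits the same model and counts clusters and noise points from `labels_`. The `random_state=0` argument is an assumption added here so that reruns produce the same data; it is not part of the task code.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# random_state is an assumption added here so that reruns give the same data
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2, random_state=0)

clustering = DBSCAN(eps=0.1, min_samples=5).fit(X)
labels = clustering.labels_  # one integer label per sample

# True where DBSCAN marked the sample as noise
noise_mask = labels == -1
n_noise = int(noise_mask.sum())

# Number of clusters found, not counting the noise label
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print(f"clusters: {n_clusters}, noise points: {n_noise} of {len(X)}")
```

The exact counts depend on the generated data and on `eps` and `min_samples`, so they are not listed here.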

Solution

from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN

# Create circles dataset
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2)

# Train DBSCAN model on circles dataset
clustering = DBSCAN(eps=0.1, min_samples=5).fit(X)

# Plot the raw data and the DBSCAN labels (noise points get the label -1)
fig, axes = plt.subplots(1, 3)
fig.set_size_inches(10, 5)
axes[0].scatter(X[:, 0], X[:, 1], c='brown')
axes[0].set_title('Data with noise')
axes[1].scatter(X[:, 0], X[:, 1], c=clustering.labels_, cmap='tab20b')
axes[1].set_title('Clusters with DBSCAN + noise')

# Remove the samples that are labeled as noise
cleaned_X = np.delete(X, np.where(clustering.labels_==-1), axis=0).reshape(-1, 2)

# Plot the dataset without outliers
axes[2].scatter(cleaned_X[:, 0], cleaned_X[:, 1], c='brown')
axes[2].set_title('Cleaned data')


Section 2. Chapter 7
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import ___
# Create circles dataset
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2)

# Train DBSCAN model on circles dataset
clustering = ___(eps=0.1, min_samples=5).___(X)
# Provide visualization
fig, axes = plt.subplots(1, 3)
fig.set_size_inches(10, 5)
axes[0].scatter(X[:, 0], X[:, 1], c='brown')
axes[0].set_title('Data with noise')
axes[1].scatter(X[:, 0], X[:, 1], c=clustering.___, cmap='tab20b')
axes[1].set_title('Clusters with DBSCAN + noise')

# In this line we detect the samples that are labeled as noise and remove them from our dataset
# np.delete deletes elements along the specified axis by their indices
# np.where returns the indices of the samples that are clustered as noise
cleaned_X = np.delete(X, np.where(clustering.labels_==___), axis=0).reshape(-1, 2)
# Provide visualization of dataset without outliers
axes[2].scatter(cleaned_X[:, 0], cleaned_X[:, 1], c='brown')
axes[2].set_title('Cleaned data')
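The np.delete and np.where combination used above can be checked on a tiny example. The sketch below uses hypothetical labels and data (not taken from the task) and compares the index-based removal with a boolean mask, which is a common alternative way to express the same filter.

```python
import numpy as np

# Hypothetical labels for five samples; -1 marks noise
labels = np.array([0, -1, 1, -1, 0])
# Hypothetical 2-D data with one row per sample
X_small = np.arange(10, dtype=float).reshape(5, 2)

# np.where returns a tuple of index arrays; [0] picks the row indices
noise_idx = np.where(labels == -1)[0]

# Index-based removal: drop the rows whose indices are in noise_idx
cleaned_delete = np.delete(X_small, noise_idx, axis=0)

# Boolean-mask alternative: keep the rows that are not noise
cleaned_mask = X_small[labels != -1]

print(noise_idx)
print(cleaned_delete)
print(np.array_equal(cleaned_delete, cleaned_mask))
```

Both approaches keep rows 0, 2, and 4. With axis=0, np.delete already returns a two-column array here, so the trailing .reshape(-1, 2) in the task code should not change the result.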