Perform DBSCAN Clustering
Task
As mentioned in the previous chapter, the DBSCAN algorithm classifies points as core, border, or noise. We can therefore use it to clean outliers from our data. Let's create a DBSCAN model, clean the data, and look at the results.
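As a reminder of what the three categories mean, here is a minimal NumPy sketch of the definitions. It is an illustration only, not scikit-learn's implementation; the function name `classify_points` and the toy data are hypothetical, and the point itself is counted toward `min_samples`, as scikit-learn does.

```python
import numpy as np

def classify_points(X, eps, min_samples):
    """Label each point 'core', 'border', or 'noise' (illustrative sketch)."""
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within_eps = dists <= eps
    # Core: at least min_samples points (itself included) within eps
    is_core = within_eps.sum(axis=1) >= min_samples
    # Border: not core, but within eps of at least one core point
    is_border = ~is_core & (within_eps & is_core[None, :]).any(axis=1)
    kinds = np.full(len(X), "noise", dtype=object)
    kinds[is_border] = "border"
    kinds[is_core] = "core"
    return kinds

# Hypothetical toy data: a tight square, one nearby point, one far-away point
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [0.1, 0.1], [0.22, 0.0], [5.0, 5.0]])
kinds = classify_points(X_toy, eps=0.15, min_samples=4)
print(list(kinds))
```

The four corners of the square come out as core points, the nearby point as a border point, and the distant point as noise.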
Your task is to train a DBSCAN model on the circles dataset, detect the noise points, and remove them. Look at the visualization and compare the data before and after cleaning. You have to:
- Import the DBSCAN class from the sklearn.cluster module.
- Use the DBSCAN class and its .fit() method.
- Use the .labels_ attribute of the fitted DBSCAN model.
- Specify clustering.labels_==-1 to detect noise.
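The last step can be sketched on a small hand-made example. The `labels` and `points` arrays below are hypothetical stand-ins for what a fitted model would store in .labels_ and for the dataset; only NumPy is needed.

```python
import numpy as np

# Hypothetical labels shaped like DBSCAN's .labels_ (-1 marks noise)
labels = np.array([0, 0, -1, 1, 1, -1])
points = np.array([[0.0, 0.1], [0.1, 0.0], [3.0, 3.0],
                   [1.0, 1.1], [1.1, 1.0], [-3.0, 2.0]])

noise_mask = labels == -1                        # True where the point is noise
noise_idx = np.where(noise_mask)[0]              # integer indices of noise points
cleaned = np.delete(points, noise_idx, axis=0)   # drop those rows

print(noise_idx)
print(cleaned.shape)
```

Comparing the array with -1 gives a boolean mask, np.where turns it into row indices, and np.delete with axis=0 removes those rows.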
Solution
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
# Create circles dataset
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2)
# Train DBSCAN model on circles dataset
clustering = DBSCAN(eps=0.1, min_samples=5).fit(X)
# Plot the raw data, the clustering result, and the cleaned data side by side
fig, axes = plt.subplots(1, 3)
fig.set_size_inches(10, 5)
axes[0].scatter(X[:, 0], X[:, 1], c='brown')
axes[0].set_title('Data with noise')
axes[1].scatter(X[:, 0], X[:, 1], c=clustering.labels_, cmap='tab20b')
axes[1].set_title('Clusters with DBSCAN + noise')
# Remove the samples that DBSCAN labeled as noise (label -1)
cleaned_X = np.delete(X, np.where(clustering.labels_==-1), axis=0).reshape(-1, 2)
axes[2].scatter(cleaned_X[:, 0], cleaned_X[:, 1], c='brown')
axes[2].set_title('Cleaned data')
plt.show()
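To check the cleaning step numerically rather than only by eye, one option is to count the flagged points. The sketch below also uses boolean indexing as an alternative to np.delete; `random_state=0` is an assumption added here for reproducibility and is not part of the exercise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# random_state=0 is an assumption, added only to make the run repeatable
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2, random_state=0)
clustering = DBSCAN(eps=0.1, min_samples=5).fit(X)

noise_mask = clustering.labels_ == -1
cleaned_X = X[~noise_mask]  # boolean indexing keeps only the non-noise rows

# Same rows as the np.delete / np.where approach
via_delete = np.delete(X, np.where(noise_mask)[0], axis=0)

print(f"noise points: {noise_mask.sum()} of {len(X)}")
print(f"clusters found: {len(set(clustering.labels_) - {-1})}")
```

Both approaches keep the same rows; the boolean mask is simply a shorter way to write the removal.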
Section 2. Chapter 7
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import ___
# Create circles dataset
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2)
# Train DBSCAN model on circles dataset
clustering = ___(eps=0.1, min_samples=5).___(X)
# Provide visualization
fig, axes = plt.subplots(1, 3)
fig.set_size_inches(10, 5)
axes[0].scatter(X[:, 0], X[:, 1], c='brown')
axes[0].set_title('Data with noise')
axes[1].scatter(X[:, 0], X[:, 1], c=clustering.___, cmap='tab20b')
axes[1].set_title('Clusters with DBSCAN + noise')
# In this line we detect the samples that are labeled as noise and remove them from our dataset
# np.delete deletes elements in specified axis by indices
# np.where detects indices where samples are clustered as noise
cleaned_X = np.delete(X, np.where(clustering.labels_==___), axis=0).reshape(-1, 2)
# Provide visualization of dataset without outliers
axes[2].scatter(cleaned_X[:, 0], cleaned_X[:, 1], c='brown')
axes[2].set_title('Cleaned data')
plt.show()