Cluster Analysis

Perform DBSCAN Clustering

Task


As we mentioned in the previous chapter, the DBSCAN algorithm classifies points as core, border, or noise points. Noise points do not belong to any cluster, so we can use this algorithm to remove outliers from our data. Let's create a DBSCAN model, clean the data, and look at the results.
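A minimal sketch of how the three point types can be recovered from a fitted scikit-learn model, assuming the same circles data as the exercise (`random_state=0` is added here only to make the run reproducible). It relies on the `core_sample_indices_` attribute, which lists the core points; non-core points that still received a cluster label are border points, and points labeled -1 are noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# Same dataset parameters as the exercise; random_state is an added assumption
X, _ = make_circles(n_samples=2000, noise=0.1, factor=0.2, random_state=0)
db = DBSCAN(eps=0.1, min_samples=5).fit(X)

# Core points: indices reported by core_sample_indices_
is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

# Noise points: label -1
is_noise = db.labels_ == -1

# Border points: assigned to a cluster but not core themselves
is_border = ~is_core & ~is_noise

print("core:", is_core.sum(), "border:", is_border.sum(), "noise:", is_noise.sum())
```

Every sample falls into exactly one of the three groups, so the three counts add up to the dataset size.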

Your task is to train a DBSCAN model on the circles dataset, detect the noise points, and remove them. Then look at the visualization and compare the data before and after cleaning. You have to:

  1. Import the DBSCAN class from the sklearn.cluster module.
  2. Create a DBSCAN instance and call its .fit() method.
  3. Use the .labels_ attribute of the fitted model.
  4. Use the condition clustering.labels_==-1 to detect noise (DBSCAN assigns the label -1 to noise points).
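To see the steps above on data small enough to check by hand, here is a minimal sketch. The toy coordinates and the values `eps=0.5`, `min_samples=3` are hypothetical, chosen for illustration rather than taken from the exercise. Each point in the two tight groups has three points (itself included) within `eps`, so all six are core points, while the isolated point has none and becomes noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups of three points and one isolated point
X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],   # group A
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],   # group B
    [10.0, 10.0],                          # isolated point
])

# Step 2: create the model and fit it
clustering = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Step 3: one label per sample; -1 marks noise
print(clustering.labels_.tolist())  # [0, 0, 0, 1, 1, 1, -1]

# Step 4: boolean mask of noise points, then keep only the rest
noise_mask = clustering.labels_ == -1
cleaned = X[~noise_mask]
print(cleaned.shape)  # (6, 2)
```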

Solution

from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN

# Create the circles dataset: two concentric noisy circles
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2)

# Train a DBSCAN model; samples labeled -1 are noise
clustering = DBSCAN(eps=0.1, min_samples=5).fit(X)

fig, axes = plt.subplots(1, 3)
fig.set_size_inches(10, 5)
axes[0].scatter(X[:, 0], X[:, 1], c='brown')
axes[0].set_title('Data with noise')
axes[1].scatter(X[:, 0], X[:, 1], c=clustering.labels_, cmap='tab20b')
axes[1].set_title('Clusters with DBSCAN + noise')

# Remove the rows labeled as noise; with axis=0 the result is already 2-D
cleaned_X = np.delete(X, np.where(clustering.labels_ == -1), axis=0)
axes[2].scatter(cleaned_X[:, 0], cleaned_X[:, 1], c='brown')
axes[2].set_title('Cleaned data')
plt.show()
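The np.delete/np.where combination used above works, but the same cleaning can also be written as a single boolean mask. A minimal sketch of that alternative, assuming the same dataset and parameters (`random_state=0` is added only for reproducibility), which also confirms the two approaches give identical arrays:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=2000, noise=0.1, factor=0.2, random_state=0)
clustering = DBSCAN(eps=0.1, min_samples=5).fit(X)

# Boolean mask: True for every sample DBSCAN assigned to a cluster
is_inlier = clustering.labels_ != -1
cleaned_X = X[is_inlier]

# The np.delete/np.where approach, for comparison
via_delete = np.delete(X, np.where(clustering.labels_ == -1)[0], axis=0)

print(cleaned_X.shape, np.array_equal(cleaned_X, via_delete))
```

Boolean indexing keeps the rows in their original order, just like np.delete, so either form can be used in the exercise.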


Section 2. Chapter 7
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import ___
# Create circles dataset
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.2)

# Train DBSCAN model on circles dataset
clustering = ___(eps=0.1, min_samples=5).___(X)
# Provide visualization
fig, axes = plt.subplots(1, 3)
fig.set_size_inches(10, 5)
axes[0].scatter(X[:, 0], X[:, 1], c='brown')
axes[0].set_title('Data with noise')
axes[1].scatter(X[:, 0], X[:, 1], c=clustering.___, cmap='tab20b')
axes[1].set_title('Clusters with DBSCAN + noise')

# Detect the samples that are labeled as noise and remove them from the dataset:
# np.where finds the indices of the samples labeled as noise,
# and np.delete removes the rows at those indices along axis 0
cleaned_X = np.delete(X, np.where(clustering.labels_==___), axis=0)
# Visualize the dataset without outliers
axes[2].scatter(cleaned_X[:, 0], cleaned_X[:, 1], c='brown')
axes[2].set_title('Cleaned data')
plt.show()
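How many points end up flagged as noise depends on the eps and min_samples values passed to DBSCAN. A minimal sketch, assuming the exercise's circles data (with an added `random_state=0`) and a few illustrative eps values that are not part of the task, counts clusters and noise points for each setting. With min_samples fixed, a larger eps can only enlarge neighborhoods, so the number of noise points should never increase as eps grows.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=2000, noise=0.1, factor=0.2, random_state=0)

noise_counts = {}
for eps in (0.05, 0.1, 0.2):  # illustrative values, not from the exercise
    labels = DBSCAN(eps=eps, min_samples=5).fit(X).labels_
    noise_counts[eps] = int(np.sum(labels == -1))
    # The label -1 is noise, not a cluster, so exclude it from the count
    n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
    print(f"eps={eps}: {n_clusters} clusters, {noise_counts[eps]} noise points")
```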
