Challenge: Using DBSCAN Clustering to Detect Outliers
Task
Now you will apply the DBSCAN clustering algorithm to detect outliers in the Iris dataset.
You have to:
- Specify the parameters of the DBSCAN algorithm: set eps to 0.35 and min_samples to 6.
- Fit the algorithm and obtain the cluster labels.
- Get the indices of the outliers and the indices of the normal data. Note that DBSCAN labels the points it treats as noise (the outliers) with -1.
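To see how eps, min_samples and the -1 label interact, here is a minimal sketch on a small hand-made toy dataset (the points are illustrative and not part of the challenge). Five tightly packed points form one cluster, while one far-away point has no neighbors within eps and ends up labeled -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative toy data (not from the challenge): a tight group of five points plus one isolated point
X = np.array([
    [1.0, 1.0],
    [1.1, 1.0],
    [1.0, 1.1],
    [1.1, 1.1],
    [1.05, 1.05],
    [5.0, 5.0],  # isolated point, far from the group
])

# A point is a core point if at least min_samples points (itself included) lie within eps of it
dbscan = DBSCAN(eps=0.35, min_samples=3)
labels = dbscan.fit_predict(X)
print(labels)  # [ 0  0  0  0  0 -1]

# np.where with a single condition returns a tuple of index arrays, so [0] takes the 1-D array
outlier_idx = np.where(labels == -1)[0]
print(outlier_idx)  # [5]
```

Indexing with [0] keeps the result a plain 1-D array of positions, which works directly with .loc on a default RangeIndex and with .iloc.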
Solution
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn import datasets
import numpy as np
# Load the Iris dataset
iris = datasets.load_iris()
# Create a DataFrame directly from the data and target arrays
data = pd.DataFrame(data=np.c_[iris.data, iris.target], columns=iris.feature_names + ['target'])
# Keep only the first two features (sepal length and sepal width) so the results can be plotted in 2D
features = data.iloc[:, :2]
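# Apply DBSCAN clustering for outlier detection
dbscan = DBSCAN(eps=0.35, min_samples=6)
# fit_predict fits the model and returns one cluster label per sample
dbscan_labels = dbscan.fit_predict(features)
# DBSCAN labels noise points (outliers) as -1; np.where(...)[0] gives their positions as a 1-D array
outlier_indices = np.where(dbscan_labels == -1)[0]
normal_indices = np.where(dbscan_labels != -1)[0]
# Visualize the results (2D scatter plot)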
plt.scatter(features.loc[normal_indices, 'sepal length (cm)'], features.loc[normal_indices, 'sepal width (cm)'],
            c='blue', label='Normal Data', alpha=0.6)
plt.scatter(features.loc[outlier_indices, 'sepal length (cm)'], features.loc[outlier_indices, 'sepal width (cm)'],
            c='red', marker='x', label='Outliers', s=100)
plt.title('DBSCAN Outlier Detection on Iris Dataset')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.legend()
plt.show()
# Identify and print the number of outliers (anomalies) detected
outliers = data.iloc[outlier_indices]
print(f"Number of outliers detected: {len(outliers)}")
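The -1 label follows directly from DBSCAN's definition: a point is noise when no core point (a point with at least min_samples points, itself included, within eps) lies within eps of it. The minimal NumPy sketch below, built around a hypothetical helper named dbscan_noise_mask, implements only that rule and compares it with scikit-learn on the same two Iris features. It is an illustration under these assumptions, not scikit-learn's implementation: it reproduces which points are noise but not the cluster numbering, and it computes all pairwise distances, so it only suits small datasets.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN


def dbscan_noise_mask(X, eps, min_samples):
    """Hypothetical helper: mark the points DBSCAN would label as noise (-1)."""
    # All pairwise Euclidean distances (O(n^2) memory, fine for the 150 Iris samples)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # eps-neighborhoods; the diagonal is True, so each point counts itself,
    # matching scikit-learn's min_samples convention
    neighbors = dists <= eps
    is_core = neighbors.sum(axis=1) >= min_samples
    # Noise = no core point within eps (a core point is always within eps of itself)
    return ~neighbors[:, is_core].any(axis=1)


iris = datasets.load_iris()
X = iris.data[:, :2]  # sepal length and sepal width, as in the challenge

sk_labels = DBSCAN(eps=0.35, min_samples=6).fit_predict(X)
manual_noise = dbscan_noise_mask(X, eps=0.35, min_samples=6)

# The hand-written rule should flag exactly the points scikit-learn labels -1
print("Same outliers as scikit-learn:", np.array_equal(manual_noise, sk_labels == -1))
print("Number of outliers:", int(manual_noise.sum()))
```

Points that are not core points but lie within eps of one (border points) are absorbed into a cluster, which is why only isolated points in sparse regions of the sepal plot are flagged as outliers.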
Starter code (fill in the ___ blanks):
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn import datasets
import numpy as np
# Load the Iris dataset
iris = datasets.load_iris()
# Create a DataFrame directly from the data and target arrays
data = pd.DataFrame(data=np.c_[iris.data, iris.target], columns=iris.feature_names + ['target'])
# Keep only the first two features (sepal length and sepal width) so the results can be plotted in 2D
features = data.iloc[:, :2]
# Apply DBSCAN clustering for outlier detection
dbscan = DBSCAN(eps=___, min_samples=___)
dbscan_labels = dbscan.___(features)
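# DBSCAN labels noise points (outliers) as -1; [0] turns the result into a 1-D array of positions
outlier_indices = np.___(dbscan_labels ___ -1)[0]
normal_indices = np.___(dbscan_labels ___ -1)[0]
# Visualize the results (2D scatter plot)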
plt.scatter(features.loc[normal_indices, 'sepal length (cm)'], features.loc[normal_indices, 'sepal width (cm)'],
            c='blue', label='Normal Data', alpha=0.6)
plt.scatter(features.loc[outlier_indices, 'sepal length (cm)'], features.loc[outlier_indices, 'sepal width (cm)'],
            c='red', marker='x', label='Outliers', s=100)
plt.title('DBSCAN Outlier Detection on Iris Dataset')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.legend()
plt.show()
# Identify and print the number of outliers (anomalies) detected
outliers = data.iloc[outlier_indices]
print(f"Number of outliers detected: {len(outliers)}")