Learn Challenge: Using DBSCAN Clustering to Detect Outliers | Machine Learning Techniques
Data Anomaly Detection

Challenge: Using DBSCAN Clustering to Detect Outliers

Task


In this challenge, you will apply the DBSCAN clustering algorithm to detect outliers in the Iris dataset.
You have to:

  1. Specify the parameters of the DBSCAN algorithm: set eps to 0.35 and min_samples to 6.
  2. Fit the algorithm and obtain the cluster labels.
  3. Get the indices of the outliers and the indices of the normal data. Note that the algorithm assigns the label -1 to outliers.
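
The -1 label convention from step 3 can be seen on a tiny toy dataset. The sketch below is only an illustration: the points, the eps value, and the min_samples value are made up for this example and are not the ones used in the challenge.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups of four points plus one isolated point
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [10.0, 0.0],
])

# Illustrative parameters (not the challenge values)
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(points)

# Points that belong to no cluster receive the label -1
outlier_idx = np.where(labels == -1)[0]
normal_idx = np.where(labels != -1)[0]

print("labels:", labels)
print("outlier indices:", outlier_idx)
```

Only the isolated point has fewer than min_samples neighbours within eps, so it is the only one labelled as noise.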

Solution

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn import datasets
import numpy as np

# Load the Iris dataset
iris = datasets.load_iris()

# Create a DataFrame directly from the data and target arrays
data = pd.DataFrame(data=np.c_[iris.data, iris.target], columns=iris.feature_names + ['target'])

# Extract the first two features (sepal length and sepal width) so the result can be plotted in 2D
features = data.iloc[:, :2]

# Apply DBSCAN clustering for outlier detection
dbscan = DBSCAN(eps=0.35, min_samples=6)
dbscan_labels = dbscan.fit_predict(features)

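# Get the indices of outliers (label -1) and normal points;
# np.where returns a tuple, so take its first element to get a plain index array
outlier_indices = np.where(dbscan_labels == -1)[0]
normal_indices = np.where(dbscan_labels != -1)[0]

# Visualize the results (2D scatter plot)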

plt.scatter(features.loc[normal_indices, 'sepal length (cm)'], features.loc[normal_indices, 'sepal width (cm)'],
c='blue', label='Normal Data', alpha=0.6)
plt.scatter(features.loc[outlier_indices, 'sepal length (cm)'], features.loc[outlier_indices, 'sepal width (cm)'],
c='red', marker='x', label='Outliers', s=100)
plt.title('DBSCAN Outlier Detection on Iris Dataset')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.legend()
plt.show()

# Identify and print the number of outliers (anomalies) detected
outliers = data.iloc[outlier_indices]
print(f"Number of outliers detected: {len(outliers)}")
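
Beyond the outlier count, it can be useful to see how many points fall into each cluster. The following is a minimal sketch: `summarize_labels` is a hypothetical helper that is not part of the course code, and `example_labels` is made-up data standing in for `dbscan_labels`.

```python
import numpy as np

def summarize_labels(labels):
    """Return a {label: count} dict; the label -1 denotes noise (outliers)."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}

# Made-up labels for illustration; in the solution you would pass dbscan_labels
example_labels = np.array([0, 0, 1, -1, 1, 0, -1])

summary = summarize_labels(example_labels)
n_outliers = summary.get(-1, 0)
n_clusters = len([label for label in summary if label != -1])

print(summary)
print(f"clusters: {n_clusters}, outliers: {n_outliers}")
```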


Section 3. Chapter 2
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn import datasets
import numpy as np

# Load the Iris dataset
iris = datasets.load_iris()

# Create a DataFrame directly from the data and target arrays
data = pd.DataFrame(data=np.c_[iris.data, iris.target], columns=iris.feature_names + ['target'])

# Extract the first two features (sepal length and sepal width) so the result can be plotted in 2D
features = data.iloc[:, :2]

# Apply DBSCAN clustering for outlier detection
dbscan = DBSCAN(eps=___, min_samples=___)
dbscan_labels = dbscan.___(features)

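# Get the indices of outliers (label -1) and normal points;
# the function to fill in returns a tuple, so take its first element to get a plain index array
outlier_indices = np.___(dbscan_labels ___ -1)[0]
normal_indices = np.___(dbscan_labels ___ -1)[0]

# Visualize the results (2D scatter plot)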

plt.scatter(features.loc[normal_indices, 'sepal length (cm)'], features.loc[normal_indices, 'sepal width (cm)'],
c='blue', label='Normal Data', alpha=0.6)
plt.scatter(features.loc[outlier_indices, 'sepal length (cm)'], features.loc[outlier_indices, 'sepal width (cm)'],
c='red', marker='x', label='Outliers', s=100)
plt.title('DBSCAN Outlier Detection on Iris Dataset')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.legend()
plt.show()

# Identify and print the number of outliers (anomalies) detected
outliers = data.iloc[outlier_indices]
print(f"Number of outliers detected: {len(outliers)}")
