Learn Clustering for Anomaly Detection | Engineering Data Science Applications
Python for Engineers

Clustering for Anomaly Detection

Clustering groups similar data points together, which lets you uncover patterns in complex datasets even when you do not know the underlying structure in advance. In engineering applications, it is especially useful for anomaly detection, such as identifying unusual sensor readings that could indicate faulty equipment, abnormal operating conditions, or the need for maintenance. Points that do not fit well into any group stand out as outliers, and these are often the very anomalies engineers need to find.

```python
import numpy as np
from sklearn.cluster import KMeans

# Simulated vibration sensor data from a machine (in mm/s)
# Most readings are normal, but a few are unusually high or low
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],  # Possible anomaly (very high)
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],  # Possible anomaly (very low)
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

# Cluster into 2 groups (normal and abnormal)
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Print cluster centers and labels
print("Cluster centers:", kmeans.cluster_centers_.flatten())
print("Labels:", labels)
```

After clustering the vibration sensor data, interpret the results by examining the cluster centers and the label assigned to each data point. Each cluster center represents the typical vibration level of its group. In this example, you should see one center near the normal operating vibration (roughly 2.1 to 2.2 mm/s) and another far away that captures the very high reading of 8.0 mm/s. Readings assigned to the cluster whose center lies far from the norm are suspicious. Note that with only two clusters, the low reading of 0.5 mm/s will most likely be grouped with the normal readings, because it lies much closer to them than to 8.0, so cluster labels alone may not expose every anomaly. Checking how far each reading lies from a cluster center gives a finer-grained view of which readings are typical and which may indicate a problem with the machine.
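
One way to review which points belong to which cluster is to summarize each group by its center, size, and value range. The sketch below is illustrative rather than part of the original lesson: it assumes the most populated cluster represents normal operation, and it sets `n_init=10` explicitly so the result does not depend on a single random initialization.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simulated vibration readings (mm/s) used in this chapter
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Count how many readings fall into each cluster
cluster_sizes = np.bincount(labels, minlength=2)

# Assumption: the most populated cluster represents normal operation
normal_label = int(cluster_sizes.argmax())

# Summarize each cluster: center, size, and range of member readings
for label, center in enumerate(kmeans.cluster_centers_.flatten()):
    members = vibration_data[labels == label].flatten()
    role = "normal" if label == normal_label else "suspicious"
    print(f"Cluster {label} ({role}): center={center:.2f} mm/s, "
          f"size={len(members)}, "
          f"range={members.min():.1f} to {members.max():.1f} mm/s")
```

A very small cluster sitting far from the main group is a hint that its members deserve a closer look. A wide value range inside the "normal" cluster suggests that an outlier may be hiding there.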

To rank individual readings, measure how far each one lies from the center of the normal cluster and pick the farthest points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Same vibration data as before (mm/s)
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

# Fit KMeans
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Treat the largest cluster as "normal". Measuring the distance to each
# point's own assigned center would give a lone outlier that forms its own
# cluster a distance of 0, so it would never be flagged.
normal_label = np.bincount(labels).argmax()
normal_center = kmeans.cluster_centers_[normal_label, 0]

# Compute distances to the normal cluster center
distances = np.abs(vibration_data.flatten() - normal_center)

# Find the indices of the farthest points (potential anomalies)
anomaly_indices = distances.argsort()[-2:]  # Top 2 farthest points

print("Potential anomalies at indices:", anomaly_indices)
print("Anomalous vibration values:", vibration_data[anomaly_indices].flatten())
```
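
Picking a fixed number of farthest points only works when you already know how many anomalies to expect. As a hedged alternative, the sketch below flags every reading whose distance from the normal cluster center exceeds a threshold. The threshold rule (5 times the median distance) and the assumption that the largest cluster is the normal one are illustrative choices, not something prescribed by the lesson or by scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simulated vibration readings (mm/s) used in this chapter
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Assumption: the largest cluster represents normal operation
normal_label = np.bincount(labels).argmax()
normal_center = kmeans.cluster_centers_[normal_label, 0]

# Distance of every reading from the normal cluster center
readings = vibration_data.flatten()
distances = np.abs(readings - normal_center)

# Hypothetical rule of thumb: flag readings that lie more than 5 times
# the median distance away from the normal center
threshold = 5 * np.median(distances)
anomaly_mask = distances > threshold

print("Threshold (mm/s):", round(float(threshold), 2))
print("Anomalies at indices:", np.flatnonzero(anomaly_mask))
print("Anomalous vibration values:", readings[anomaly_mask])
```

The multiplier controls sensitivity: a smaller value flags more readings, a larger one flags fewer. In practice it would be tuned against readings that are known to be normal or faulty.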

1. What is the purpose of clustering in engineering data analysis?

2. Which scikit-learn class is used for KMeans clustering?

3. How can clustering help identify faulty equipment?


Section 3. Chapter 2
