Clustering for Anomaly Detection
Clustering groups similar data points together without requiring labels, which lets you uncover structure in a dataset even when you do not know it in advance. In engineering applications, this makes clustering useful for anomaly detection: unusual sensor readings can indicate faulty equipment, abnormal operating conditions, or the need for maintenance. Once similar points are grouped, outliers (points that do not fit well into any group) stand out, and these are often the anomalies engineers need to find.
```python
import numpy as np
from sklearn.cluster import KMeans

# Simulated vibration sensor data from a machine (in mm/s)
# Most readings are normal, but a few are unusually high or low
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],  # Possible anomaly (very high)
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],  # Possible anomaly (very low)
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

# Cluster into 2 groups (normal and abnormal)
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Print cluster centers and labels
print("Cluster centers:", kmeans.cluster_centers_.flatten())
print("Labels:", labels)
```
After clustering the vibration sensor data, you can interpret the results by examining the cluster centers and the labels assigned to each data point. The cluster centers represent the typical vibration level of each group. With two clusters on this data, you should typically see one center near the normal operating vibration (roughly 2.1 to 2.2 mm/s) and a second center much further away, at the high reading of 8.0 mm/s. Points assigned to the cluster whose center is far from the norm may be considered suspicious. Note that the low reading of 0.5 mm/s is usually absorbed into the normal cluster, because it lies much closer to 2.2 than to 8.0, so cluster membership alone does not flag it. Reviewing which points belong to which cluster, and how far each point sits from the typical level, gives you insight into which readings are normal and which may indicate a problem with the machine.
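To make that review concrete, the following is a minimal sketch that summarizes each cluster by its center, size, and value range. The helper name `summarize_clusters` is hypothetical, `n_init=10` is set explicitly as an assumption so the result does not depend on the default of the installed scikit-learn version, and treating the least populated cluster as suspicious is a heuristic rather than a rule.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simulated vibration readings (mm/s) as in the example above
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

# n_init=10 is an assumption of this sketch (several restarts, best one kept)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)


def summarize_clusters(data, labels, centers):
    """Hypothetical helper: one summary dict per cluster."""
    values = data.ravel()
    summaries = []
    for cluster_id, center in enumerate(centers.ravel()):
        members = values[labels == cluster_id]
        summaries.append({
            "cluster": cluster_id,
            "center": float(center),
            "size": int(members.size),
            "min": float(members.min()),
            "max": float(members.max()),
        })
    return summaries


summaries = summarize_clusters(vibration_data, labels, kmeans.cluster_centers_)
for summary in summaries:
    print(summary)

# Heuristic: treat the least populated cluster as the suspicious one
suspicious = min(summaries, key=lambda s: s["size"])
print("Suspicious cluster:", suspicious["cluster"])
```

The value range of the larger cluster is worth a look: if its minimum is far below its center, a low outlier has been grouped with the normal readings.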
```python
import numpy as np
from sklearn.cluster import KMeans

# Same vibration data as before (mm/s)
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

# Fit KMeans
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Treat the most populated cluster as normal operation.
# Measuring distance to each point's own center would miss 8.0:
# it typically ends up alone in its cluster, so that distance is zero.
normal_cluster = np.bincount(labels).argmax()
normal_center = kmeans.cluster_centers_[normal_cluster, 0]

# Compute distances to the normal cluster center
distances = np.abs(vibration_data.flatten() - normal_center)

# Find the indices of the farthest points (potential anomalies)
anomaly_indices = distances.argsort()[-2:]  # Top 2 farthest points

print("Potential anomalies at indices:", anomaly_indices)
print("Anomalous vibration values:", vibration_data[anomaly_indices].flatten())
```
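Picking a fixed number of farthest points assumes you know in advance how many anomalies to expect. As a hedged alternative, the sketch below flags every reading whose distance from the normal cluster center is unusually large compared with the other distances. It makes several assumptions that are not part of the lesson: the most populated cluster is taken to represent normal operation, `n_init=10` is set explicitly, the helper `flag_anomalies` is hypothetical, and the cutoff of 3.5 on a modified z-score (based on the median absolute deviation, with the conventional 0.6745 scaling) is a common rule of thumb rather than a tuned value.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simulated vibration readings (mm/s) as in the examples above
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Assumption: the most populated cluster represents normal operation
normal_cluster = np.bincount(labels).argmax()
normal_center = kmeans.cluster_centers_[normal_cluster, 0]
distances = np.abs(vibration_data.ravel() - normal_center)


def flag_anomalies(distances, cutoff=3.5):
    """Hypothetical helper: flag distances with a large modified z-score."""
    median = np.median(distances)
    mad = np.median(np.abs(distances - median))
    if mad == 0:
        # All distances are (nearly) identical, so nothing stands out
        return np.zeros(distances.shape, dtype=bool)
    robust_z = 0.6745 * (distances - median) / mad
    return robust_z > cutoff


anomaly_mask = flag_anomalies(distances)
anomaly_indices = np.flatnonzero(anomaly_mask)

print("Flagged indices:", anomaly_indices)
print("Flagged values:", vibration_data[anomaly_indices].ravel())
```

Because the threshold adapts to the spread of the distances, the number of flagged readings can be zero, two, or more, depending on the data. The cutoff should be validated against readings that are known to be normal for the machine in question.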
1. What is the purpose of clustering in engineering data analysis?
2. Which scikit-learn class is used for KMeans clustering?
3. How can clustering help identify faulty equipment?