Learn Clustering for Anomaly Detection | Engineering Data Science Applications

Python for Engineers

Clustering lets you uncover patterns and groupings in complex datasets without knowing the underlying structure in advance. In engineering applications it is especially useful for anomaly detection, such as identifying unusual sensor readings that could indicate faulty equipment, abnormal operating conditions, or the need for maintenance. Grouping similar data points together makes outliers stand out: points that do not fit well into any group, or that form a very small group of their own, are often the anomalies engineers need to find.


import numpy as np
from sklearn.cluster import KMeans

# Simulated vibration sensor data from a machine (in mm/s)
# Most readings are normal, but a few are unusually high or low
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],  # Possible anomaly (very high)
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],  # Possible anomaly (very low)
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

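# Cluster into 2 groups. KMeans only groups readings by similarity, so which
# group is "abnormal" has to be read from the centers and sizes afterwards.
# n_init=10 runs several initializations and keeps the best result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)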
labels = kmeans.fit_predict(vibration_data)

# Print cluster centers and labels
print("Cluster centers:", kmeans.cluster_centers_.flatten())
print("Labels:", labels)

After clustering the vibration sensor data, you can interpret the results by examining the cluster centers and the labels assigned to each data point. The cluster centers represent the typical vibration level of each group. In this example, you should see one cluster center near the normal operating vibration (about 2.1 mm/s) and another at 8.0 mm/s, which captures the very high reading. A cluster with a center far from the norm, or with only a handful of members, may be considered suspicious. Note that the very low reading (0.5 mm/s) is closer to the normal center than to 8.0 mm/s, so it is assigned to the normal cluster. Cluster membership alone therefore does not reveal every anomaly, and it also helps to check how far each reading lies from the center of the normal cluster.
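
One way to review which readings belong to which cluster is to group the values by label and compare cluster sizes. The following is a minimal sketch that reuses the vibration data from the example above; the helper name `summarize_clusters` is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simulated vibration readings (mm/s) as in the example above
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])


def summarize_clusters(data, labels, centers):
    """Hypothetical helper: map cluster id -> (center, member values) for 1-D data."""
    summary = {}
    for cluster_id, center in enumerate(centers.flatten()):
        members = data[labels == cluster_id].flatten()
        summary[cluster_id] = (float(center), members)
    return summary


kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)
summary = summarize_clusters(vibration_data, labels, kmeans.cluster_centers_)

# A very small cluster, or one whose center is far from the others, deserves a closer look
for cluster_id, (center, members) in summary.items():
    print(f"Cluster {cluster_id}: center={center:.2f} mm/s, size={len(members)}, "
          f"min={members.min()}, max={members.max()}")
```

Comparing sizes alongside centers makes it easier to tell a genuine second operating mode from a cluster that exists only because of one or two extreme readings.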


import numpy as np
from sklearn.cluster import KMeans

# Same vibration data as before
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

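# Fit KMeans (n_init=10 keeps the best of several initializations)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Treat the largest cluster as normal operation. Measuring each point against
# its own center would miss an outlier that forms a cluster by itself, because
# its distance to its own center is 0.
normal_cluster = np.bincount(labels).argmax()
normal_center = kmeans.cluster_centers_[normal_cluster, 0]

# Distance of every reading from the normal cluster center
distances = np.abs(vibration_data.flatten() - normal_center)

# Find the indices of the farthest points (potential anomalies)
anomaly_indices = distances.argsort()[-2:]  # Top 2 farthest points

print("Potential anomalies at indices:", anomaly_indices)
print("Anomalous vibration values:", vibration_data[anomaly_indices].flatten())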
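
Picking a fixed number of farthest points assumes you already know how many anomalies exist. A possible alternative, which is not part of the original example, is to flag every reading whose distance from the normal cluster center exceeds a threshold. The sketch below assumes that the largest cluster represents normal operation and uses a hypothetical threshold of 1.0 mm/s chosen purely for illustration; in practice the threshold would come from equipment specifications or historical data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simulated vibration readings (mm/s) as in the examples above
vibration_data = np.array([
    [2.1], [2.3], [2.2], [2.0], [2.4], [2.3], [2.2], [2.5], [2.1], [2.2],
    [8.0],
    [1.9], [2.0], [2.1], [2.2], [2.3], [2.2], [2.1], [2.4], [2.3],
    [0.5],
    [2.2], [2.3], [2.1], [2.4], [2.2]
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(vibration_data)

# Assumption: the largest cluster represents normal operation
normal_cluster = np.bincount(labels).argmax()
normal_center = kmeans.cluster_centers_[normal_cluster, 0]

# Hypothetical threshold in mm/s, for illustration only
THRESHOLD_MM_S = 1.0

# Flag every reading that lies farther from the normal center than the threshold
distances = np.abs(vibration_data[:, 0] - normal_center)
flagged_indices = np.flatnonzero(distances > THRESHOLD_MM_S)

print("Normal cluster center:", round(float(normal_center), 2))
print("Flagged indices:", flagged_indices)
print("Flagged values:", vibration_data[flagged_indices, 0])
```

With a threshold, the number of flagged readings follows from the data rather than being fixed in advance, so a batch with no unusual readings produces no flags.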

Section 3. Chapter 2

Clustering for Anomaly Detection

1. What is the purpose of clustering in engineering data analysis?

2. Which scikit-learn class is used for KMeans clustering?

3. How can clustering help identify faulty equipment?