Challenge: Rule-based Approach | Statistical Methods in Anomaly Detection
Data Anomaly Detection

Challenge: Rule-based Approach

Task

Your task is to create a function that identifies outliers based on the Euclidean distance between each data point and the mean value of the dataset:

  1. Calculate the Euclidean distance between each data point and the mean of the dataset.
  2. If a data point's distance exceeds a predefined threshold, classify it as an outlier.
  3. Store the identified outliers in a list and print the list.
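
Because the data in this challenge is one-dimensional, the Euclidean distance from a point to the mean reduces to an absolute deviation. A short worked form of the rule above, where the symbols $x_i$ (a data point), $\bar{x}$ (the dataset mean), $n$ (the number of points), and $\tau$ (the threshold) are notation introduced here for illustration:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
d_i = \sqrt{(x_i - \bar{x})^2} = \lvert x_i - \bar{x} \rvert,
\qquad
x_i \text{ is flagged as an outlier if } d_i > \tau .
```

For example, with $\bar{x} = 0.5$ and $\tau = 3$, the point $x_i = 4.2$ has $d_i = 3.7 > 3$ and would be flagged, while $x_i = -1.0$ has $d_i = 1.5$ and would not.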


Solution

import numpy as np

# Generate synthetic data with some outliers
np.random.seed(42)
data = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(5, 1, 10)])

# Define a function to detect outliers based on Euclidean distance
def euclidean_distance_anomaly_detection(data, threshold):
    mean = np.mean(data)
    anomalies = []
    for i, value in enumerate(data):
        euclidean_dist = np.sqrt((value - mean) ** 2)
        if euclidean_dist > threshold:
            anomalies.append((i, value, euclidean_dist))
    return anomalies

# Set the anomaly detection threshold
threshold = 3

# Detect outliers in the dataset based on Euclidean distance
anomalies = euclidean_distance_anomaly_detection(data, threshold)

# Print the detected outliers and their Euclidean distances
print("Detected outliers:")
for index, value, euclidean_dist in anomalies:
    print(f"Index {index}: Value {value}, Euclidean Distance {euclidean_dist}")
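
The same rule can also be written without an explicit loop. The following is a minimal sketch, not part of the original solution: the function name `find_outliers_vectorized` is hypothetical, and it assumes one-dimensional data, where the Euclidean distance to the mean equals the absolute deviation.

```python
import numpy as np

def find_outliers_vectorized(data, threshold):
    """Return (indices, values, distances) for points farther than
    `threshold` from the mean. Hypothetical helper, assumes 1-D data."""
    data = np.asarray(data, dtype=float)
    # In one dimension, sqrt((x - mean) ** 2) is the absolute deviation
    distances = np.abs(data - data.mean())
    # Boolean mask marking the points beyond the threshold
    mask = distances > threshold
    indices = np.flatnonzero(mask)
    return indices, data[mask], distances[mask]

# Same synthetic data and threshold as in the solution
np.random.seed(42)
data = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(5, 1, 10)])
indices, values, distances = find_outliers_vectorized(data, threshold=3)

print("Detected outliers:")
for i, v, d in zip(indices, values, distances):
    print(f"Index {i}: Value {v:.3f}, Distance {d:.3f}")
```

The boolean mask applies the threshold to the whole array at once, so it should flag the same points as the loop-based version while avoiding per-element Python overhead. Note that the mean is computed over all points, including the injected outliers, so it is pulled slightly toward them.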


Section 2. Chapter 2
import numpy as np

# Generate synthetic data with some outliers
np.random.seed(42)
data = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(5, 1, 10)])

# Define a function to detect outliers based on Euclidean distance
def euclidean_distance_anomaly_detection(data, threshold):
    mean = np.mean(data)
    anomalies = []
    for i, value in enumerate(data):
        euclidean_dist = np.___((value - mean) ___ 2)
        if euclidean_dist ___ threshold:
            anomalies.___((i, value, euclidean_dist))
    return anomalies

# Set the anomaly detection threshold
threshold = 3

# Detect outliers in the dataset based on Euclidean distance
anomalies = euclidean_distance_anomaly_detection(data, threshold)

# Print the detected outliers and their Euclidean distances
print("Detected outliers:")
for index, value, euclidean_dist in anomalies:
    print(f"Index {index}: Value {value}, Euclidean Distance {euclidean_dist}")