Data Anomaly Detection
2. Statistical Methods in Anomaly Detection
Median Absolute Deviation
The MAD (Median Absolute Deviation) rule is a statistical outlier detection method that uses the median as a robust estimate of the center of a dataset and the median absolute deviation as a robust estimate of its spread.
It is particularly useful when the data may not follow a normal distribution, or when extreme values would distort the mean and standard deviation that other rules rely on.
How to use the MAD rule
- Calculate the Median: Compute the median of the dataset, which is the middle value when the data is sorted;
- Calculate the Median Absolute Deviation (MAD): For each data point, find the absolute difference between the data point and the median. The MAD is the median of these absolute differences;
- Define a Threshold: Choose a constant multiplier (commonly 2 or 3) and multiply it by the MAD. The result is the largest deviation from the median a data point can have before being considered an outlier;
- Identify Outliers: Any data point that has an absolute difference from the median greater than the threshold is considered an outlier.
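As a rough illustration of the four steps above, here is a minimal sketch using NumPy on a small made-up sample. The values in `data` and the multiplier of 3 are assumptions chosen for illustration, not taken from a real dataset.

```python
import numpy as np

# Hypothetical sample: six similar values and one obvious extreme
data = np.array([10, 12, 11, 13, 12, 11, 50])

# Step 1: median of the data (sorted: 10, 11, 11, 12, 12, 13, 50)
median = np.median(data)                 # 12.0

# Step 2: absolute differences from the median, then their median
abs_diff = np.abs(data - median)         # [2, 0, 1, 1, 0, 1, 38]
mad = np.median(abs_diff)                # 1.0

# Step 3: threshold as a multiple of the MAD (multiplier 3 assumed here)
threshold = 3 * mad                      # 3.0

# Step 4: points whose deviation exceeds the threshold
outliers = data[abs_diff > threshold]    # array([50])
print(median, mad, outliers)
```

Only the value 50 deviates from the median by more than 3 times the MAD, so it is the only point flagged.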
Note
Mathematically, the absolute difference between two values, A and B, is denoted |A - B|, where the vertical bars "|" denote the absolute value function. It returns the magnitude of the difference between A and B, which is never negative.
MAD rule implementation
import numpy as np

def mad_rule_outlier_detection(data, threshold=3.0):
    # Convert the input to an array so that plain Python lists work too
    data = np.asarray(data)
    # Calculate the median
    median = np.median(data)
    # Calculate the absolute differences from the median
    abs_diff = np.abs(data - median)
    # Calculate the MAD (Median Absolute Deviation)
    mad = np.median(abs_diff)
    # Define the threshold for outliers
    outlier_threshold = threshold * mad
    # Identify outliers based on the threshold
    outliers = [x for x in data if np.abs(x - median) > outlier_threshold]
    return outliers
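To see how the `threshold` argument changes the result, the sketch below restates the function in a compact, vectorized form (so that the block runs on its own) and calls it on a made-up list of readings. The `readings` values are hypothetical, and the boolean-mask style is one possible way to write the same rule.

```python
import numpy as np

def mad_rule_outlier_detection(data, threshold=3.0):
    """Return the values lying more than threshold * MAD from the median."""
    data = np.asarray(data)
    median = np.median(data)
    abs_diff = np.abs(data - median)
    mad = np.median(abs_diff)
    # Boolean mask instead of a list comprehension; same rule as above
    return data[abs_diff > threshold * mad].tolist()

# Hypothetical readings: median is 21.5 and MAD is 1.0
readings = [20, 22, 21, 23, 22, 98, 21, -40]

default_outliers = mad_rule_outlier_detection(readings)               # [98, -40]
strict_outliers = mad_rule_outlier_detection(readings, threshold=1.0) # [20, 23, 98, -40]
print(default_outliers, strict_outliers)
```

With the default multiplier of 3, only the two extreme readings are flagged. Lowering the multiplier to 1 also flags 20 and 23, which sit just 1.5 away from the median, so the multiplier should be chosen with the acceptable false-alarm rate in mind.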
MAD vs 1.5 IQR rule
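As a hedged sketch of how the two rules can be compared, the example below applies both to the same made-up sample. It assumes the usual form of the 1.5 IQR rule (flag points below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR) and a MAD multiplier of 3; the sample values and both helper function names are hypothetical.

```python
import numpy as np

def mad_outliers(data, multiplier=3.0):
    """Flag points farther than multiplier * MAD from the median."""
    data = np.asarray(data)
    median = np.median(data)
    abs_diff = np.abs(data - median)
    mad = np.median(abs_diff)
    return data[abs_diff > multiplier * mad].tolist()

def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k * IQR, Q3 + k * IQR]."""
    data = np.asarray(data)
    # np.percentile uses linear interpolation by default
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    mask = (data < q1 - k * iqr) | (data > q3 + k * iqr)
    return data[mask].tolist()

# Hypothetical sample with one borderline value (16) and one extreme (50)
sample = [10, 11, 11, 12, 12, 13, 16, 50]

by_mad = mad_outliers(sample)   # median 12, MAD 1, cutoff 3 -> [16, 50]
by_iqr = iqr_outliers(sample)   # Q1 11, Q3 13.75, upper fence 17.875 -> [50]
print(by_mad, by_iqr)
```

On this particular sample the MAD rule is the stricter of the two: it flags the borderline value 16, while the 1.5 IQR rule flags only 50. That outcome depends on the data and on the chosen multipliers, so it should not be read as a general ranking of the two rules.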