Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Вивчайте Median Absolute Deviation | Statistical Methods in Anomaly Detection
Data Anomaly Detection

book
Median Absolute Deviation

The MAD (Median Absolute Deviation) rule is a statistical outlier detection method that uses the median and the median absolute deviation as robust estimators to identify outliers in a dataset.

It is particularly useful when dealing with data that may not follow a normal distribution or when there are potential outliers that can significantly impact the mean and standard deviation.

How to use MAD rule

  1. Calculate the Median: Compute the median of the dataset, which is the middle value when the data is sorted;
  2. Calculate the Median Absolute Deviation (MAD): For each data point, find the absolute difference between the data point and the median. The MAD is the median of these absolute differences;
  3. Define a Threshold: Choose a threshold value (usually a constant, e.g., 2 or 3 times the MAD) to determine how far a data point can deviate from the median before being considered an outlier;
  4. Identify Outliers: Any data point that has an absolute difference from the median greater than the threshold is considered an outlier.

Note

Mathematically, the absolute difference between two values, A and B, is denoted as |A - B|, where "|" represents the absolute value function. This function returns the positive value of the difference between A and B.

MAD rule implementation

def mad_rule_outlier_detection(data, threshold=3.0):
# Calculate the median
median = np.median(data)
# Calculate the absolute differences from the median
abs_diff = np.abs(data - median)
# Calculate the MAD (Median Absolute Deviation)
mad = np.median(abs_diff)
# Define the threshold for outliers
outlier_threshold = threshold * mad
# Identify outliers based on the threshold
outliers = [x for x in data if np.abs(x - median) > outlier_threshold]
return outliers

MAD vs 1.5 IQR rule

What is the main advantage of using MAD for outlier detection?

What is the main advantage of using MAD for outlier detection?

Виберіть правильну відповідь

Все було зрозуміло?

Як ми можемо покращити це?

Дякуємо за ваш відгук!

Секція 2. Розділ 5
some-alt