Median Absolute Deviation
The MAD (Median Absolute Deviation) rule is a statistical outlier detection method that uses the median and the MAD as robust estimators of a dataset's center and spread, and flags the points that lie too far from the center.
It is particularly useful when the data may not follow a normal distribution, or when outliers could distort the mean and standard deviation, which are themselves sensitive to extreme values.
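To see why the median and MAD are called robust, the sketch below compares them with the mean and standard deviation on a small hypothetical sample before and after a single extreme value is added. The data and the helper name `summarize` are illustrative only.

```python
import numpy as np

# Hypothetical sample: nine typical values, then the same values plus one extreme point
clean = np.array([10, 11, 12, 12, 13, 13, 14, 14, 15])
with_outlier = np.append(clean, 100)

def summarize(x):
    """Return non-robust (mean, std) and robust (median, MAD) summaries of x."""
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # median of absolute deviations from the median
    return {"mean": np.mean(x), "std": np.std(x), "median": median, "mad": mad}

print(summarize(clean))
print(summarize(with_outlier))
```

On this sample the median stays at 13 and the MAD stays at 1 after the extreme value is added, while the mean rises from about 12.7 to 21.4 and the standard deviation grows from about 1.5 to about 26.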
How to use the MAD rule
- Calculate the Median: Compute the median of the dataset, which is the middle value when the data is sorted (or the average of the two middle values when the number of points is even);
- Calculate the Median Absolute Deviation (MAD): For each data point, find the absolute difference between the data point and the median. The MAD is the median of these absolute differences;
- Define a Threshold: Choose a constant multiplier (commonly 2 or 3) and set the threshold to that multiplier times the MAD. The threshold determines how far a data point can deviate from the median before it is considered an outlier;
- Identify Outliers: Any data point that has an absolute difference from the median greater than the threshold is considered an outlier.
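The four steps above can be traced by hand on a small hypothetical dataset; the values and the multiplier of 3 are chosen only for illustration.

```python
import numpy as np

# Hypothetical dataset used only for illustration
data = np.array([2, 3, 3, 4, 4, 5, 5, 6, 30], dtype=float)
k = 3.0  # multiplier applied to the MAD (step 3)

# Step 1: median of the data (middle value of the 9 sorted points)
median = np.median(data)

# Step 2: absolute deviations from the median, then their median (the MAD)
abs_dev = np.abs(data - median)
mad = np.median(abs_dev)

# Step 3: threshold distance from the median
cutoff = k * mad

# Step 4: points farther than the threshold from the median are outliers
is_outlier = abs_dev > cutoff

print("median:", median)              # median: 4.0
print("MAD:", mad)                    # MAD: 1.0
print("cutoff:", cutoff)              # cutoff: 3.0
print("outliers:", data[is_outlier])  # outliers: [30.]
```

Here every regular point lies at most 2 units from the median, so only 30 (26 units away) exceeds the threshold of 3.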
Note
Mathematically, the absolute difference between two values, $A$ and $B$, is denoted as $|A - B|$, where the vertical bars $|\cdot|$ represent the absolute value function. This function returns the non-negative magnitude of the difference between $A$ and $B$, so $|A - B| = |B - A|$.
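Using this notation, the steps of the MAD rule can be written compactly for a dataset $x_1, \dots, x_n$. The symbols $\tilde{x}$ (the median) and $k$ (the threshold multiplier) are introduced here for illustration:

$$
\tilde{x} = \operatorname{median}(x_1, \dots, x_n), \qquad
\mathrm{MAD} = \operatorname{median}\big(|x_1 - \tilde{x}|, \dots, |x_n - \tilde{x}|\big)
$$

$$
x_i \text{ is an outlier} \iff |x_i - \tilde{x}| > k \cdot \mathrm{MAD}
$$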
MAD rule implementation
```python
import numpy as np

def mad_rule_outlier_detection(data, threshold=3.0):
    # Convert the input to a NumPy array so that plain Python lists also work
    data = np.asarray(data, dtype=float)
    # Calculate the median
    median = np.median(data)
    # Calculate the absolute differences from the median
    abs_diff = np.abs(data - median)
    # Calculate the MAD (Median Absolute Deviation)
    mad = np.median(abs_diff)
    # Define the threshold for outliers
    outlier_threshold = threshold * mad
    # Identify outliers: points farther from the median than the threshold
    outliers = [x for x, d in zip(data, abs_diff) if d > outlier_threshold]
    return outliers
```
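When the position of each flagged point matters (for example, to drop rows from a table), a boolean mask can be more convenient than a list of values. The sketch below applies the same rule but returns a mask; the name `mad_outlier_mask` is hypothetical and not part of the original implementation.

```python
import numpy as np

def mad_outlier_mask(data, threshold=3.0):
    """Return a boolean mask that is True where the MAD rule flags an outlier.

    Hypothetical helper: same rule as mad_rule_outlier_detection, but the
    result keeps the position of every point.
    """
    values = np.asarray(data, dtype=float)
    median = np.median(values)
    abs_diff = np.abs(values - median)   # |x_i - median|
    mad = np.median(abs_diff)            # median of the absolute deviations
    return abs_diff > threshold * mad    # True for points beyond the threshold

# Usage on a small hypothetical sample
sample = [2, 3, 3, 4, 4, 5, 5, 6, 30]
mask = mad_outlier_mask(sample)
print(np.flatnonzero(mask))              # [8]
print(np.asarray(sample)[mask])          # [30]
```

One edge case applies to both versions: if more than half of the values are identical, the MAD is 0, so the threshold is 0 and every value that differs from the median is flagged. For instance, `mad_outlier_mask([5, 5, 5, 5, 6])` flags the 6.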
MAD vs 1.5 IQR rule
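The 1.5 IQR rule flags points below $Q_1 - 1.5 \cdot \mathrm{IQR}$ or above $Q_3 + 1.5 \cdot \mathrm{IQR}$, where $\mathrm{IQR} = Q_3 - Q_1$. A minimal sketch applying both rules to the same hypothetical data is shown below. The helper names `mad_outliers` and `iqr_outliers` are illustrative, and the quartiles come from the default linear interpolation of `np.percentile`; other quartile conventions can shift the fences slightly.

```python
import numpy as np

def mad_outliers(data, threshold=3.0):
    """Values farther than threshold * MAD from the median."""
    values = np.asarray(data, dtype=float)
    median = np.median(values)
    abs_diff = np.abs(values - median)
    mad = np.median(abs_diff)
    return values[abs_diff > threshold * mad]

def iqr_outliers(data, factor=1.5):
    """Values outside the fences [Q1 - factor*IQR, Q3 + factor*IQR]."""
    values = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical data with one moderately large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 14]
print("MAD rule:", mad_outliers(data))       # MAD rule: [14.]
print("1.5 IQR rule:", iqr_outliers(data))   # 1.5 IQR rule: []
```

On this sample the MAD rule (median 5.5, MAD 2.5, threshold 7.5) flags 14, while the IQR rule's upper fence is 14.5, so it flags nothing. Which rule flags more points depends on the chosen multipliers and on the shape of the data.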