Median Absolute Deviation
The MAD (Median Absolute Deviation) rule is a statistical outlier detection method that uses the median and the median absolute deviation as robust estimators to identify outliers in a dataset.
It is particularly useful when dealing with data that may not follow a normal distribution or when there are potential outliers that can significantly impact the mean and standard deviation.
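To make the robustness claim concrete, the following minimal sketch compares the mean and standard deviation with the median and MAD on a small sample, before and after one extreme value is appended. The sample values and the helper name `mad` are made up for illustration only.

```python
import numpy as np

def mad(values):
    """Median of the absolute deviations from the median (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    return np.median(np.abs(values - np.median(values)))

clean = np.array([10, 12, 11, 13, 12, 11], dtype=float)
with_outlier = np.append(clean, 500.0)  # one extreme value added

# The mean and standard deviation shift strongly once the extreme value is present
print("mean:  ", np.mean(clean), "->", np.mean(with_outlier))
print("std:   ", np.std(clean), "->", np.std(with_outlier))

# The median and MAD move only slightly
print("median:", np.median(clean), "->", np.median(with_outlier))
print("MAD:   ", mad(clean), "->", mad(with_outlier))
```

On this sample the mean jumps from 11.5 to above 80, while the median only moves from 11.5 to 12.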
How to use MAD rule
- Calculate the Median: Compute the median of the dataset, which is the middle value when the data is sorted;
- Calculate the Median Absolute Deviation (MAD): For each data point, find the absolute difference between the data point and the median. The MAD is the median of these absolute differences;
- Define a Threshold: Choose a multiplier (commonly 2 or 3) and set the threshold to that multiple of the MAD. The threshold determines how far a data point can deviate from the median before being considered an outlier;
- Identify Outliers: Any data point that has an absolute difference from the median greater than the threshold is considered an outlier.
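The four steps above can be traced on a tiny dataset. This is a sketch with made-up values and a multiplier of 3; the variable names are illustrative.

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 50], dtype=float)

# Step 1: median of the data (sorted: 10, 11, 11, 12, 12, 13, 50)
median = np.median(data)            # 12.0

# Step 2: absolute differences from the median, then their median
abs_diff = np.abs(data - median)    # [2, 0, 1, 1, 0, 1, 38]
mad = np.median(abs_diff)           # 1.0

# Step 3: threshold as a multiple of the MAD
cutoff = 3 * mad                    # 3.0

# Step 4: points whose absolute difference exceeds the threshold
outliers = data[abs_diff > cutoff]  # only 50 qualifies
print(outliers)
```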
Note
Mathematically, the absolute difference between two values, A and B, is denoted as |A - B|, where "|" represents the absolute value function. This function returns the non-negative magnitude of the difference between A and B.
MAD rule implementation
import numpy as np

def mad_rule_outlier_detection(data, threshold=3.0):
    # Convert to a float array so that plain lists also support element-wise arithmetic
    data = np.asarray(data, dtype=float)
    # Calculate the median
    median = np.median(data)
    # Calculate the absolute differences from the median
    abs_diff = np.abs(data - median)
    # Calculate the MAD (Median Absolute Deviation)
    mad = np.median(abs_diff)
    # Define the threshold for outliers
    outlier_threshold = threshold * mad
    # Identify outliers based on the threshold
    outliers = data[abs_diff > outlier_threshold].tolist()
    return outliers
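As a usage sketch, the block below applies the same logic to two made-up samples. `mad_outliers` is a hypothetical compact variant of the function above, redefined here so the example runs on its own. The second sample illustrates an edge case worth knowing: when more than half of the values are identical, the MAD is 0, so the threshold is 0 and every value that differs from the median is flagged.

```python
import numpy as np

def mad_outliers(data, threshold=3.0):
    """Hypothetical compact variant of the MAD rule; returns a NumPy array."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    abs_diff = np.abs(data - median)
    mad = np.median(abs_diff)
    return data[abs_diff > threshold * mad]

# Typical case: a single value lies far from the rest
typical = [10, 12, 11, 13, 12, 11, 50]
print(mad_outliers(typical))

# Edge case: more than half the values are identical, so the MAD is 0
mostly_equal = [5, 5, 5, 5, 5, 6, 7]
print(mad_outliers(mostly_equal))
```

If the zero-MAD behavior is undesirable for a given dataset, one option is to check for `mad == 0` before applying the rule and handle that case separately.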
MAD vs 1.5 IQR rule
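As a rough illustration of how the two rules can disagree, the sketch below applies both to the same made-up sample. It assumes the usual form of the 1.5 IQR rule (flag values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR) and a MAD multiplier of 3. The function names `mad_flags` and `iqr_flags` are hypothetical.

```python
import numpy as np

def mad_flags(data, threshold=3.0):
    """Boolean mask: True where |x - median| exceeds threshold * MAD."""
    data = np.asarray(data, dtype=float)
    abs_diff = np.abs(data - np.median(data))
    return abs_diff > threshold * np.median(abs_diff)

def iqr_flags(data, k=1.5):
    """Boolean mask: True where x lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return (data < q1 - k * iqr) | (data > q3 + k * iqr)

data = np.array([10, 11, 11, 12, 12, 13, 16, 50], dtype=float)

print("MAD rule:    ", data[mad_flags(data)])
print("1.5 IQR rule:", data[iqr_flags(data)])
```

On this particular sample the MAD rule flags both 16 and 50, while the 1.5 IQR rule flags only 50. Which rule is stricter depends on the data and on the chosen multipliers.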