Learn Median Absolute Deviation | Statistical Methods in Anomaly Detection
Data Anomaly Detection

Median Absolute Deviation

The MAD (Median Absolute Deviation) rule is a statistical outlier detection method that uses two robust estimators, the median (for the center of the data) and the median absolute deviation (for its spread), to identify outliers in a dataset.

It is particularly useful when dealing with data that may not follow a normal distribution or when there are potential outliers that can significantly impact the mean and standard deviation.
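
To make the robustness claim concrete, here is a minimal sketch comparing the four statistics before and after a single extreme value is added. The sample values and the helper name `summarize` are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical sample: five typical readings, then the same readings plus one extreme value.
clean = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
contaminated = np.append(clean, 100.0)

def summarize(values):
    """Return (mean, standard deviation, median, MAD) of a 1-D array."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return np.mean(values), np.std(values), median, mad

mean_c, std_c, med_c, mad_c = summarize(clean)
mean_x, std_x, med_x, mad_x = summarize(contaminated)

print(f"clean:        mean={mean_c:.2f} std={std_c:.2f} median={med_c:.2f} MAD={mad_c:.2f}")
print(f"contaminated: mean={mean_x:.2f} std={std_x:.2f} median={med_x:.2f} MAD={mad_x:.2f}")
```

In this sample the single extreme value moves the mean from 12.0 to roughly 26.7 and inflates the standard deviation many times over, while the median only shifts from 12.0 to 12.5 and the MAD from 1.0 to 1.5.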

How to use the MAD rule

  1. Calculate the Median: Compute the median of the dataset, which is the middle value when the data is sorted;
  2. Calculate the Median Absolute Deviation (MAD): For each data point, find the absolute difference between the data point and the median. The MAD is the median of these absolute differences;
  3. Define a Threshold: Choose a constant multiplier (commonly 2 or 3) and multiply it by the MAD. The result is the threshold that determines how far a data point can deviate from the median before being considered an outlier;
  4. Identify Outliers: Any data point that has an absolute difference from the median greater than the threshold is considered an outlier.
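
The four steps above can be traced on a small dataset. This is only a worked sketch: the data values and the multiplier name `k` are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: eight similar values and one that stands apart.
data = np.array([2, 3, 3, 4, 4, 5, 5, 6, 30], dtype=float)

# Step 1: the median of the sorted data is the 5th of 9 values.
median = np.median(data)            # 4.0

# Step 2: absolute differences from the median, then their median.
abs_diff = np.abs(data - median)    # [2, 1, 1, 0, 0, 1, 1, 2, 26]
mad = np.median(abs_diff)           # 1.0

# Step 3: threshold = multiplier * MAD (k = 3 is an assumed choice).
k = 3.0
cutoff = k * mad                    # 3.0

# Step 4: keep the points whose distance from the median exceeds the threshold.
outliers = data[abs_diff > cutoff]  # only 30.0 qualifies

print(median, mad, cutoff, outliers)
```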

Note

Mathematically, the absolute difference between two values, A and B, is denoted as |A - B|, where the vertical bars "|" represent the absolute value function. This function returns the magnitude of the difference between A and B, which is never negative (it is zero when A equals B).
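
Using the same notation, the MAD and the outlier rule can be written compactly. The symbols are assumptions introduced here for illustration: x_1, ..., x_n are the data points, \tilde{x} is their median, and k is the chosen multiplier (for example 2 or 3).

```latex
\tilde{x} = \operatorname{median}(x_1, \dots, x_n), \qquad
\mathrm{MAD} = \operatorname{median}\bigl(|x_1 - \tilde{x}|, \dots, |x_n - \tilde{x}|\bigr), \qquad
x_i \text{ is flagged as an outlier if } |x_i - \tilde{x}| > k \cdot \mathrm{MAD}
```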

MAD rule implementation

import numpy as np

def mad_rule_outlier_detection(data, threshold=3.0):

    # Convert the input to a NumPy array so that plain lists also work
    data = np.asarray(data)

    # Calculate the median
    median = np.median(data)

    # Calculate the absolute differences from the median
    abs_diff = np.abs(data - median)

    # Calculate the MAD (Median Absolute Deviation)
    mad = np.median(abs_diff)

    # Define the threshold for outliers (threshold is the multiplier applied to the MAD)
    outlier_threshold = threshold * mad

    # Identify outliers based on the threshold
    outliers = [x for x in data if np.abs(x - median) > outlier_threshold]

    return outliers
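
One situation worth handling explicitly is a MAD of zero, which happens when at least half of the values equal the median. The threshold then becomes zero and every value that differs from the median is flagged. The sketch below is one possible variant, not part of the lesson's implementation: the name `mad_outlier_mask` is hypothetical, and returning "no outliers" when the MAD is zero is an assumed design choice. It also returns a boolean mask, so the positions of the outliers are available as well as their values.

```python
import numpy as np

def mad_outlier_mask(data, threshold=3.0):
    """Hypothetical helper: boolean mask that is True where a point is flagged."""
    values = np.asarray(data, dtype=float)
    median = np.median(values)
    abs_diff = np.abs(values - median)
    mad = np.median(abs_diff)
    if mad == 0:
        # At least half of the points equal the median, so threshold * mad is 0
        # and every point that differs from the median would be flagged.
        # Assumption of this sketch: report nothing rather than flag them all.
        return np.zeros(values.shape, dtype=bool)
    return abs_diff > threshold * mad

# Hypothetical readings with one obvious spike at position 6.
readings = [12, 13, 12, 14, 13, 12, 95, 13]
mask = mad_outlier_mask(readings)
flagged_positions = np.flatnonzero(mask)

print(flagged_positions)           # positions of flagged points
print(np.asarray(readings)[mask])  # the flagged values themselves
```

For these readings the median is 13 and the MAD is 1, so with the default multiplier of 3 only the value 95 at position 6 is flagged.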

MAD vs 1.5 IQR rule
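
The source gives no text under this heading, so the following is only an illustrative sketch that applies both rules to the same data. It assumes the usual form of the 1.5 IQR rule (flag values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR), a MAD multiplier of 3, and made-up data values.

```python
import numpy as np

# Hypothetical data with one extreme value (30) and one borderline value (8).
data = np.array([1, 3, 3, 4, 4, 5, 5, 8, 30], dtype=float)

# MAD rule: distance from the median compared with 3 * MAD.
median = np.median(data)
abs_diff = np.abs(data - median)
mad = np.median(abs_diff)
mad_flagged = data[abs_diff > 3.0 * mad]

# 1.5 IQR rule: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_flagged = data[(data < lower) | (data > upper)]

print("MAD rule flags:    ", mad_flagged)
print("1.5 IQR rule flags:", iqr_flagged)
```

On this particular sample both rules flag 30, and the MAD rule additionally flags the borderline value 8, which sits exactly on the upper IQR fence. Which rule is stricter depends on the data and on the multipliers chosen, so this result should not be read as a general ranking.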

What is the main advantage of using MAD for outlier detection?
