Data Anomaly Detection
2. Statistical Methods in Anomaly Detection
1.5 IQR Rule
The 1.5 IQR (Interquartile Range) rule is a simple but effective method for identifying outliers in a dataset. It's based on the spread of the middle 50% of the data (the range between the first and third quartiles) and is commonly used in anomaly detection.
How to use the 1.5 IQR rule
- Calculate the IQR, which is the range between the 75th percentile (Q3) and the 25th percentile (Q1) of the dataset;
- Define the lower threshold as `Q1 - 1.5 * IQR` and the upper threshold as `Q3 + 1.5 * IQR`;
- Any data point below the lower threshold or above the upper threshold is considered an outlier.
Here is the implementation of this rule:
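A minimal sketch of the steps above, using only Python's standard library. The function name `iqr_outliers`, the parameter `k`, and the sample data are illustrative assumptions, not part of the course material. Quartiles are computed with linear interpolation between data points (`method="inclusive"`); other quantile conventions can give slightly different thresholds.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower, upper, outliers) for the k * IQR rule.

    `iqr_outliers` and `k` are hypothetical names used for illustration.
    """
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Thresholds as defined in the steps above.
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    # Points strictly outside [lower, upper] are treated as outliers.
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers


# Made-up sample: nine similar readings and one extreme value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)  # 9.375 16.375 [102]
```

Here Q1 = 12.0 and Q3 = 13.75, so the IQR is 1.75 and only the value 102 falls outside the thresholds.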
We simply calculate the threshold values and consider all points that fall outside the range between the lower and upper thresholds to be outliers.
1.5 IQR rule for commonly used distributions
![](https://codefinity-content-media-v2.s3.eu-west-1.amazonaws.com/courses/165dbadd-b48e-4a7f-8b0d-1b8477c22a1d/outlier_detection_plot.png)
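To get a feel for how the rule behaves on different distributions, the share of points it flags can be worked out from the distribution's quantile function. The sketch below assumes two distributions chosen purely for illustration (a standard normal and an exponential with rate 1); they are not necessarily the ones shown in the plot, and all variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Standard normal: compute quartiles, thresholds, and the tail mass outside them.
normal = NormalDist(mu=0.0, sigma=1.0)
q1, q3 = normal.inv_cdf(0.25), normal.inv_cdf(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
normal_flagged = normal.cdf(lower) + (1.0 - normal.cdf(upper))

# Exponential with rate 1: quantile function is -ln(1 - p), CDF is 1 - exp(-x).
exp_q1, exp_q3 = -math.log(1 - 0.25), -math.log(1 - 0.75)
exp_iqr = exp_q3 - exp_q1
exp_lower, exp_upper = exp_q1 - 1.5 * exp_iqr, exp_q3 + 1.5 * exp_iqr
# The lower threshold is negative, so no exponential value can fall below it;
# only the right tail beyond the upper threshold is flagged.
exp_flagged = math.exp(-exp_upper)

print(f"normal:      {normal_flagged:.4f}")
print(f"exponential: {exp_flagged:.4f}")
```

Under these assumptions the rule flags roughly 0.7% of normally distributed points, split evenly between both tails, but close to 5% of exponentially distributed points, all of them on the long right tail. This is one way to see why the rule is less reliable on skewed data.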
Pros and cons of using the 1.5 IQR rule
Pros | Cons |
---|---|
Simple and easy-to-understand method for identifying outliers. | May not work well with non-symmetric or heavily skewed data distributions. |
Robust to extreme values (outliers) in the dataset. | Requires choosing a fixed multiplier (e.g., 1.5) which may not be suitable for all datasets. |
Based on quartiles (Q1 and Q3), which are less affected by outliers than the mean and standard deviation. | Doesn't provide information about the nature or cause of outliers. |
Useful for identifying potential outliers that deviate significantly from the majority of the data. | May classify certain valid data points as outliers if they fall outside the fixed threshold. |
Can be applied to various types of numeric data; for multivariate datasets, the rule is applied to each feature separately. | Doesn't consider the underlying data distribution or model assumptions. |
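
The rule itself works on one variable at a time, so one common way to use it on a dataset with several features is to compute thresholds per feature and flag any row that is out of range in at least one of them. The sketch below assumes a made-up two-feature dataset; the helper `iqr_bounds` and the feature names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) thresholds for the k * IQR rule (illustrative helper)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical dataset: one list per feature, rows aligned by index.
features = {
    "temperature": [20.1, 19.8, 20.4, 20.0, 35.0, 20.2, 19.9],
    "pressure": [1.01, 1.00, 1.02, 0.99, 1.01, 1.00, 1.50],
}

flagged_rows = set()
for name, column in features.items():
    lower, upper = iqr_bounds(column)
    for row, value in enumerate(column):
        # A row is flagged if it is out of range in at least one feature.
        if value < lower or value > upper:
            flagged_rows.add(row)

print(sorted(flagged_rows))  # [4, 6]
```

Applied this way, the rule only catches values that are extreme within a single feature. A row whose individual values all look ordinary but whose combination is unusual will not be flagged.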