Using IQR for Outliers
The interquartile range (IQR) is the difference between the third and first quartiles (IQR = Q3 - Q1), and it is widely used for detecting outliers in a dataset. The standard rule is known as the 1.5*IQR rule: any data point that lies below the first quartile (Q1) minus 1.5 times the IQR, or above the third quartile (Q3) plus 1.5 times the IQR, is considered an outlier. The bounds are calculated as follows:
- Lower bound: Q1 - 1.5 * IQR;
- Upper bound: Q3 + 1.5 * IQR.
This method is robust because it relies on the spread of the middle 50% of the data, making it less sensitive to extreme values. Unlike methods based on the mean and standard deviation, the IQR approach is not easily influenced by unusually large or small values, which makes it especially suitable for skewed distributions or when the data contains anomalies.
```python
import pandas as pd

# Sample dataset
data = {'value': [10, 12, 12, 13, 12, 14, 2, 12, 15, 12, 30]}
df = pd.DataFrame(data)

# Calculate Q1, Q3, and IQR
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1

# Calculate bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers
df['is_outlier'] = (df['value'] < lower_bound) | (df['value'] > upper_bound)

print(df)
```
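To make the robustness claim concrete, the sketch below applies both the 1.5*IQR rule and a mean/standard-deviation rule to the same sample values. It is only an illustration under stated assumptions: the cutoff of 3 standard deviations is a commonly used threshold rather than one taken from this lesson, and the standard library's `statistics.quantiles` with `method="inclusive"` is used so the block runs without extra dependencies while interpolating quartiles the same way as the pandas default.

```python
import statistics

# Same sample values as the lesson's dataset
values = [10, 12, 12, 13, 12, 14, 2, 12, 15, 12, 30]

# 1.5*IQR rule; method="inclusive" uses linear interpolation between
# the closest data points, like pandas' default quantile()
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
iqr_outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

# Mean/standard-deviation rule with an assumed cutoff of 3
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation
z_outliers = [v for v in values if abs(v - mean) / std > 3]

print("IQR rule:    ", iqr_outliers)  # [2, 30]
print("3-sigma rule:", z_outliers)    # []
```

On this data the two extreme values inflate the standard deviation to about 6.5, so even 30 sits only about 2.6 standard deviations from the mean and goes unflagged, while the quartiles stay at 12 and 13.5 and the IQR rule catches both 2 and 30.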
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Create boxplot (reuses df and its 'is_outlier' column from the previous example)
plt.figure(figsize=(6, 2))
sns.boxplot(x=df['value'], color="skyblue")

# Overlay outliers
outliers = df[df['is_outlier']]
sns.stripplot(x=outliers['value'], color="red", size=8, marker="D", label="Outliers")

plt.legend()
plt.title("Boxplot with IQR-Detected Outliers Highlighted")
plt.show()
```
The boxplot provides a clear visual summary of how the IQR method detects outliers:
- The box represents the interquartile range (IQR), showing the spread of the middle 50% of your data;
- The whiskers extend to the most extreme values that are still within the 1.5*IQR bounds;
- Outliers appear as individual points beyond the whiskers, easily identified and separated from the main data cluster.
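The three elements above can also be computed by hand, which helps when checking what a plot should show. The following is a minimal sketch for the same sample values, assuming the plotting library uses the default 1.5*IQR whisker setting and linearly interpolated quartiles; the variable names are illustrative only.

```python
import statistics

# Same sample values as the lesson's dataset
values = [10, 12, 12, 13, 12, 14, 2, 12, 15, 12, 30]

# Box edges and median (linear interpolation, like pandas' default quantile())
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations that are still inside the bounds
inside = [v for v in values if lower_bound <= v <= upper_bound]
lower_whisker = min(inside)
upper_whisker = max(inside)

# Everything outside the bounds is drawn as an individual point
fliers = sorted(v for v in values if v < lower_bound or v > upper_bound)

print(f"box: {q1} to {q3}, median: {median}")
print(f"bounds: {lower_bound} to {upper_bound}")
print(f"whiskers: {lower_whisker} to {upper_whisker}")
print(f"points beyond whiskers: {fliers}")
```

Note that the whiskers end at 10 and 15, which are actual data values, not at the bounds 9.75 and 15.75 themselves.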
This IQR-based approach isolates unusual values while remaining largely unaffected by them, because the quartiles barely move when a few extreme points are present. The mean and standard deviation, by contrast, are pulled toward those points. This stability makes the IQR method a common choice for outlier detection in real-world datasets that are skewed or contain anomalies.