Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lära Using IQR for Outliers | Section
Statistics for Data Analysis

bookUsing IQR for Outliers

Svep för att visa menyn

The interquartile range (IQR) is a powerful tool for detecting outliers in a dataset. The standard rule for outlier detection using IQR is known as the 1.5*IQR rule. According to this rule, any data point that lies below the first quartile (Q1) minus 1.5 times the IQR or above the third quartile (Q3) plus 1.5 times the IQR is considered an outlier. The calculation is as follows:

  • Lower bound: Q1 - 1.5 * IQR;
  • Upper bound: Q3 + 1.5 * IQR.

This method is robust because it relies on the spread of the middle 50% of the data, making it less sensitive to extreme values. Unlike methods based on the mean and standard deviation, the IQR approach is not easily influenced by unusually large or small values, which makes it especially suitable for skewed distributions or when the data contains anomalies.

123456789101112131415161718
import pandas as pd # Sample dataset data = {'value': [10, 12, 12, 13, 12, 14, 2, 12, 15, 12, 30]} df = pd.DataFrame(data) # Calculate Q1, Q3, and IQR Q1 = df['value'].quantile(0.25) Q3 = df['value'].quantile(0.75) IQR = Q3 - Q1 # Calculate bounds lower_bound = Q1 - 1.5 * IQR upper_bound = Q3 + 1.5 * IQR # Identify outliers df['is_outlier'] = (df['value'] < lower_bound) | (df['value'] > upper_bound) print(df)
copy
123456789101112
import matplotlib.pyplot as plt import seaborn as sns # Create boxplot plt.figure(figsize=(6, 2)) sns.boxplot(x=df['value'], color="skyblue") # Overlay outliers outliers = df[df['is_outlier']] sns.stripplot(x=outliers['value'], color="red", size=8, marker="D", label="Outliers") plt.legend() plt.title("Boxplot with IQR-Detected Outliers Highlighted") plt.show()
copy

The boxplot provides a clear visual summary of how the IQR method detects outliers:

  • The box represents the interquartile range (IQR), showing the spread of the middle 50% of your data;
  • The whiskers extend to the most extreme values that are still within the 1.5*IQR bounds;
  • Outliers appear as individual points beyond the whiskers, easily identified and separated from the main data cluster.

This IQR-based approach isolates unusual values without being affected by their presence. Unlike methods that use the mean and standard deviation, the IQR method is robust even when your data is skewed or contains extreme values. This makes it a preferred choice for reliable outlier detection in real-world datasets.

question mark

What does the 1.5*IQR rule help to identify in a dataset?

Select the correct answer

Var allt tydligt?

Hur kan vi förbättra det?

Tack för dina kommentarer!

Avsnitt 1. Kapitel 33

Fråga AI

expand

Fråga AI

ChatGPT

Fråga vad du vill eller prova någon av de föreslagna frågorna för att starta vårt samtal

Avsnitt 1. Kapitel 33
some-alt