Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Calculating IQR | Section
Statistics for Data Analysis

bookCalculating IQR

Swipe to show menu

Calculating the interquartile range (IQR) is a fundamental step in understanding the variability of real-world data. The IQR measures the spread of the middle 50% of values in a dataset, making it a robust statistic for exploratory data analysis. The IQR is defined mathematically as:

IQR=Q3โˆ’Q1\text{IQR} = Q_3 - Q_1

where Q1Q_1 is the first quartile (25th percentile) and Q3Q_3 is the third quartile (75th percentile). By computing the IQR for each feature in your dataset, you can quickly identify which variables have higher variability and which are more tightly clustered. This insight is crucial for detecting outliers, comparing distributions, and making informed decisions about data preprocessing.

12345678910111213141516171819202122232425
import pandas as pd # Load a sample dataset data = { 'age': [23, 45, 31, 35, 40, 29, 48, 34, 37, 42], 'income': [50000, 80000, 62000, 76000, 54000, 70000, 90000, 65000, 71000, 85000], 'score': [88, 92, 85, 90, 87, 91, 95, 89, 86, 93] } df = pd.DataFrame(data) # Compute Q1 and Q3 for each column q1 = df.quantile(0.25) q3 = df.quantile(0.75) # Compute the IQR for each column iqr = q3 - q1 # Summarize results in a DataFrame iqr_summary = pd.DataFrame({ 'Q1': q1, 'Q3': q3, 'IQR': iqr }) print(iqr_summary)
copy

The output DataFrame displays the first quartile (Q1Q_1), third quartile (Q3Q_3), and IQR for each feature. Higher IQR values indicate greater variability in the middle 50% of the data for that column, while lower IQR values suggest that most values are clustered closely together. For instance, if the income\text{income} column shows a much larger IQR than score\text{score}, it means that incomes vary more widely among the subjects than their scores do. By comparing IQRs across features, you gain a clearer picture of which variables are more dispersed and which are more consistent, helping you focus your analysis on the most variable or stable aspects of your data.

question mark

Which of the following statements best describes the meaning of a high IQR value for a dataset column?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

Sectionย 1. Chapterย 32

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Sectionย 1. Chapterย 32
some-alt