Data Analysis with R

Removing Outliers Using IQR Method

Another effective way to detect and remove outliers is by using the interquartile range (IQR) method.

What Is IQR?

The interquartile range (IQR) is a measure of statistical dispersion and is calculated as:

$$\text{IQR} = Q_3 - Q_1$$

Where:

  • $Q_1$: 25th percentile (first quartile);
  • $Q_3$: 75th percentile (third quartile).

Values lying below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are typically considered outliers.
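As a minimal sketch of the rule, the R snippet below applies both fences to a small, made-up vector of marks (the values are hypothetical, not taken from the course dataset). It relies on the default interpolation method of quantile() (type = 7); other methods can shift the quartiles slightly.

```r
# Hypothetical toy marks, chosen so that one value stands out
marks <- c(10, 12, 13, 15, 16, 18, 20, 95)

q1 <- quantile(marks, 0.25)   # first quartile (12.75 here)
q3 <- quantile(marks, 0.75)   # third quartile (18.5 here)
iqr <- q3 - q1                # 5.75

lower_fence <- q1 - 1.5 * iqr # 4.125
upper_fence <- q3 + 1.5 * iqr # 27.125

# Values outside [lower_fence, upper_fence] are flagged as outliers
marks[marks < lower_fence | marks > upper_fence]
# [1] 95
```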

Calculating IQR

To detect outliers with this method, you first need the 25th and 75th percentiles of the variable. You can obtain them with the quantile() function and then compute the IQR by subtracting the first quartile from the third, as in the formula above.

# 25th and 75th percentiles of the placement exam marks
q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
# Interquartile range: Q3 - Q1
iqr_placement <- q3_placement - q1_placement
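Base R's stats package also provides an IQR() function, which by default uses the same quantile method (type = 7), so it should agree with the manual subtraction. The check below uses hypothetical values; on the course data the equivalent call would presumably be IQR(df$placement_exam_marks), with na.rm = TRUE needed if the column contains missing values.

```r
x <- c(10, 12, 13, 15, 16, 18, 20, 95)  # hypothetical values

# Manual route: Q3 - Q1 (unname() drops the "75%" label quantile() attaches)
manual_iqr <- unname(quantile(x, 0.75) - quantile(x, 0.25))

# Built-in route: stats::IQR with its default type = 7
builtin_iqr <- IQR(x)

all.equal(manual_iqr, builtin_iqr)  # TRUE
```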

Identifying Outliers

As with the z-score method, the next step is to compute the lower and upper boundaries:

# Conventional multiplier for the IQR fences
threshold <- 1.5
upper_boundary <- q3_placement + threshold * iqr_placement
lower_boundary <- q1_placement - threshold * iqr_placement
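If you need fences for several columns, the same steps could be wrapped in a small function. iqr_bounds() below is a hypothetical helper, not part of base R; its k argument plays the role of the 1.5 multiplier, and missing values are dropped before the quartiles are computed.

```r
# Hypothetical helper: returns the lower and upper IQR fences of x.
# k is the multiplier applied to the IQR (1.5 is the usual convention).
iqr_bounds <- function(x, k = 1.5, na.rm = TRUE) {
  # names = FALSE keeps the quartiles unlabelled so the result names stay clean
  q <- quantile(x, c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

bounds <- iqr_bounds(c(10, 12, 13, 15, 16, 18, 20, 95))
bounds  # lower = 4.125, upper = 27.125
```

Keeping k as a parameter makes it easy to try a stricter or looser rule without editing the function body.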

Then you can either select all outliers to analyze them:

# Rows whose mark falls outside the boundaries (the outliers)
df[df$placement_exam_marks > upper_boundary | df$placement_exam_marks < lower_boundary, ]
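Before removing anything, it can be useful to check how many rows the rule flags. This sketch uses a hypothetical data frame, scores, standing in for df, and stores the flags as a logical vector so that sum() gives the count and mean() the share of flagged rows.

```r
# Hypothetical data frame standing in for the course's df
scores <- data.frame(placement_exam_marks = c(10, 12, 13, 15, 16, 18, 20, 95))

q <- quantile(scores$placement_exam_marks, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# TRUE for every row outside the IQR fences
is_outlier <- scores$placement_exam_marks < q[1] - 1.5 * iqr |
  scores$placement_exam_marks > q[2] + 1.5 * iqr

sum(is_outlier)   # number of flagged rows: 1
mean(is_outlier)  # share of flagged rows: 0.125
```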

Or create an outlier-free dataset:

# Keep only rows whose mark lies within the boundaries (inclusive)
df2 <- df[df$placement_exam_marks <= upper_boundary & df$placement_exam_marks >= lower_boundary, ]
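One caveat: if placement_exam_marks contains NA values, quantile() stops with an error unless na.rm = TRUE is set, and indexing with df[condition, ] returns an all-NA row for every missing mark. The sketch below, again on a hypothetical data frame, shows one way around both issues using subset(), which treats NA conditions as FALSE and therefore drops those rows. Whether missing marks should be dropped or kept is a separate decision for your analysis.

```r
# Hypothetical data with one missing mark
scores <- data.frame(placement_exam_marks = c(10, 12, NA, 15, 16, 18, 20, 95))

# na.rm = TRUE ignores the NA when computing the quartiles
q <- quantile(scores$placement_exam_marks, c(0.25, 0.75),
              na.rm = TRUE, names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# subset() treats an NA condition as FALSE, so the NA row is dropped
# instead of turning into an all-NA row as with scores[cond, ]
clean <- subset(scores, placement_exam_marks >= lower &
                        placement_exam_marks <= upper)
nrow(clean)  # 6: the outlier 95 and the NA row are both gone
```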

Section 3. Chapter 4
