Removing Outliers Using IQR Method
Another effective way to detect and remove outliers is by using the interquartile range (IQR) method.
What Is IQR?
The interquartile range (IQR) is a measure of statistical dispersion and is calculated as:
IQR = Q3 − Q1
Where:
- Q1: 25th percentile (first quartile);
- Q3: 75th percentile (third quartile).
Values lying below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are typically considered outliers.
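To make the rule concrete, here is a small worked sketch on a made-up vector (the numbers are illustrative only and do not come from the lesson's dataset). It assumes R's default quantile algorithm (type 7) and uses names = FALSE only to drop the percentile labels from the result.

```r
# Made-up sample values; 60 is deliberately far from the rest.
x <- c(10, 12, 13, 15, 16, 18, 19, 21, 60)

q1 <- quantile(x, 0.25, names = FALSE)  # 13
q3 <- quantile(x, 0.75, names = FALSE)  # 19
iqr <- q3 - q1                          # 6

lower <- q1 - 1.5 * iqr                 # 13 - 9 = 4
upper <- q3 + 1.5 * iqr                 # 19 + 9 = 28

# Values outside [lower, upper] are flagged as outliers.
outliers <- x[x < lower | x > upper]
print(outliers)
```

With these values only 60 falls outside the interval from 4 to 28, so it is the single flagged point.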
Calculating IQR
To calculate the IQR and detect outliers, you first need the 25th and 75th percentile values, which can be obtained with the quantile() function. You can then compute the IQR from the formula above.
q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
iqr_placement <- q3_placement - q1_placement
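As a possible cross-check, base R also provides an IQR() function in the stats package, which should agree with the manual subtraction. The sketch below uses a toy data frame with made-up marks as a stand-in for the lesson's df. If the column may contain missing values, both quantile() and IQR() accept na.rm = TRUE.

```r
# Toy stand-in for the lesson's data frame; the marks are made up.
df <- data.frame(placement_exam_marks = c(10, 12, 13, 15, 16, 18, 19, 21, 60))

# Manual calculation, as in the lesson.
q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
iqr_manual <- unname(q3_placement - q1_placement)  # drop the "75%" label

# Built-in shortcut from the stats package.
iqr_builtin <- IQR(df$placement_exam_marks)

# Both approaches should give the same value.
print(all.equal(iqr_manual, iqr_builtin))
```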
Identifying Outliers
Similar to the z-score method, you need to identify the lower and upper boundaries:
threshold <- 1.5  # conventional multiplier for the IQR rule
upper_boundary <- q3_placement + (threshold * iqr_placement)
lower_boundary <- q1_placement - (threshold * iqr_placement)
Then you can either select all outliers to analyze them:
df[df$placement_exam_marks > upper_boundary | df$placement_exam_marks < lower_boundary,]
Or create an outlier-free dataset:
df2 <- df[df$placement_exam_marks <= upper_boundary & df$placement_exam_marks >= lower_boundary,]
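If the same filtering is needed for several columns, the steps can be wrapped in a small helper. The function below is a hypothetical sketch, not part of the lesson: the name remove_iqr_outliers, its arguments, and the toy data are all assumptions. It also drops rows where the column is NA, which is a design choice made here; with plain logical subsetting, NA comparisons would otherwise produce all-NA rows.

```r
# Hypothetical helper: keep only rows whose `column` value lies within
# [Q1 - k * IQR, Q3 + k * IQR]. Rows with NA in `column` are dropped.
remove_iqr_outliers <- function(data, column, k = 1.5) {
  values <- data[[column]]
  q <- quantile(values, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  keep <- !is.na(values) & values >= lower & values <= upper
  data[keep, , drop = FALSE]
}

# Toy stand-in for the lesson's data frame; the marks are made up.
df <- data.frame(placement_exam_marks = c(10, 12, 13, 15, 16, 18, 19, 21, 60))
df2 <- remove_iqr_outliers(df, "placement_exam_marks")

# Number of rows removed as outliers.
print(nrow(df) - nrow(df2))
```

Comparing nrow(df) with nrow(df2) is a quick way to see how many observations the rule removed before continuing with the cleaned data.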