Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Identifying Outliers and Errors | Removing Duplicates and Errors
Clean Data in Excel

bookIdentifying Outliers and Errors

Swipe to show menu

After removing duplicates, the next important step is to check your data for outliers and errors.

Outliers are values that significantly stand out from the rest of the data. They don't always mean something is wrong, but very often they indicate a problem, such as incorrect data entry or an issue during import.

Errors, on the other hand, are values that clearly don't make sense in the context of your data. For example, an unusually large number, a negative value where it shouldn't exist, or a total that doesn't match the quantity.

Let's look at a simple example:

Product

Quantity

Total

Phone

2

800

Phone

10

4000

Laptop

1

9999

At first glance, everything looks fine, but if you look closer, some values raise questions.

The quantity of 10 might be valid, but it depends on the context. However, the total value of 9999 clearly stands out compared to the rest and looks suspicious.

If you don't check for outliers and errors, they can significantly distort your analysis. For example, one incorrect value like 9999 can skew averages or totals and lead to wrong conclusions.

How to Identify Outliers

Sorting is also very helpful. If you sort a column in ascending or descending order, the extreme values will appear at the top or bottom, making them easy to review.

A simple but effective approach is to ask yourself: "Does this value look realistic?"

If most values fall within a certain range, such as 500 to 1000, and one value is 9999, that's a clear signal that something may be wrong and needs to be checked.

It's important to understand that not every outlier is an error. Sometimes it's a real value, but it should always be verified.

Review the data and identify values that look suspicious.

Determine:

  • Which values fall outside the expected range;
  • Which values may be errors.

First, in the Quantity column, the value 10 looks unusual because most other values are 1 or 2. It may not necessarily be an error, but it definitely requires verification.

Second, in the Total column, the value 9999 clearly stands out from the overall range, since most values fall roughly between 500 and 4000. This is a strong candidate for an error.

question mark

What is an outlier in a dataset?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 4. Chapter 2

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Section 4. Chapter 2
some-alt