Preparing Clean Data
Swipe to show menu
Real-world datasets often contain inconsistent formatting, extra spaces, and messy text values. In this chapter, clean and standardize customer data before using it in lookups and reporting.
Download the customer dataset and import it into Excel before starting the exercises.
TRIM and PROPER
=PROPER(TRIM(Customers!B2))
TRIM(): removes extra spaces;PROPER(): fixes capitalization.
Case Functions
=UPPER(TRIM(Customers!B2))
=LOWER(TRIM(Customers!B2))
UPPER(): converts text to uppercase;LOWER(): converts text to lowercase.
SUBSTITUTE Function
=SUBSTITUTE(Customers!C2, " ", "")
" ": character being replaced;"": replacement value;- Removes all spaces from the email value.
Text Extraction Functions
=RIGHT(A2, 3)
=LEFT(A2, 2)
=MID(A2, 2, 3)
RIGHT(): extracts text from the end;LEFT(): extracts text from the beginning;MID(): extracts text from the middle.
TEXT Function
=TEXT(Customers!D2, "MMM YYYY")
Converts:
15/03/2023 → Mar 2023
Go to:
Data → Get Data → From Text/CSV
Preview the dataset before loading it into Excel.
Use:
Use First Row as Headers
Load the dataset into a new worksheet.
Create a new worksheet named:
Customers_Clean
Add the following columns:
Customer_ID
Full_Name_Clean
Inside the first Customer_ID cell, type:
=Customers!A2
Fill the formula down the column.
Inside the first Full_Name_Clean cell, type:
=PROPER(TRIM(Customers!B2))
Fill the formula down the column.
Inside a new column, type:
=SUBSTITUTE(Customers!C2, " ", "")
This removes invalid spaces from email addresses.
Right-side extraction:
=RIGHT(A2, 3)
Middle extraction:
=MID(A2, 2, 3)
Type:
=TEXT(Customers!D2, "MMM YYYY")
This standardizes the reporting format for dates.
1. Why is TRIM alone not enough to clean email addresses?
2. What does this formula do?
3. What is the purpose of the TEXT function in this lesson?
Thanks for your feedback!
Ask AI
Ask AI
Ask anything or try one of the suggested questions to begin our chat