Challenge: Standardize Text Case | Foundations of Data Cleaning

Consistent text formatting is essential for reliable data analysis and grouping. When a text column mixes uppercase, lowercase, and capitalized spellings, grouping and comparison operations can produce misleading results. For instance, "Apple", "apple", and "APPLE" probably refer to the same fruit, but because string comparison is case-sensitive, pandas treats them as three distinct values, and a group-by produces three separate groups instead of one. Converting every value in a column to the same case merges these variants. This simplifies grouping and aggregation, prevents spurious duplicate categories, and makes the data easier to work with.

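The examples in this chapter use a small DataFrame in which the same three fruits appear with inconsistent capitalization:

import pandas as pd

data = {
    "fruit": ["Apple", "banana", "ORANGE", "apple", "Banana", "orange"],
    "quantity": [5, 3, 4, 2, 1, 6]
}
df = pd.DataFrame(data)
print(df)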
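To see why this matters, the sketch below groups the sample data by fruit before and after lowercasing (the variable names `raw_totals` and `normalized_totals` are illustrative). On the raw column, each of the six spellings forms its own group; after `str.lower()`, the quantities combine into apple 7, banana 4, and orange 10.

```python
import pandas as pd

data = {
    "fruit": ["Apple", "banana", "ORANGE", "apple", "Banana", "orange"],
    "quantity": [5, 3, 4, 2, 1, 6],
}
df = pd.DataFrame(data)

# Grouping the raw column: every distinct spelling becomes its own group
raw_totals = df.groupby("fruit")["quantity"].sum()
print(raw_totals)

# Grouping by the lowercased values: spelling variants collapse into one group
normalized_totals = df.groupby(df["fruit"].str.lower())["quantity"].sum()
print(normalized_totals)
```

Passing the lowercased Series as the group key leaves `df` itself unchanged, which is convenient when you only need case-insensitive grouping for a single aggregation.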

Capitalizing Text for Consistency

Besides converting everything to lowercase (the approach used in the task below), you can convert text to capitalized case, where the first character of each value is uppercase and the rest are lowercase. This style is often used for names or labels; note that it capitalizes only the first word of a multi-word value. In pandas, the Series.str.capitalize() method does this. For example:

import pandas as pd

data = {
    "fruit": ["Apple", "banana", "ORANGE", "apple", "Banana", "orange"],
    "quantity": [5, 3, 4, 2, 1, 6]
}
df = pd.DataFrame(data)
# Uppercase the first character of each value and lowercase the rest
df["fruit"] = df["fruit"].str.capitalize()
print(df)

This will output:

    fruit  quantity
0   Apple         5
1  Banana         3
2  Orange         4
3   Apple         2
4  Banana         1
5  Orange         6

Because str.capitalize() uppercases the first character and lowercases the rest, spelling variants such as "ORANGE" and "orange" collapse into the single value "Orange". This is helpful when preparing data for presentation or matching a specific format.
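Since str.capitalize() looks only at the first character of the whole string, it behaves differently from str.title() on multi-word values. The sketch below compares the four common pandas case methods on a few made-up fruit names (the `names` Series is illustrative and not part of the lesson's dataset).

```python
import pandas as pd

# Hypothetical multi-word values, not part of the lesson's dataset
names = pd.Series(["green APPLE", "blood orange", "NEW york"])

cases = pd.DataFrame({
    "original": names,
    "lower": names.str.lower(),
    "upper": names.str.upper(),
    "capitalize": names.str.capitalize(),  # only the first character of the string is uppercased
    "title": names.str.title(),            # the first letter of every word is uppercased
})
print(cases)
```

Here `capitalize` yields "Green apple", "Blood orange", and "New york", while `title` yields "Green Apple", "Blood Orange", and "New York". For grouping purposes any of these works, as long as the whole column uses the same method.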

To leave the original DataFrame untouched, apply the same transformation to a copy:

import pandas as pd

data = {
    "fruit": ["Apple", "banana", "ORANGE", "apple", "Banana", "orange"],
    "quantity": [5, 3, 4, 2, 1, 6]
}
df = pd.DataFrame(data)

# Standardize text case using str.capitalize()
# Work on a copy so that df keeps its original values
df_capitalized = df.copy()
df_capitalized["fruit"] = df_capitalized["fruit"].str.capitalize()
print(df_capitalized)
Task

Write a function that standardizes all values in a specified column of a DataFrame to lowercase. The function should return a new DataFrame with the values in the given column converted to lowercase, while leaving all other columns unchanged.
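A minimal sketch of one way to meet these requirements, assuming the column holds strings. The function name `lowercase_column` is hypothetical, and the exercise's checker may expect a different name or signature. The function copies the DataFrame first, so only the returned frame changes and the caller's data is left as it was.

```python
import pandas as pd


def lowercase_column(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with the values in `column` converted to lowercase.

    Hypothetical helper name; the challenge may expect a different signature.
    """
    result = df.copy()  # copy so the caller's DataFrame is not modified
    result[column] = result[column].str.lower()  # missing values (NaN) stay missing
    return result


data = {
    "fruit": ["Apple", "banana", "ORANGE", "apple", "Banana", "orange"],
    "quantity": [5, 3, 4, 2, 1, 6],
}
df = pd.DataFrame(data)
lowered = lowercase_column(df, "fruit")
print(lowered)
```

Only the target column is reassigned, so `quantity` and any other columns pass through unchanged.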
