Challenge | Preprocessing Data: Part II
Data Manipulation using pandas

Challenge

Now it's your turn: remove the rows that contain inconsistent data!

Task


  1. Get the indexes of rows with inconsistent data. Follow these steps:
  • Apply the .loc[] property to the df DataFrame.

  • Set the condition on rows: the row sums (.sum(axis=1)) of columns 2 to 14 (positional slice [2:15]) must not be equal to the values in the 'hhsize' column.

  • Use the .index attribute to get the indexes of the selected rows and save them to the ind variable.

  2. Drop the rows using the indexes saved in the ind variable. Set the inplace parameter to True so that the changes are written back to df.
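The selection in step 1 can be tried out on a small in-memory table before running it on the real file. The sketch below assumes a hypothetical DataFrame (the columns 'hhid', 'adults', 'children' and 'seniors' and their values are invented for illustration) in which 'hhsize' sits at position 1 and the member counts sit at positions 2 to 4, so the slice [2:5] plays the role of [2:15] in the exercise.

```python
import pandas as pd

# Hypothetical toy data: 'hhsize' is the reported household size, and the
# columns at positions 2-4 are member counts that should add up to it
df = pd.DataFrame({
    'hhid':     [101, 102, 103, 104],
    'hhsize':   [3, 2, 4, 1],
    'adults':   [2, 2, 2, 1],
    'children': [1, 0, 1, 0],
    'seniors':  [0, 0, 0, 1],
})

# Positional slice 2:5 selects columns 2, 3 and 4 (the end is exclusive),
# just as 2:15 selects columns 2 to 14 in the exercise
counts = df.iloc[:, 2:5]

# Row-wise sums (axis=1) compared with the reported household size
mask = counts.sum(axis=1) != df.hhsize

# Index labels of the rows that fail the check
ind = df.loc[mask].index
print(ind.tolist())  # [2, 3]
```

Rows 2 and 3 are flagged because their counts add up to 3 and 2, while 'hhsize' reports 4 and 1.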

Solution

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')
# Select rows with discrepancies
ind = df.loc[(df.iloc[:,2:15].sum(axis = 1) != df.hhsize)].index
# Drop chosen rows
df.drop(index = ind, inplace = True)
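One way to confirm that the drop worked is to rerun the same condition afterwards: no remaining row should fail it. This sketch again uses a small hypothetical DataFrame as a stand-in for data2.csv (the column names, values and the slice [2:5] are assumptions for illustration). For comparison, it also keeps the consistent rows with a plain boolean filter instead of drop(), which yields the same rows as a new DataFrame without modifying df in place.

```python
import pandas as pd

# Hypothetical stand-in for data2.csv: counts in columns 2-4 should sum to 'hhsize'
df = pd.DataFrame({
    'hhid':     [101, 102, 103, 104],
    'hhsize':   [3, 2, 4, 1],
    'adults':   [2, 2, 2, 1],
    'children': [1, 0, 1, 0],
    'seniors':  [0, 0, 0, 1],
})

# Alternative, for comparison: a boolean filter returns a new DataFrame
# with only the consistent rows and leaves df unchanged
consistent = df.loc[df.iloc[:, 2:5].sum(axis=1) == df.hhsize]

# Same steps as the solution: collect the indexes, then drop them in place
ind = df.loc[df.iloc[:, 2:5].sum(axis=1) != df.hhsize].index
df.drop(index=ind, inplace=True)  # returns None, so don't assign it back to df

# After dropping, every remaining row passes the consistency check
assert (df.iloc[:, 2:5].sum(axis=1) == df.hhsize).all()
print(df.index.tolist())      # [0, 1]
print(df.equals(consistent))  # True
```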


Section 2. Chapter 3
# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')
# Select rows with discrepancies
ind = df.___[(df.___[:,___].sum(axis = ___) != df.___)].___
# Drop chosen rows
df.___(___ = ___, inplace = ___)
