Challenge
Now it's your turn to remove rows with wrong data!
Task
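- Get the indexes of rows with inconsistent data and save them in the ind variable. Follow these steps:
  - Apply the .loc[] property to the df dataframe.
  - Set a condition on rows: the row sums of the columns at positions 2 through 14 (selected with .iloc[:, 2:15], since the end of a slice is exclusive) must not be equal to the values in the 'hhsize' column.
  - Use the .index attribute to get the indexes of the selected rows.
- Drop the rows whose indexes are saved in the ind variable. Set the inplace parameter to True so the changes are applied to df directly.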
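A minimal sketch of these steps on a small, made-up DataFrame (it does not load data2.csv). It assumes, hypothetically, that column 0 is an id, column 1 is 'hhsize', and positions 2 through 14 hold 13 per-category member counts; the column names id and c1 to c13 are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: 'id' at position 0, 'hhsize' at position 1,
# and 13 invented count columns (c1..c13) at positions 2..14.
count_cols = [f"c{i}" for i in range(1, 14)]
rows = [
    [1, 3] + [1, 2] + [0] * 11,  # counts sum to 3, equal to hhsize -> consistent
    [2, 4] + [1, 1] + [0] * 11,  # counts sum to 2, hhsize is 4 -> inconsistent
    [3, 2] + [2] + [0] * 12,     # counts sum to 2, equal to hhsize -> consistent
]
df = pd.DataFrame(rows, columns=["id", "hhsize"] + count_cols)

# The slice 2:15 selects positions 2..14 (the end is exclusive): 13 columns
print(df.iloc[:, 2:15].shape[1])  # 13

# Row-wise sums over those columns, compared with 'hhsize'
member_sums = df.iloc[:, 2:15].sum(axis=1)
mask = member_sums != df["hhsize"]

# .loc[] keeps the rows where the mask is True; .index gives their labels
ind = df.loc[mask].index
print(ind.tolist())  # [1]

# Drop those rows, modifying df itself
df.drop(index=ind, inplace=True)
print(df["id"].tolist())  # [1, 3]
```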
Solution
# Importing the library
import pandas as pd
# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')
# Get the indexes of rows whose columns at positions 2-14 do not sum to 'hhsize'
ind = df.loc[df.iloc[:, 2:15].sum(axis=1) != df['hhsize']].index
# Drop those rows from df in place
df.drop(index=ind, inplace=True)
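If you would rather not modify df in place, the same cleanup can be written as a boolean mask that keeps only the consistent rows. The helper drop_inconsistent below is hypothetical (it is not part of pandas or of this course); it defaults to the [2:15] slice and the 'hhsize' column from the task, and the demo frame is made-up data.

```python
import pandas as pd


def drop_inconsistent(frame: pd.DataFrame, total_col: str = "hhsize",
                      start: int = 2, stop: int = 15) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame keeping only rows whose
    columns at positions start..stop-1 sum to frame[total_col]."""
    sums = frame.iloc[:, start:stop].sum(axis=1)
    # '==' is the complement of the '!=' used in the solution, so the kept
    # rows are exactly the ones the solution does not drop
    return frame[sums == frame[total_col]]


# Made-up demo: only columns 'a' and 'b' (positions 2..3) are summed here
demo = pd.DataFrame({"id": [10, 11], "hhsize": [2, 5], "a": [1, 2], "b": [1, 2]})
clean = drop_inconsistent(demo, start=2, stop=4)
print(clean["id"].tolist())  # [10]
print(len(demo))             # 2, the original frame is unchanged
```

One practical point about the in-place version: df.drop(..., inplace=True) returns None, so writing df = df.drop(index=ind, inplace=True) would replace df with None. Use either inplace=True on its own or assignment without inplace, not both.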