Learn: Challenge | Preprocessing Data: Part II
Data Manipulation using pandas

Challenge

Now it's your turn to remove rows with wrong data!

Task

  1. Get the indexes of the rows with 'inconsistent' data:
  • Apply the .loc[] indexer to the df dataframe.

  • Set the row condition: the row sum of the columns at positions 2 through 14 (slice [2:15]) must not equal the value in the 'hhsize' column.

  • Use the .index attribute to get the labels of the matching rows.

  2. Drop the rows using the indexes saved in the ind variable. Set the inplace parameter to True so that the change is applied to df itself.
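
The steps above can be sketched on a small made-up frame. This is only an illustration under stated assumptions: the toy dataframe, its column names and values, and the shorter slice [2:5] are all hypothetical stand-ins for the course data, where the slice is [2:15].

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
toy = pd.DataFrame({
    'id':       [1, 2, 3, 4],
    'hhsize':   [3, 2, 4, 1],
    'adults':   [2, 1, 2, 1],
    'children': [1, 1, 1, 0],
    'elderly':  [0, 1, 1, 0],
})

# Positions 2-4 here play the role of positions 2-14 in the course data.
row_sums = toy.iloc[:, 2:5].sum(axis=1)

# Index labels of the rows whose member counts do not add up to hhsize.
ind = toy.loc[row_sums != toy['hhsize']].index

# Drop those rows; inplace=True modifies toy instead of returning a copy.
toy.drop(index=ind, inplace=True)

print(list(ind))        # labels of the dropped rows
print(list(toy.index))  # labels that remain
```

Note that iloc slices by position and excludes the end point, which is why [2:15] covers positions 2 through 14.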

Solution

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')

# Select rows with discrepancies
ind = df.loc[(df.iloc[:, 2:15].sum(axis=1) != df.hhsize)].index

# Drop chosen rows
df.drop(index=ind, inplace=True)
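
As a possible alternative, not the course's required approach, the same filtering can be written with a boolean mask that keeps the consistent rows and returns a new dataframe, leaving the original untouched. The sketch below assumes a hypothetical toy frame and a slice of [2:5] in place of the course's [2:15]; all names and values are made up.

```python
import pandas as pd

# Hypothetical toy data: names and values are invented for illustration.
households = pd.DataFrame({
    'id':       [1, 2, 3, 4],
    'hhsize':   [3, 2, 4, 1],
    'adults':   [2, 1, 2, 1],
    'children': [1, 1, 1, 0],
    'elderly':  [0, 1, 1, 0],
})

# True where the member counts add up to hhsize.
consistent = households.iloc[:, 2:5].sum(axis=1) == households['hhsize']

# Keep only the consistent rows; the original frame is not modified.
cleaned = households[consistent].reset_index(drop=True)

# Sanity check: no inconsistent rows remain in the result.
remaining = (cleaned.iloc[:, 2:5].sum(axis=1) != cleaned['hhsize']).sum()
print(remaining)
```

Returning a new frame avoids in-place mutation, which can make it easier to compare the data before and after cleaning.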


Section 2. Chapter 3
# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data2.csv')

# Select rows with discrepancies
ind = df.___[(df.___[:, ___].sum(axis=___) != df.___)].___

# Drop chosen rows
df.___(___=___, inplace=___)
