Learn Removing Duplicates | Data Cleaning
Preprocessing Data

Removing Duplicates

To remove duplicate rows, call the drop_duplicates() method on the dataframe. By default it returns a new dataframe and leaves the original unchanged; pass inplace=True to modify the current dataframe instead.

new_data = data.drop_duplicates() # data is not modified
# or
data.drop_duplicates(inplace=True) # data is modified
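The difference between the two calls is easiest to see on a small example. The sketch below uses a made-up toy dataframe (the column names and values are assumptions for illustration, not the planets dataset) and compares the shapes before and after each call.

```python
import pandas as pd

# Hypothetical toy data: the second and third rows are identical.
data = pd.DataFrame({
    "name": ["Mercury", "Venus", "Venus", "Earth"],
    "moons": [0, 0, 0, 1],
})

# Without inplace, a new dataframe is returned and `data` keeps all 4 rows.
new_data = data.drop_duplicates()
print(data.shape)      # (4, 2)
print(new_data.shape)  # (3, 2)

# With inplace=True, `data` itself is modified and the call returns None.
result = data.drop_duplicates(inplace=True)
print(result)      # None
print(data.shape)  # (3, 2)
```

Because the in-place call returns None, assigning its result back to data would discard the dataframe, so use one form or the other, not both.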
Task


You are given the planets dataset. Remove the duplicate rows, then check the new shape of the dataframe and compare it with the original shape.

Note that the dataframe may contain only distinct records; in that case, the shape will remain the same.
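As a rough illustration of that case, the sketch below builds a hypothetical dataframe with only distinct rows (the names distinct, n_duplicates, and deduplicated are made up for this example) and counts duplicates with duplicated(), which marks every repeated row after its first occurrence.

```python
import pandas as pd

# Hypothetical dataframe in which every row is distinct.
distinct = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# duplicated() returns a boolean Series; summing it counts the repeated rows.
n_duplicates = int(distinct.duplicated().sum())
deduplicated = distinct.drop_duplicates()

print(n_duplicates)                          # 0
print(distinct.shape == deduplicated.shape)  # True
```

When the count is zero, drop_duplicates() has nothing to remove, which is why the two shapes match.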

Solution

import pandas as pd
import numpy as np

data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/10db3746-c8ff-4c55-9ac3-4affa0b65c16/planets.csv')

print(data.shape)
data.drop_duplicates(inplace=True)
print(data.shape)


Section 2. Chapter 7