Data Manipulation using pandas
Types consistency
One of the first steps in analyzing newly received data is checking the value types. For a column of ages, we expect an integer type; a column of salaries should be either integer or float.
Remember, to get the column types in pandas, use the .dtypes attribute. Run the code below to see the value types.
    # Importing the library
    import pandas as pd

    # Reading the file
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')

    # Output values types
    print(df.dtypes)
So many types... How do we avoid getting confused? Let's see which columns have the object type. To do so, we use the same attribute and put a condition inside square brackets. Since .dtypes returns a Series whose index holds the column names, we print only the index to keep the output readable.
    # Importing the library
    import pandas as pd

    # Reading the file
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data.csv')

    # Output only columns with 'object' type
    print(df.dtypes[df.dtypes == object].index)
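To see the boolean mask in isolation, here is a minimal sketch on a small made-up DataFrame (the name `toy` and all its values are hypothetical and not part of the course dataset). The text columns are created with an explicit object dtype to mimic what the course output shows; some pandas versions may infer a dedicated string dtype instead. The sketch also shows `select_dtypes`, a built-in pandas method that should give the same column names.

```python
import pandas as pd

# Hypothetical toy data, not the course dataset
toy = pd.DataFrame({
    "age": [25, 31, 47],
    "salary": [3200.5, 4100.0, 2800.0],
    # Explicit object dtype, mimicking text columns in the course output
    "city": pd.Series(["Lisbon", "Porto", "Faro"], dtype=object),
    "income": pd.Series(["52000", "n/a", "61000"], dtype=object),
})

# .dtypes is a Series: the index holds column names, the values hold types
mask = toy.dtypes == object

# Filtering the Series with the mask keeps only the object-typed columns
object_cols = toy.dtypes[mask].index.tolist()
print(object_cols)

# select_dtypes filters the DataFrame itself by column type
same_cols = toy.select_dtypes(include="object").columns.tolist()
print(same_cols)
```

On the course dataset, the same filtering produces the list of object-typed column names printed by the cell above.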
Here is the first problem. Given what they represent, the columns 'totinch', 'morgh', 'valueh', 'grosrth', and 'omphtotinch' should be treated as numerical.
Column | Description |
---|---|
'TOTINCH' | Total Household Income |
'MORGH' | Presence of Mortgage |
'VALUEH' | Value of Dwelling |
'GROSRTH' | Monthly Gross Rent |
'OMPH' | Owner's Major Payments (Monthly) |
Let's find out why these columns were read as object.
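One possible way to investigate, sketched under assumptions: convert a column with `pd.to_numeric(..., errors="coerce")` and look at the values that failed to parse. The column below is hypothetical, and the `'n/a'` marker is made up for illustration; the actual cause in the course dataset is not shown here.

```python
import pandas as pd

# Hypothetical column: numbers stored as text because of one stray marker
income = pd.Series(["52000", "n/a", "61000", "48000"], dtype=object, name="income")

# errors="coerce" turns every value that cannot be parsed into NaN
as_number = pd.to_numeric(income, errors="coerce")

# Keep the values that were present but failed to parse
offenders = income[as_number.isna() & income.notna()]
print(offenders.unique())
```

A single non-numeric value like this is enough for a whole column to be stored as object, so listing the unique offenders is a quick first check.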