Replacing Specific Elements
The next step is to replace the dots. This task is a bit harder than the previous one because you will replace only specific elements.
First, let's recall how to select specific rows and columns based on a condition. This can be done with the .loc[] property. Its first argument is either row labels (the row numbers, when the default index is used) or a condition; the second is the column name or names. For instance, let's get the rows of the 'morgh' column that contain only the dot character (.).
    # Importing the library
    import pandas as pd

    # Reading the file
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data1.csv')

    # Output only dot values within the 'morgh' column
    print(df.loc[df.morgh == '.', 'morgh'])
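The two forms of the first argument can be compared on a small made-up table. This is only a sketch: the DataFrame `toy` and its 'value' column are hypothetical stand-ins and are not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical toy data (not the lesson's file): 'morgh' holds strings,
# with dots marking missing entries; 'value' is an invented extra column
toy = pd.DataFrame({'morgh': ['1', '.', '3', '.'],
                    'value': [10, 20, 30, 40]})

# First argument as row labels: rows labelled 0 and 2 of the 'morgh' column
by_label = toy.loc[[0, 2], 'morgh']

# First argument as a condition: only rows where 'morgh' equals a dot
by_condition = toy.loc[toy['morgh'] == '.', 'morgh']

print(by_label.tolist())            # ['1', '3']
print(by_condition.index.tolist())  # [1, 3]
```

In both cases the second argument names the column to return; passing a list of names there would return several columns instead.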
Now that we can access the necessary rows, we can replace their values by assignment. We are going to replace all the dots with NA values (nan from NumPy) and then convert the resulting column to the float type (NaN is itself a floating-point value, so a column containing it cannot use the plain int type, only float).
    # Importing libraries
    import pandas as pd
    import numpy as np

    # Reading the file
    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data1.csv')

    # Perform a replacement
    df.loc[df.morgh == '.', 'morgh'] = np.nan

    # Converting
    df.morgh = df.morgh.astype(float)
    print(df.morgh)
As you can see, the column now has the float type, which means you can apply numerical methods to it (for example, you can calculate the mean, min, max, etc.).
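To see what the conversion makes possible, here is a minimal sketch on made-up data. The DataFrame `toy` is a hypothetical stand-in for the lesson's file, so the numbers below say nothing about the real 'morgh' column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: numbers stored as strings, dots marking missing values
toy = pd.DataFrame({'morgh': ['1', '.', '3', '.', '5']})

# Same two steps as in the lesson: replace the dots, then convert to float
toy.loc[toy['morgh'] == '.', 'morgh'] = np.nan
toy['morgh'] = toy['morgh'].astype(float)

# Numerical methods now work; pandas skips NaN values in them by default
col_mean = toy['morgh'].mean()         # (1 + 3 + 5) / 3 = 3.0
col_min = toy['morgh'].min()           # 1.0
col_max = toy['morgh'].max()           # 5.0
n_missing = toy['morgh'].isna().sum()  # 2 replaced dots

print(col_mean, col_min, col_max, n_missing)
```

Because the missing entries are skipped rather than counted as zeros, the mean is taken over the three real values only.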