Replacing Specific Elements | Preprocessing Data: Part I
Data Manipulation using pandas

Replacing Specific Elements

The next step is to replace the dots. This task is a bit harder than the previous one, since you will replace only specific elements.

First, let's recall how to select specific rows and columns based on a condition. This can be done with the .loc[] indexer. Its first argument is either row labels (the row numbers, when the default index is used) or a boolean condition; the second is the column name or names. For instance, let's get the rows of the 'morgh' column that contain only the dot character (.).

# Importing the library
import pandas as pd

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data1.csv')
# Output only dot values within the 'morgh' column
print(df.loc[df.morgh == '.', 'morgh'])
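
The same selection can be tried without downloading the file. Below is a minimal sketch on a small made-up DataFrame; the name toy, the 'value' column, and all the values are assumptions for illustration only and are not taken from data1.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for the real file (values are made up)
toy = pd.DataFrame({
    'morgh': ['1', '.', '0', '.', '1'],
    'value': [10, 20, 30, 40, 50],
})

# Boolean condition: True for rows where 'morgh' is exactly the string '.'
mask = toy['morgh'] == '.'

# .loc[condition, column] keeps only the matching rows of that column
dots = toy.loc[mask, 'morgh']
print(dots)

# Summing a boolean mask counts the matching rows
n_dots = int(mask.sum())
print(n_dots)
```

Storing the condition in a variable such as mask is only a readability choice; passing the comparison directly inside .loc[], as in the snippet above, behaves the same way.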

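Since we can access the necessary rows, we can easily replace their values by assignment. We are going to replace all the dots with NA values (nan from NumPy) and then convert the resulting column to the float type. The conversion targets float rather than int because nan is itself a floating-point value, so a column that contains it cannot use the default integer type.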

# Importing libraries
import pandas as pd
import numpy as np

# Reading the file
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/f2947b09-5f0d-4ad9-992f-ec0b87cd4b3f/data1.csv')
# Perform a replacement
df.loc[df.morgh == '.', 'morgh'] = np.nan
# Converting
df.morgh = df.morgh.astype(float)
print(df.morgh)
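
As a rough, self-contained sketch of the same two steps, the example below repeats the replacement and conversion on made-up data (the toy frame and its values are assumptions, not the contents of data1.csv). It also shows pd.to_numeric with errors='coerce' as one possible alternative; note that this is a different technique from the one above, because it turns every non-numeric string into NaN, not only dots.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column mixing numeric strings and dots (values are made up)
toy = pd.DataFrame({'morgh': ['1', '.', '0', '.', '1']})

# Step 1: overwrite only the dot cells with NaN
toy.loc[toy['morgh'] == '.', 'morgh'] = np.nan

# Step 2: convert the remaining numeric strings (and NaN) to float
toy['morgh'] = toy['morgh'].astype(float)

print(toy['morgh'].dtype)
n_missing = int(toy['morgh'].isna().sum())
print(n_missing)

# Alternative sketch: coerce anything that cannot be parsed as a number to NaN
raw = pd.Series(['1', '.', '0', '.', '1'])
alt = pd.to_numeric(raw, errors='coerce')
print(alt)
```

The explicit .loc replacement is stricter: if the column held some other unexpected string, astype(float) would raise an error instead of silently producing NaN, which can help catch dirty data early.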

As you can see, the column now has the float type, which means you can apply numerical methods to it (e.g., you can calculate the mean, min, max, etc.).
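
To illustrate, here is a small sketch of such numerical methods on a float column that contains NaN. The Series and its values are made up for the example; by default, pandas skips NaN values in these aggregations.

```python
import numpy as np
import pandas as pd

# Hypothetical float column after the replacement (values are made up)
morgh = pd.Series([1.0, np.nan, 0.0, np.nan, 1.0], name='morgh')

# NaN entries are ignored by default (skipna=True)
col_mean = morgh.mean()
col_min = morgh.min()
col_max = morgh.max()

print(col_mean, col_min, col_max)
```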
