Missing Values | Preprocessing Data: Part II
Data Manipulation using pandas

Missing Values

The last issue you may encounter when working with data is missing data. As you can see, missing data can be represented in different ways (for example, as dots in our dataset).
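Because pandas only recognizes certain markers (such as empty fields or `NaN`) as missing, a placeholder like a dot has to be converted to NA explicitly. Below is a minimal sketch, assuming a small hypothetical CSV that uses `.` for missing entries; it shows two options: the `na_values` argument of `pd.read_csv`, or replacing dots after loading and then converting the columns to numbers.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV where missing entries are written as dots
# (mimics the course dataset; the values are made up)
raw_csv = io.StringIO(
    "morgh,valueh\n"
    "900,150000\n"
    ".,200000\n"
    "1200,.\n"
)

# Option 1: treat '.' as NA while parsing
df = pd.read_csv(raw_csv, na_values=".")
print(df.isna().sum())  # one NA per column

# Option 2: load as-is (dots make the columns object dtype),
# replace the dots with NaN, then convert each column to numbers
raw_csv.seek(0)
df_raw = pd.read_csv(raw_csv)
df_fixed = df_raw.replace(".", np.nan).apply(pd.to_numeric)
```

Option 1 is usually simpler because the columns get a numeric dtype right away.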

There are several ways of dealing with missing values: you can either delete the rows that contain missing values or replace the missing values with some constant. As mentioned before, make sure you don't delete a large share of your data. What values can be used for replacement? One of the most popular options is the mean of the available data.
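Both strategies can be sketched on a tiny hypothetical DataFrame (the numbers are invented for illustration). The sketch first checks what share of rows would be lost, then either drops those rows with `dropna()` or fills the gaps with each column's mean using `fillna()`.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame with missing values (NaN)
df = pd.DataFrame({
    "morgh": [900.0, np.nan, 1200.0, 600.0],
    "valueh": [150000.0, 200000.0, np.nan, 100000.0],
})

# Before deleting anything, check what share of rows contain at least one NA
share_with_na = df.isna().any(axis=1).mean()
print(f"Rows with NA: {share_with_na:.0%}")  # 2 of 4 rows -> 50%

# Option 1: delete rows containing missing values
df_dropped = df.dropna()

# Option 2: replace missing values with the mean of the available data.
# DataFrame.mean() skips NaN by default, and fillna() matches the means by column name.
df_filled = df.fillna(df.mean())
```

Here the column means are 900.0 for `morgh` and 150000.0 for `valueh`, so those are the values that fill the gaps.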

If you want to drop rows with NA values, apply the dropna() method. Let's look at the parameters this method accepts: dropna(axis = 0, how = 'any', thresh, subset, inplace = False)

| Parameter | Description |
|---|---|
| axis = 0/1 | Whether rows (0, default) or columns (1) that contain missing values are removed |
| how = 'any'/'all' | Whether a row/column is removed when it has at least one NA ('any', default) or only when all of its values are NA ('all') |
| thresh = int | Optional. Keeps only the rows/columns that have at least this many non-NA values. Cannot be combined with the how parameter |
| subset = 'column'/['column1', 'column2'] | Optional. Labels along the other axis to check for NA values: columns when dropping rows, rows when dropping columns |
| inplace = True/False | Whether to modify the DataFrame itself instead of returning a new one (default False) |
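To see how these parameters change the result, here is a small sketch on a hypothetical three-row DataFrame: one complete row, one row with a single NA, and one row that is entirely NA.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, np.nan],
})

# how='any' (default): drop a row if it has at least one NA -> only row 0 remains
any_rows = df.dropna()

# how='all': drop a row only if every value is NA -> only row 2 is dropped
all_rows = df.dropna(how="all")

# thresh=2: keep rows with at least 2 non-NA values -> rows 0 and 1 remain
thresh_rows = df.dropna(thresh=2)

# axis=1: drop columns instead of rows; every column here has an NA, so none remain
no_cols = df.dropna(axis=1)

# inplace=True modifies the DataFrame itself and returns None
result = df.copy()
ret = result.dropna(how="all", inplace=True)
```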

For instance, let's remove the rows that contain NA in at least one of the columns 'morgh' and 'valueh'.
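The course dataset itself is not shown here, so the sketch below uses a hypothetical stand-in: only the column names `morgh` and `valueh` come from the text, while the `rooms` column and all values are invented. Passing both names to `subset` drops a row when either of them is NA, while NAs in other columns are ignored.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "morgh": [900.0, np.nan, 1200.0, 600.0],
    "valueh": [150000.0, 200000.0, np.nan, 100000.0],
    "rooms": [np.nan, 3.0, 2.0, 4.0],  # hypothetical column, not checked by subset
})

# Drop rows with NA in 'morgh' or 'valueh'; row 0 survives despite its NA in 'rooms'
df_clean = df.dropna(subset=["morgh", "valueh"])
print(df_clean.index.tolist())  # [0, 3]
```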

As you can see, 248 rows were removed because they had NA values in at least one of the 'morgh' and 'valueh' columns.
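A count like this can also be checked before dropping anything. Below is a sketch on hypothetical data: a boolean mask built with `isna().any(axis=1)` marks every row with NA in either column, its sum gives the number of rows `dropna(subset=...)` would remove, and filtering with the inverted mask gives the same DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data; in the course dataset this count would be 248
df = pd.DataFrame({
    "morgh": [900.0, np.nan, 1200.0, 600.0],
    "valueh": [150000.0, 200000.0, np.nan, np.nan],
})

# True for rows with NA in at least one of the two columns
has_na = df[["morgh", "valueh"]].isna().any(axis=1)
n_to_remove = int(has_na.sum())
print(n_to_remove, "rows would be removed")  # 3 rows would be removed

# Keeping the rows where the mask is False matches dropna(subset=...)
df_clean = df[~has_na]
```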


Section 2. Chapter 6