Data Manipulation using pandas
The last issue you may encounter when working with data is missing data. As you can see, missing data can be represented in different ways (for example, as dots in our dataset).
There are several ways of dealing with missing values: you can either delete the rows that contain them or replace them with some constant. As mentioned before, first check that you won't delete a large share of the data. What values can be used for replacement? One of the most popular options is the mean of the available data.
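The replacement approach described above can be sketched roughly as follows. This is only an illustration on a tiny made-up frame: the values are invented, and only the column names 'morgh' and 'valueh' and the dot placeholder come from this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; dots stand in for missing values, as in the text.
df = pd.DataFrame({
    "morgh": ["350", ".", "410", "500"],
    "valueh": ["120000", "95000", ".", "."],
})

# Turn the dot placeholder into NaN, then convert the columns to numbers.
clean = df.replace(".", np.nan).apply(pd.to_numeric)

# Share of rows with at least one missing value: worth checking before
# deciding whether dropping rows would discard too much data.
share_missing = clean.isna().any(axis=1).mean()

# Replace each missing value with the mean of the available data in its column.
filled = clean.fillna(clean.mean())

print(share_missing)
print(filled)
```

If the data comes from a CSV file, passing `na_values="."` to `pd.read_csv` is another way to have the dots read as NaN from the start.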
If you want to drop rows with NA values, apply the dropna() method. Let's look at the parameters this method takes.
dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
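|Parameter|Description|
|---|---|
|axis|Determines if rows (0 or 'index') or columns (1 or 'columns') that contain missing values are removed|
|how|Determines if a row/column will be removed from the dataframe when it has at least one NA ('any') or only when all its values are NA ('all')|
|thresh|Optional, requires that many non-NA values across the specified axis for a row/column to be kept. Cannot be combined with the how parameter|
|subset|Optional, which columns/rows should be checked for NA values|
|inplace|Whether the changes should modify the dataframe rather than create a new one (default: False)|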
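To make the parameters more concrete, here is a small sketch on an invented frame (the column names a, b, c and all values are hypothetical, chosen only to show how each option behaves).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 2 is entirely NA, rows 1 and 3 are partly NA.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [1.0, 2.0, np.nan, np.nan],
    "c": [1.0, 2.0, np.nan, 4.0],
})

# Default behaviour (axis=0, how='any'): drop rows with at least one NA.
any_na = df.dropna()

# how='all': drop only the rows in which every value is NA.
all_na = df.dropna(how="all")

# thresh=2: keep only the rows that have at least 2 non-NA values.
at_least_2 = df.dropna(thresh=2)

# axis=1 with thresh=3: keep only the columns with at least 3 non-NA values.
dense_cols = df.dropna(axis=1, thresh=3)

# None of the calls above used inplace=True, so df itself is unchanged.
print(len(df), len(any_na), len(all_na), len(at_least_2), list(dense_cols.columns))
```

Assigning the result to a new variable, as done here, is usually easier to follow than inplace=True, because the original dataframe stays available for comparison.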
For instance, let's remove the rows that contain NA in at least one of the 'morgh' and 'valueh' columns.
As you can see, 248 rows were removed since there were NA values in at least one of the 'morgh', 'valueh' columns.
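A call of this kind might look like the sketch below. The frame is a small hypothetical stand-in, not the original dataset, so the row counts differ from the 248 reported above; the 'other' column is invented to show that NA values outside the subset are ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the names 'morgh' and 'valueh' come from the text.
df = pd.DataFrame({
    "morgh": [1.0, np.nan, 0.0, 1.0, np.nan],
    "valueh": [120000.0, 95000.0, np.nan, 80000.0, np.nan],
    "other": [np.nan, 2.0, 3.0, 4.0, 5.0],
})

rows_before = len(df)

# Drop rows that have NA in 'morgh' or 'valueh'; NA in 'other' does not matter.
cleaned = df.dropna(subset=["morgh", "valueh"])

# Compare row counts to see how much data was lost.
rows_removed = rows_before - len(cleaned)
print(f"{rows_removed} rows were removed")
```

Comparing the length of the dataframe before and after the call is a simple way to confirm how many rows were dropped.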