# Advanced Techniques in pandas


## Manage Categorical Variables

Now you will work with a data set that doesn't contain missing values. The `NaN` values in the `'Age'` column were replaced with the **mean** of the column, and the `NaN` value in the `'Fare'` column was deleted.

So, now it's time to learn how to manage categorical variables. A categorical variable takes one of a fixed set of categories. For instance, the column `'Sex'` contains `'male'` and `'female'`, and the column `'Embarked'` contains `'Q'`, `'S'`, and `'C'`.

**What should we do to calculate the number of values in each category or to find out information on them?**

You already know `.loc[]`, `.isin()`, `.between()`, and many other functions, but pandas offers a more convenient way to do this: the function `pd.get_dummies()`. As an example, we will apply it to the column `'Embarked'` and output the names of 5 random passengers together with the new columns that were created.
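The call described here can be sketched as follows. This is a minimal illustration, not the course's original code: the small `data` frame is a stand-in built from the five passengers in the result table, the column name `'Name'` is an assumption, and `dtype=int` is passed because recent pandas versions return `True`/`False` by default instead of `1`/`0`.

```python
import pandas as pd

# Stand-in for the course data set (assumption): only the five passengers
# from the result table, with 'Embarked' inferred from their dummy values.
data = pd.DataFrame(
    {
        "Name": [
            "Finoli, Mr. Luigi",
            "Omont, Mr. Alfred Fernand",
            "Deacon, Mr. Percy William",
            "Ismay, Mr. Joseph Bruce",
            "Hays, Mr. Charles Melville",
        ],
        "Embarked": ["S", "C", "S", "S", "S"],
    },
    index=pd.Index([211, 205, 212, 372, 308], name="PassengerId"),
)

# 'Q' does not occur in this tiny sample, so declare all three categories
# explicitly; otherwise no 'Embarked_Q' column would be created here.
data["Embarked"] = pd.Categorical(data["Embarked"], categories=["C", "Q", "S"])

# Replace 'Embarked' with one 0/1 column per category.
dummies = pd.get_dummies(data, columns=["Embarked"], dtype=int)

# Show 5 random passengers with their names and the new columns.
print(dummies.sample(5))
```

On the full data set the `pd.Categorical` step would normally be unnecessary, since every category appears at least once.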

Look at the result:

| PassengerId | Name | Embarked_C | Embarked_Q | Embarked_S |
|---|---|---|---|---|
| 211 | Finoli, Mr. Luigi | 0 | 0 | 1 |
| 205 | Omont, Mr. Alfred Fernand | 1 | 0 | 0 |
| 212 | Deacon, Mr. Percy William | 0 | 0 | 1 |
| 372 | Ismay, Mr. Joseph Bruce | 0 | 0 | 1 |
| 308 | Hays, Mr. Charles Melville | 0 | 0 | 1 |

**Explanation:**

The function split the column `'Embarked'` into three columns: `'Embarked_C'`, `'Embarked_Q'`, and `'Embarked_S'`, one for each of the three categories. Each passenger has exactly one category in the `'Embarked'` column. So, in each passenger's row, the function puts `1` in the column that matches the passenger's category and `0` in the other two. Thus, each row has `1` in just one column.
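The "exactly one `1` per row" property can be checked directly, and it is also what makes counting easy: summing a dummy column counts the rows in that category. The sketch below uses a made-up five-row frame (the values are illustrative, not taken from the course data) and passes `dtype=int` to get `1`/`0` output regardless of the pandas version.

```python
import pandas as pd

# Hypothetical mini data set; the values are illustrative only.
data = pd.DataFrame({"Embarked": ["S", "C", "S", "Q", "S"]})

dummies = pd.get_dummies(data, columns=["Embarked"], dtype=int)

# Each row holds a single 1, so every row sums to exactly 1.
row_totals = dummies.sum(axis=1)
print(row_totals.tolist())

# Summing each column counts how many rows fall into that category.
category_counts = dummies.sum()
print(category_counts)
```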

In the call `pd.get_dummies(data, columns = ['Embarked'])`:

- `pd.get_dummies()` - this function converts **categorical** variables into **dummy** ones (1 or 0).
- `data` - the data frame that you want to use.
- `columns = ['Embarked']` - the columns with categorical variables that you want to transform into dummy ones. Pay attention: it is **obligatory** to put the column names into a list, even if there is only one column.
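The list requirement for `columns` can be seen in a short experiment. This sketch assumes a recent pandas version, where a bare string is rejected with a `TypeError`; the three-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical values, for illustration only.
data = pd.DataFrame({"Embarked": ["S", "C", "Q"]})

# A list works, even for a single column.
ok = pd.get_dummies(data, columns=["Embarked"], dtype=int)
print(ok.columns.tolist())

# A bare string is not list-like, so pandas refuses it.
try:
    pd.get_dummies(data, columns="Embarked")
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```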

# Task

Your task here is to transform the column `'Sex'` into dummy variables instead of categorical ones. Then output the **sum** of the values in each category.
