Advanced Techniques in pandas

Getting Familiar With the .groupby() Method

Welcome to this section. Here, we will group our data to find information about different groups of rows. Examine the data set on delays, which is loaded in the code example below.

Grouping lets you compute a summary for each group of rows instead of for the whole table. Imagine you want to calculate the number of delays for each flight number. Look at the code example and then at the explanation:


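import pandas as pd

# Load the data set, using the first column as the index
data = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/4bf24830-59ba-4418-969b-aaf8117d522e/plane', index_col=0)
# Group rows by flight number and sum the 'Delay' values in each group
data_flights = data[['Flight', 'Delay']].groupby('Flight').sum()
print(data_flights.head())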

Explanation:

data[['Flight', 'Delay']].groupby('Flight').sum()

data[['Flight', 'Delay']] - Selects the columns you will work with: the column to group by ('Flight') and the column to aggregate ('Delay');
.groupby('Flight') - The 'Flight' column is the argument of the .groupby() method. This means that rows with the same value in the 'Flight' column are grouped together;
.sum() - This method is applied to the rows within each group created by .groupby(). In this case, it sums the values in the 'Delay' column for rows that belong to the same 'Flight' group.
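
To see the three steps on something small enough to check by hand, here is a minimal sketch. The toy DataFrame and its values are made up for illustration and are not taken from the course data set; only the column names 'Flight' and 'Delay' follow the lesson.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    'Flight': [101, 101, 202, 202, 202],
    'Delay': [1, 0, 1, 1, 0],
})

# 1) select the columns, 2) group rows by 'Flight', 3) sum 'Delay' per group
toy_flights = toy[['Flight', 'Delay']].groupby('Flight').sum()
print(toy_flights)
```

The result has one row per flight number, with 'Flight' as the index and the per-group sum in the 'Delay' column.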

Note

Since the 'Delay' column contains only 0 (no delay occurred) or 1 (a delay occurred) as its possible values, the sum of its values within each group equals the number of delays for that flight.
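
The note can be checked with a small sketch: summing a 0/1 column per group should give the same numbers as counting the rows where a delay occurred. The toy data below is hypothetical, and every flight in it has at least one delay, so both results contain the same flights.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    'Flight': [101, 101, 202, 202, 202],
    'Delay': [1, 0, 1, 1, 0],
})

# Number of delays per flight, obtained by summing the 0/1 values
delays_by_sum = toy.groupby('Flight')['Delay'].sum()

# The same number, obtained by counting only the rows where Delay == 1
delays_by_count = toy[toy['Delay'] == 1].groupby('Flight').size()

# Compare the two results as plain dictionaries {flight: number_of_delays}
same_result = delays_by_sum.to_dict() == delays_by_count.to_dict()
print(same_result)
```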

In fact, .sum() is one of many aggregation functions you can use. You will become familiar with all of them as you proceed.
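
As a rough preview, the sketch below applies a few other common aggregations to a group. It uses a hypothetical toy DataFrame with invented values. .mean(), .min(), .max(), .count() and .agg() are standard pandas groupby methods, but the choice of which ones to show here is ours, not the course's.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    'Flight': [101, 101, 202, 202, 202],
    'Time': [600, 620, 900, 910, 920],
})

# Select the 'Time' column within each 'Flight' group
grouped = toy.groupby('Flight')['Time']

# A single aggregation: the largest 'Time' value per flight
print(grouped.max())

# Several aggregations at once, one output column per function
summary = grouped.agg(['mean', 'min', 'max', 'count'])
print(summary)
```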

Fill in the gaps to find the mean value of the 'Time' column depending on the 'DayOfWeek' column.

data_extracted = data[['___', 'Time']]___('___').mean()
print(data_extracted)

DayOfWeek	Time
3	804.993130
4	804.452984
5	702.888362


Section 4. Chapter 1
