Group Data

It is time to move on to more complicated functions. The first one is .groupby(). As the title suggests, this function groups data, but how? It does not group columns; it groups the rows that share the same value in a chosen column. Look at the example first, and everything will become clearer:

Here is the initial dataset:

Look at the job titles.

Imagine you want to know the mean salary for each specialization. With this much data, calculating the value by hand is impractical, so the right approach is to use a function that groups rows by the values in a column.

Here is the code:

Look at the output:

job_title                           work_year    salary         salary_in_usd
3D Computer Vision Researcher       2021.000000  400000.000000  5409.000000
AI Scientist                        2021.142857  290571.428571  66135.571429
Analytics Engineer                  2022.000000  175000.000000  175000.000000
Applied Data Scientist              2021.600000  172400.000000  175655.000000
Applied Machine Learning Scientist  2021.500000  141350.000000  142068.750000

Here, we can see the mean value of work_year, salary, and salary_in_usd for each job_title.

How does the function work?

DataFrame.groupby() is the general syntax of the groupby function.
DataFrame.groupby(['job_title']): in the brackets, we specify the column by which the data will be grouped.
DataFrame.groupby(['job_title']).mean(): we must also tell the program what to do with the numerical values that belong to each group. In our case, we ask the groupby function to calculate the mean() of all numerical values that share one job_title.
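
The three steps above can be tried on a small table. This is only a minimal sketch: the rows below are invented for illustration, and the names df, grouped, mean_by_title, and mean_salary are hypothetical; only the column names follow the lesson's dataset. Note that in recent pandas versions (2.0 and later), calling .mean() on a grouped DataFrame that still contains text columns raises an error, so the sketch passes numeric_only=True to average only the numerical columns.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the lesson's dataset,
# but the rows are invented for illustration only.
df = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Engineer",
                  "Data Scientist", "Data Engineer"],
    "work_year": [2021, 2021, 2022, 2022],
    "salary_in_usd": [100000, 90000, 120000, 110000],
    "company_size": ["M", "L", "M", "S"],  # text column, not averaged
})

# Step 1 and 2: group the rows that share the same job_title.
grouped = df.groupby(["job_title"])

# Step 3: tell pandas what to do with each group's numerical values.
# numeric_only=True skips text columns such as company_size.
mean_by_title = grouped.mean(numeric_only=True)
print(mean_by_title)

# If only one column is of interest, select it before aggregating.
mean_salary = df.groupby("job_title")["salary_in_usd"].mean()
print(mean_salary)
```

The result has one row per job_title, with the group key used as the index, which matches the shape of the output shown earlier in the lesson.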


