Introduction to Python for Data Analysis
Group Data
It is time to move on to more complicated functions. The first one is .groupby(). As the name suggests, this function groups data by the values in a column, but how? Look at the example first, and everything will become clearer:
Here is the initial dataset:
Look at the job titles.
Imagine you want to know the mean salary for each specialization. Calculating this by hand is impractical when you have this much data, so the right approach is a function that groups rows by a column's values.
Here is the code:
Look at the output:
job_title | work_year | salary | salary_in_usd |
3D Computer Vision Researcher | 2021.000000 | 400000.000000 | 5409.000000 |
AI Scientist | 2021.142857 | 290571.428571 | 66135.571429 |
Analytics Engineer | 2022.000000 | 175000.000000 | 175000.000000 |
Applied Data Scientist | 2021.600000 | 172400.000000 | 175655.000000 |
Applied Machine Learning Scientist | 2021.500000 | 141350.000000 | 142068.750000 |
Here, we can see the mean value of work_year, salary, and salary_in_usd for each job_title.
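To make the grouping concrete, here is a minimal sketch on a tiny, made-up dataset. The rows and the names df and mean_by_title are assumptions for illustration only; the column names are the ones from the lesson's output table.

```python
import pandas as pd

# Hypothetical sample rows (not the course dataset); column names follow the lesson.
df = pd.DataFrame({
    "job_title": ["AI Scientist", "AI Scientist",
                  "Analytics Engineer", "Analytics Engineer"],
    "work_year": [2021, 2022, 2022, 2022],
    "salary": [100000, 200000, 150000, 200000],
    "salary_in_usd": [100000, 200000, 150000, 200000],
})

# Group the rows by job_title, then average every remaining (numeric) column.
mean_by_title = df.groupby(["job_title"]).mean()
print(mean_by_title)
```

Each distinct job_title becomes one row of the result, and job_title itself moves into the index.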
How does the function work?
DataFrame.groupby() is the general syntax of the groupby function.
DataFrame.groupby(['job_title']): in the brackets, we specify the column by which the data will be grouped.
DataFrame.groupby(['job_title']).mean(): we must also tell the program what to do with the numerical values that belong to one group. In our case, we ask the groupby function to calculate the mean() of all numerical values that share one job_title.
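The three steps above can be sketched separately, as below. The sample rows and the text column company are hypothetical, added only to show a non-numeric column. Passing numeric_only=True is an assumption about your setup: recent pandas versions may raise an error when .mean() meets text columns, so restricting the calculation to numeric columns is a safe default.

```python
import pandas as pd

# Hypothetical sample data; "company" is a made-up text column.
df = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Analyst", "Data Engineer"],
    "company": ["A", "B", "C"],
    "salary_in_usd": [60000, 80000, 110000],
})

# Steps 1 and 2: groupby() with a column name only describes the grouping;
# no values are calculated yet.
grouped = df.groupby(["job_title"])
print(type(grouped).__name__)

# Step 3: the aggregation collapses each group into a single row.
# numeric_only=True skips the text column "company".
means = grouped.mean(numeric_only=True)
print(means)

# The same aggregation can be limited to one column of interest.
usd_means = grouped["salary_in_usd"].mean()
print(usd_means)
```

Keeping the grouped object in a variable is optional; chaining the calls as in the lesson gives the same result.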