Introduction to Python for Data Analysis
Group Data
It is time to move on to more complicated functions. The first one is .groupby(). The name suggests that this function groups rows by the values in a column, but how? Look at the example first, and everything will become clearer:
Here is the initial dataset:
Look at the job titles.
Imagine you want to know the mean salary for each specialization. With this much data, calculating the value by hand is impractical, so a function that groups rows by column values is the right tool.
Here is the code:
Look at the output:
job_title | work_year | salary | salary_in_usd |
3D Computer Vision Researcher | 2021.000000 | 400000.000000 | 5409.000000 |
AI Scientist | 2021.142857 | 290571.428571 | 66135.571429 |
Analytics Engineer | 2022.000000 | 175000.000000 | 175000.000000 |
Applied Data Scientist | 2021.600000 | 172400.000000 | 175655.000000 |
Applied Machine Learning Scientist | 2021.500000 | 141350.000000 | 142068.750000 |
Here, we can see the mean value of work_year, salary, and salary_in_usd for each job_title.
How does the function work?
DataFrame.groupby() is the general syntax of the groupby function.
DataFrame.groupby(['job_title']): in the brackets, we specify the column by which the data will be grouped.
DataFrame.groupby(['job_title']).mean(): we must also tell the program what to do with the numerical values that belong to one group. In our case, we ask the groupby function to calculate the mean() of all numerical values that share the same job_title.
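The three steps above can be tried on a small table. The following is a minimal sketch, not the lesson's own code: the DataFrame name `df`, the `employee_residence` column, and every value in it are invented for illustration. It also passes `numeric_only=True`, which is an assumption about your environment: recent pandas versions raise an error when `mean()` meets a text column, so the flag tells pandas to average only the numerical columns.

```python
import pandas as pd

# Hypothetical miniature dataset; all values are invented for illustration
df = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Engineer", "Data Scientist",
                  "Data Engineer", "ML Engineer"],
    "employee_residence": ["US", "DE", "US", "GB", "US"],  # text column
    "work_year": [2020, 2021, 2022, 2021, 2022],
    "salary_in_usd": [100000, 90000, 120000, 110000, 150000],
})

# Group the rows by job_title, then average the numerical columns per group.
# numeric_only=True skips the text column instead of trying to average it.
mean_by_title = df.groupby(["job_title"]).mean(numeric_only=True)

print(mean_by_title)
```

The result has one row per job_title. In this invented data, the two Data Scientist rows (100000 and 120000) collapse into a single row with a mean salary_in_usd of 110000, and the text column does not appear in the output.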