Group Data
It is time to move on to more complex functions. The first one is .groupby(). As the title suggests, this function groups the rows of a DataFrame by the values in a column, but how exactly? Start with an example, and the idea will become clearer:
Here is the initial dataset:
Look at the job titles.
Imagine you want to know the mean salary for each specialization. With this much data, calculating it by hand is impractical, so a function that groups rows by the values in a column is the right tool.
Here is the code:
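# Group rows by job_title and average the numeric columns in each group.
# numeric_only=True skips text columns, which pandas 2.0+ refuses to average.
df = df.groupby('job_title').mean(numeric_only=True)
print(df.head())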
Look at the output:
job_title | work_year | salary | salary_in_usd |
---|---|---|---|
3D Computer Vision Researcher | 2021.000000 | 400000.000000 | 5409.000000 |
AI Scientist | 2021.142857 | 290571.428571 | 66135.571429 |
Analytics Engineer | 2022.000000 | 175000.000000 | 175000.000000 |
Applied Data Scientist | 2021.600000 | 172400.000000 | 175655.000000 |
Applied Machine Learning Scientist | 2021.500000 | 141350.000000 | 142068.750000 |
Here, we can see the mean value of work_year, salary, and salary_in_usd for each job_title.
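To check the same grouping on data small enough to verify by hand, here is a minimal sketch. The rows, the toy and toy_means names, and the experience_level values are made up for illustration and are not taken from the dataset above.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the lesson's dataset
toy = pd.DataFrame({
    "job_title": ["Data Analyst", "Data Analyst", "ML Engineer", "ML Engineer"],
    "experience_level": ["EN", "SE", "MI", "SE"],  # text column, cannot be averaged
    "work_year": [2021, 2022, 2021, 2021],
    "salary_in_usd": [50000, 70000, 90000, 110000],
})

# numeric_only=True leaves the text column out of the result
toy_means = toy.groupby("job_title").mean(numeric_only=True)
print(toy_means)
```

Each job title becomes one row of the result, and every numeric column holds the average of the rows that share that title.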
How does the function work?
DataFrame.groupby() is the general syntax of the groupby function.
DataFrame.groupby(['job_title']): in the brackets, we specify the column by which the data will be grouped.
DataFrame.groupby(['job_title']).mean(): we must also tell the program what to do with the numerical values that belong to each group. In our case, we ask the groupby function to calculate the mean() of all numerical values that share one job_title.
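The steps above can also be run one at a time, which shows that grouping and aggregating are separate operations. This is a rough sketch with made-up rows. The sample, grouped, and mean_salary names are hypothetical, and selecting a single column before calling mean() is one possible variation, not something the lesson requires.

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "job_title": ["AI Scientist", "AI Scientist", "Analytics Engineer"],
    "salary_in_usd": [60000, 80000, 175000],
})

# Choose the grouping column; no statistics are computed at this point
grouped = sample.groupby("job_title")
print(type(grouped).__name__)  # DataFrameGroupBy

# Tell pandas how to summarize each group; here, only one column is averaged
mean_salary = grouped["salary_in_usd"].mean()
print(mean_salary)
```

Until an aggregation such as mean() is called, groupby only records how the rows are split, so the same grouped object can be reused for other summaries.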