Group Data
It is time to move on to more complex functions. The first one is `.groupby()`. As its name suggests, this function groups data: it collects the rows of a DataFrame that share the same value in a column. But how does that work in practice? Let's look at an example, and everything will become clearer:
Here is the initial dataset:
Look at the job titles.
Imagine you want to know the mean salary for each specialization. Calculating these values by hand is impractical because there is so much data, so a function that groups rows by the values in a column is the right tool.
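To see what grouping saves us, here is a minimal sketch of the manual approach. It uses a tiny hypothetical DataFrame (the rows are invented for illustration, not taken from the course dataset): filter the rows for each job title, then average them one title at a time.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the course dataset
df = pd.DataFrame({
    'job_title': ['Data Analyst', 'Data Scientist', 'Data Analyst', 'Data Scientist'],
    'salary_in_usd': [60000, 120000, 80000, 100000],
})

# Manual approach: one boolean filter and one mean per distinct job title
manual = {}
for title in df['job_title'].unique():
    manual[title] = df.loc[df['job_title'] == title, 'salary_in_usd'].mean()

print(manual)
```

With hundreds of distinct titles, this loop has to be written and run for every one of them, while `groupby()` performs the same split-and-average in a single call.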
Here is the code:
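import pandas as pd
# df is the salaries DataFrame loaded earlier; numeric_only=True skips text columns (required in pandas 2.0+)
df = df.groupby('job_title').mean(numeric_only=True)
print(df.head())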
Look at the output:
job_title | work_year | salary | salary_in_usd |
---|---|---|---|
3D Computer Vision Researcher | 2021.000000 | 400000.000000 | 5409.000000 |
AI Scientist | 2021.142857 | 290571.428571 | 66135.571429 |
Analytics Engineer | 2022.000000 | 175000.000000 | 175000.000000 |
Applied Data Scientist | 2021.600000 | 172400.000000 | 175655.000000 |
Applied Machine Learning Scientist | 2021.500000 | 141350.000000 | 142068.750000 |
Here, we can see the mean values of `work_year`, `salary`, and `salary_in_usd` for each `job_title`.
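The same call can be tried on a small, self-contained example. The sketch below uses a hypothetical DataFrame (all values invented for illustration) that also contains a text column, `experience_level`, labeled here as an assumption. Because pandas 2.0 and later raise a `TypeError` when `mean()` meets non-numeric columns, the sketch passes `numeric_only=True`.

```python
import pandas as pd

# Hypothetical rows; column names follow the course dataset where possible
df = pd.DataFrame({
    'work_year': [2021, 2022, 2022, 2020],
    'experience_level': ['SE', 'MI', 'EN', 'SE'],  # text column (hypothetical)
    'job_title': ['AI Scientist', 'AI Scientist', 'Data Analyst', 'Data Analyst'],
    'salary_in_usd': [100000, 80000, 50000, 70000],
})

# job_title becomes the index: one row per title.
# Each remaining numeric column is averaged within its group;
# numeric_only=True drops the text column experience_level.
means = df.groupby('job_title').mean(numeric_only=True)
print(means)
```

As in the output table above, the grouping column moves into the index, and only the numeric columns remain.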
How does the function work?
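- `DataFrame.groupby()` is the general syntax of the `groupby` function.
- `DataFrame.groupby(['job_title'])`: in the brackets, we specify the column by which the data will be grouped.
- `DataFrame.groupby(['job_title']).mean()`: this last step is required. We must tell the program what to do with the numerical values that belong to one group; without it, `groupby()` only returns a GroupBy object rather than a table of results. In our case, we ask the `groupby` function to calculate the `mean()` of all numerical values for each `job_title`. In pandas 2.0 and later, pass `numeric_only=True` to `mean()` if the DataFrame also contains text columns.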
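The last step is where you decide what happens to each group. In this sketch (again with hypothetical, invented data), the same grouping is combined with different instructions: `mean()`, `max()`, and `count()` all fit into the same position. Selecting a column right after `groupby()` limits the calculation to that column.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    'job_title': ['AI Scientist', 'AI Scientist', 'Data Analyst'],
    'salary_in_usd': [100000, 80000, 50000],
})

grouped = df.groupby('job_title')

# Same grouping, different instructions for each group
avg = grouped['salary_in_usd'].mean()   # average salary per title
top = grouped['salary_in_usd'].max()    # highest salary per title
n = grouped['salary_in_usd'].count()    # number of rows per title

print(avg, top, n, sep='\n\n')
```

Each call returns a Series indexed by `job_title`, so the results for different aggregations line up title by title.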