Learn Group Data | Explore Dataset

It is time to move on to more complicated functions. The first one is .groupby(). As the title suggests, this function groups the rows of our dataset by the values in a column, but how? Look at the example first, and everything will become clearer:

Here is the initial dataset:

Look at the job titles.

Imagine you want to know the mean salary for each specialization. With this much data, calculating it by hand is impractical, so a function that groups rows by a column's values is the right tool.

Here is the code:

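# numeric_only=True skips text columns (pandas 2.0 and later raise an error without it)
df = df.groupby('job_title').mean(numeric_only=True)
print(df.head())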

Look at the output:

job_title	work_year	salary	salary_in_usd
3D Computer Vision Researcher	2021.000000	400000.000000	5409.000000
AI Scientist	2021.142857	290571.428571	66135.571429
Analytics Engineer	2022.000000	175000.000000	175000.000000
Applied Data Scientist	2021.600000	172400.000000	175655.000000
Applied Machine Learning Scientist	2021.500000	141350.000000	142068.750000

Here, we can see the mean value of work_year, salary, and salary_in_usd for each job_title.
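To try this without the full dataset, here is a minimal sketch on a tiny, made-up DataFrame. The rows and values below are hypothetical and only reuse the column names from the lesson. Passing numeric_only=True is an assumption about your pandas version: it is harmless on older releases and avoids an error on recent ones when text columns are present.

```python
import pandas as pd

# Hypothetical miniature dataset; the values are made up for illustration
df = pd.DataFrame({
    'job_title': ['Data Scientist', 'Data Engineer', 'Data Scientist', 'Data Engineer'],
    'work_year': [2021, 2021, 2022, 2022],
    'salary_in_usd': [100000, 90000, 120000, 110000],
})

# Average every numeric column within each job_title group
means = df.groupby('job_title').mean(numeric_only=True)
print(means)
```

The result is indexed by job_title, with one row per distinct title and one column per numeric column of the original data.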

How does the function work?

DataFrame.groupby() is the general syntax of the groupby function.
DataFrame.groupby(['job_title']): in the parentheses, we specify the column (or list of columns) by which to group the data.
DataFrame.groupby(['job_title']).mean(): we must also tell the program what to do with the numerical values that belong to each group. In our case, we ask it to calculate the mean() of all numerical values that share the same job_title.
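One way to see what the steps above do is to compare groupby with a manual loop. The sketch below uses hypothetical data and computes the mean salary_in_usd per job_title twice: once by filtering the rows for each title by hand, and once with groupby. It is an illustration of the idea, not a description of how pandas implements it internally.

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    'job_title': ['AI Scientist', 'Analytics Engineer', 'AI Scientist'],
    'salary_in_usd': [60000, 175000, 70000],
})

# Manual version: split the rows by job_title, then average each subset
manual = {}
for title in df['job_title'].unique():
    subset = df[df['job_title'] == title]
    manual[title] = subset['salary_in_usd'].mean()

# groupby version: the same split and aggregation, combined into one Series
grouped = df.groupby('job_title')['salary_in_usd'].mean()

print(manual)
print(grouped)
```

Selecting ['salary_in_usd'] before calling mean() restricts the aggregation to that single column, which is handy when only one value per group is needed.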


