Introduction to Python for Data Analysis
Pivot Tables
It's time to look at a closely related function, .pivot_table(). It does much the same job as .groupby(), but with a different syntax. A pivot table always aggregates its values: the aggfunc argument sets the function, and pandas falls back to the mean if you leave it out. Several chapters ago, we worked with this dataset:
And this example:
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/INTRO+to+Python/ds_salaries.csv', index_col=0)

# Group by job title and experience level, then average the salary
df = df[['salary', 'job_title', 'experience_level']].groupby(['job_title', 'experience_level']).mean()
print(df)
Look at the result:
Now look at the implementation using .pivot_table() that produces the same result:
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/INTRO+to+Python/ds_salaries.csv', index_col=0)

# Same grouping as the .groupby() example: the mean salary
# for each job title and experience level
df = pd.pivot_table(df, index=['job_title', 'experience_level'], values=['salary'], aggfunc=[np.mean])
print(df)
- Pass the dataset as the first argument.
- Pass the columns you want to group by as a list to index; the order matters, as in .groupby().
- Pass the columns you want to aggregate (to calculate the mean, median, etc.) as a list to values. The order is not crucial. This argument is optional; if you omit it, aggfunc is applied to all numerical columns within each group.
- Pass the NumPy functions you want to apply to the grouped columns as a list to aggfunc; the order is not crucial. You can use any of the functions we have covered, but pass them without parentheses or arguments, just the function's name, like np.mean or np.sum.
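The argument rules above can be sketched on a small example. The DataFrame toy and its columns team, points, and minutes are made up for illustration and are not part of the course dataset. The sketch also passes the string names 'mean' and 'sum' to aggfunc, which pandas accepts in place of np.mean and np.sum.

```python
import pandas as pd

# Hypothetical toy data; both non-grouping columns are numeric
toy = pd.DataFrame({
    'team': ['A', 'A', 'B', 'B'],
    'points': [10.0, 20.0, 5.0, 7.0],
    'minutes': [30.0, 34.0, 12.0, 18.0],
})

# The same aggregation written two ways
grouped = toy.groupby(['team'])[['points']].mean()
pivoted = pd.pivot_table(toy, index=['team'], values=['points'], aggfunc='mean')
print(grouped['points'].tolist() == pivoted['points'].tolist())

# values is omitted, so every numeric column is aggregated;
# aggfunc is a list, so the result has one block of columns per function
table = pd.pivot_table(toy, index=['team'], aggfunc=['mean', 'sum'])
print(table)

# Columns form a two-level index: (function name, column name)
print(table[('mean', 'points')])
```

When aggfunc is a list, each result column is addressed by a (function, column) pair, which is why the last line selects ('mean', 'points').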
Task
Your task is to create a pivot table that groups the data by plan and calculates the mean and median price. Check whether the two values differ. Follow these steps:
- Create a pivot table with the arguments:
  - df as the first argument.
  - 'plan' passed to index as the second argument.
  - 'price' passed to values as the third argument.
  - np.mean and np.median passed to aggfunc as the fourth argument.
- Print df.
By the way, if the mean and median differ significantly, the data likely contains outliers (extremely small or large values).
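To see why a gap between the mean and the median hints at outliers, here is a minimal sketch on made-up data. The columns city and rent are hypothetical and deliberately differ from the task's dataset. The string names 'mean' and 'median' are used here in place of np.mean and np.median.

```python
import pandas as pd

# Hypothetical data: city 'X' contains one extreme rent, city 'Y' does not
toy = pd.DataFrame({
    'city': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'rent': [900.0, 1000.0, 10000.0, 800.0, 850.0, 900.0],
})

table = pd.pivot_table(toy, index=['city'], values=['rent'], aggfunc=['mean', 'median'])
print(table)

# A large difference between mean and median flags a group worth inspecting
gap = table[('mean', 'rent')] - table[('median', 'rent')]
print(gap)
```

The single extreme value pulls the mean of city 'X' far above its median, while for city 'Y' the two statistics coincide.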