Pivot Tables | Explore Dataset
Introduction to Python for Data Analysis
1. Introduction to Python 1/2
2. Introduction to Python 2/2
3. Explore Dataset
4. Becoming an Analyst

Pivot Tables

It's time to look at a similar function, pd.pivot_table(). It does much the same job as .groupby(), but the syntax is different. A pivot table always aggregates: you pass the aggregation functions through the aggfunc argument, and pandas falls back to the mean if you omit it. A few chapters ago, we worked with this dataset:

And this example:

import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/INTRO+to+Python/ds_salaries.csv', index_col=0)
df = df[['salary', 'job_title', 'experience_level']].groupby(['job_title', 'experience_level']).mean()
print(df)

Look at the result:

Now look at the implementation that uses .pivot_table() to get the same result:

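import pandas as pd
import numpy as np

# Load the salaries dataset, using the first CSV column as the index
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/INTRO+to+Python/ds_salaries.csv', index_col=0)
# Mean salary for each job title and experience level, matching the .groupby() example
df = pd.pivot_table(df, index=['job_title', 'experience_level'], values=['salary'], aggfunc=[np.mean])
print(df)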
  • Pass the dataset as the first argument.
  • Pass the columns you want to group by as a list to index; the order matters, as in .groupby().
  • Pass the columns you want to aggregate (to calculate the mean, median, etc.) as a list to values. The order does not matter. This argument is optional; if you omit it, aggfunc is applied to all remaining numerical columns within each group.
  • Pass the NumPy functions you want to apply to the grouped columns (mean, median, or any of the others we have learned) as a list to aggfunc; the order does not matter. Write only the function's name, without parentheses or arguments: np.mean or np.sum, not np.mean() or np.sum().
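
To make the correspondence with .groupby() concrete, here is a minimal sketch on a small made-up DataFrame. The rows are hypothetical; only the column names mirror the salary dataset. It passes the string 'mean' to aggfunc, which pandas accepts as an alternative to np.mean.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the salary dataset.
df = pd.DataFrame({
    'job_title': ['Analyst', 'Analyst', 'Engineer', 'Engineer'],
    'experience_level': ['EN', 'EN', 'SE', 'SE'],
    'salary': [40000.0, 60000.0, 90000.0, 110000.0],
})

# Grouping with .groupby(): mean salary per (job_title, experience_level)
grouped = df.groupby(['job_title', 'experience_level'])[['salary']].mean()

# The same aggregation expressed with pd.pivot_table()
pivoted = pd.pivot_table(
    df,
    index=['job_title', 'experience_level'],  # columns to group by
    values=['salary'],                        # columns to aggregate
    aggfunc='mean',                           # aggregation to apply
)

print(grouped)
print(pivoted)
```

Both printed tables should show one row per (job_title, experience_level) pair with the mean salary for that pair.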

Task

Your task is to create a pivot table that groups the data by plan and calculates the mean and median price. Check whether the two values differ. Follow the algorithm:

  1. Create a pivot table with the following arguments:
  • df as the first argument.
  • 'plan' as index (the second argument).
  • 'price' as values (the third argument).
  • np.mean and np.median as aggfunc (the fourth argument).
  2. Print df.

By the way, if the mean and the median differ significantly, the data likely contains outliers (extremely small or large values).
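
The effect of an outlier on the mean can be seen in a small sketch. The data below is invented for illustration and is not the course dataset; only the column names 'plan' and 'price' follow the task. The aggregations are given as the strings 'mean' and 'median', which pandas accepts in place of np.mean and np.median.

```python
import pandas as pd

# Hypothetical data: the 'basic' plan contains one unusually large price.
df = pd.DataFrame({
    'plan': ['basic', 'basic', 'basic', 'premium', 'premium', 'premium'],
    'price': [10.0, 12.0, 200.0, 30.0, 32.0, 34.0],
})

# Mean and median price per plan; a list in aggfunc yields one column group per function
summary = pd.pivot_table(df, index='plan', values='price', aggfunc=['mean', 'median'])
print(summary)

# Absolute gap between mean and median for each plan
gap = (summary['mean']['price'] - summary['median']['price']).abs()
print(gap)
```

The single large value pulls the mean of the 'basic' plan far above its median, while the two statistics coincide for 'premium', where the prices are evenly spread.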
