Learn Box Plot | More Statistical Plots
Ultimate Visualization with Python

Box Plot

Definition

A box plot is another very common statistical plot. It visualizes the central tendency, spread, and potential outliers of a dataset by means of its quartiles.

Quartiles

[Figure: quartiles]

Quartiles split sorted data into four equal parts:

  • Q1: the first quartile, the median of the lower half of the data (25% of the data lies below it);
  • Q2: the median (50% of the data lies below it);
  • Q3: the third quartile, the median of the upper half of the data (75% of the data lies below it).
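
As a quick check of the definitions above, the quartiles of a small sample can be computed with NumPy. This is a minimal sketch: the sample values are made up for illustration, and np.percentile uses linear interpolation by default, which is only one of several quartile conventions, so other tools may report slightly different values on small samples.

```python
import numpy as np

# Made-up sample, used only for illustration
data = np.array([7, 1, 9, 3, 5, 2, 8, 4, 6])

# np.percentile sorts the data internally; the 25th, 50th and 75th
# percentiles correspond to Q1, Q2 (the median) and Q3
q1, q2, q3 = np.percentile(data, [25, 50, 75])

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
# prints: Q1 = 3.0, Q2 (median) = 5.0, Q3 = 7.0
```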

Box Plot Elements

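[Figure: box plot explained]

  • In a horizontal box plot, the left side of the box shows Q1 and the right side shows Q3 (in a vertical one, these are the bottom and top edges);
  • IQR = Q3 − Q1 is shown as the width of the box, with the median marked by a line inside it (yellow in the figure);
  • Whiskers extend to the most extreme data points that lie within Q1 − 1.5 · IQR and Q3 + 1.5 · IQR;
  • Points outside the whiskers are plotted individually as outliers.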
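
The elements listed above can be reproduced numerically. The following is a rough sketch rather than matplotlib's actual implementation: the sample is invented, and the variable names (lower_fence, upper_fence, and so on) are chosen here for illustration only.

```python
import numpy as np

# Made-up sample with one unusually large value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Fences: the farthest a whisker is allowed to reach
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is treated as an outlier
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"whiskers = ({whisker_low}, {whisker_high}), outliers = {outliers}")
```

For this sample, Q1 = 3.25 and Q3 = 7.75, so the fences are −3.5 and 14.5. The whiskers therefore end at 1 and 9, and 30 is the only outlier.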

A box plot can be generated using matplotlib.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Loading the dataset with the average yearly temperatures in Boston and Seattle
url = 'https://content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
weather_df = pd.read_csv(url, index_col=0)

# Creating a box plot for the Seattle temperatures
plt.boxplot(weather_df['Seattle'])
plt.show()
```

Box Plot Data

Use plt.boxplot(x), where x can be a 1D array-like object, a 2D array (one box per column), or a sequence of 1D arrays.
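
To illustrate the three accepted input shapes, here is a small sketch with synthetic data (the arrays and panel titles are invented for this example). It calls the Axes method ax.boxplot(), which takes the same data argument as plt.boxplot(), and selects the non-interactive Agg backend so that it also runs without a display; in an interactive session you would drop that line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, for illustration only

one_d = rng.normal(size=100)                          # 1D array: one box
two_d = rng.normal(size=(100, 3))                     # 2D array: one box per column
ragged = [rng.normal(size=50), rng.normal(size=80)]   # sequence of 1D arrays

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
results = [ax.boxplot(x) for ax, x in zip(axes, (one_d, two_d, ragged))]

for ax, title in zip(axes, ("1D array", "2D array", "sequence of 1D arrays")):
    ax.set_title(title)

# boxplot() returns a dict of the drawn artists; 'boxes' has one entry per box
print([len(r["boxes"]) for r in results])  # prints: [1, 3, 2]
```

Note that the arrays in a sequence do not need to have the same length, which makes this form convenient for comparing groups of different sizes.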

Optional Parameters

The tick_labels parameter names each box on the axis, which is especially useful when plotting multiple arrays.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Loading the dataset with the average yearly temperatures in Boston and Seattle
url = 'https://content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
weather_df = pd.read_csv(url, index_col=0)

# Creating two box plots for Boston and Seattle temperatures
plt.boxplot(weather_df, tick_labels=['Boston', 'Seattle'])
plt.show()
```

Passing a DataFrame with two numeric columns to boxplot() creates two separate box plots, one per column. The tick_labels list names them in column order; without it, the boxes are simply labeled by position (1, 2).
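
One way to keep labels and columns in sync is to take the labels from the DataFrame itself. The sketch below uses a hypothetical stand-in for weather_df filled with made-up numbers, and passes df.to_numpy() to make the column-wise layout explicit. It also assumes that tick_labels may be unavailable on older matplotlib releases (the keyword was called labels before version 3.9), so it falls back to the old name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in for weather_df: two numeric columns of made-up values
df = pd.DataFrame({
    "Boston": rng.normal(loc=51, scale=1.5, size=40),
    "Seattle": rng.normal(loc=52, scale=1.0, size=40),
})

names = list(df.columns)  # labels in the same order as the columns

fig, ax = plt.subplots()
try:
    ax.boxplot(df.to_numpy(), tick_labels=names)
except TypeError:
    # Older matplotlib releases only accept the previous keyword name
    ax.boxplot(df.to_numpy(), labels=names)

print([t.get_text() for t in ax.get_xticklabels()])
```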

Study More

There are also quite a few optional parameters for customizing the box plot, which you can explore in the boxplot() documentation, although in practice you will rarely need most of them.
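
As a taste of what those parameters look like, here is a brief sketch with a handful of them applied to synthetic data. The particular choices (whisker length, color) are arbitrary and only meant to show the mechanics.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(size=200)  # synthetic data, for illustration only

fig, ax = plt.subplots()
result = ax.boxplot(
    sample,
    whis=1.0,           # whisker reach as a multiple of the IQR (default is 1.5)
    showmeans=True,     # add a marker for the mean
    patch_artist=True,  # draw the box as a filled patch so it can be colored
    notch=True,         # notch the box around the median
)

# With patch_artist=True the boxes are patches and accept a face color
result["boxes"][0].set_facecolor("lightblue")
```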

Task

Create two box plots using two samples from the standard normal distribution:

  1. Use the correct function to create the box plots.
  2. Use the list of normal_sample_1 and normal_sample_2 (in this order from left to right) as the data.
  3. Label the left box plot as 'First sample' and the right one as 'Second sample' by passing a list of labels.

Section 4. Chapter 2