Box Plot | More Statistical Plots
Ultimate Visualization with Python

Box Plot

A box plot is another extremely common plot in statistics. It visualizes the central tendency, spread, and potential outliers of the data by means of its quartiles.

Quartiles

Quartiles divide the data points (sorted in ascending order) into four equal-sized parts. There are three of them:

  • The first quartile (Q1) is the middle value between the smallest value of the sample and the median (25% of the data lies below Q1);
  • The second quartile (Q2) is the median itself (50% of the data lies below the median);
  • The third quartile (Q3) is the middle value between the median and the largest value of the sample (75% of the data lies below Q3).
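To make the definitions concrete, here is a minimal sketch that computes the three quartiles with NumPy. The sample values are made up for illustration. Note that np.percentile interpolates linearly between neighbouring points by default, so its Q1 and Q3 can differ slightly from the "middle value of each half" description above; several quartile conventions are in common use.

```python
import numpy as np

# Hypothetical sample, used only for illustration
data = np.array([7, 1, 3, 9, 5, 11, 13, 15])

# np.percentile sorts the data internally and, by default, interpolates
# linearly when a quartile falls between two data points
q1, q2, q3 = np.percentile(data, [25, 50, 75])

print(q1, q2, q3)  # 4.5 8.0 11.5

# Q2 is exactly the median
print(q2 == np.median(data))  # True
```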

Let's have a look at an example of a box plot:

[Figure: box plot example]

This box plot is based on GDP per capita data for different countries.

Box Plot Elements

  • The upper side of the blue rectangle represents the third (upper) quartile, and the lower side represents the first (lower) quartile;
  • Q3 - Q1 is called the interquartile range (IQR); it is the height of the rectangle, and the green line inside the rectangle is the median;
  • The black lines extending from the rectangle are called whiskers. By default, the lower whisker reaches down to the smallest data point that is not below Q1 - 1.5 * IQR, and the upper whisker reaches up to the largest data point that is not above Q3 + 1.5 * IQR;
  • The data points that lie beyond the whiskers are called outliers (in this example there are quite a lot of them).
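The elements listed above can also be computed by hand. The sketch below is only an illustration on made-up numbers: it derives the interquartile range, the two 1.5 * (Q3 - Q1) limits, the whisker ends, and the outliers with NumPy. It assumes the common convention in which a whisker stops at the most extreme data point still inside its limit.

```python
import numpy as np

# Hypothetical sample; the value 30 lies far away from the rest
data = np.array([2, 4, 5, 6, 7, 8, 9, 10, 12, 30])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Limits beyond which a point is treated as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points that are still inside the limits
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the limits is drawn as an individual outlier point
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(iqr)                        # 4.5
print(lower_fence, upper_fence)   # -1.5 16.5
print(whisker_low, whisker_high)  # 2 12
print(outliers)                   # [30]
```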

Now it's time to create a box plot with the help of matplotlib:
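The lesson's own dataset is not reproduced here, so the following is a minimal sketch that assumes a made-up, right-skewed pandas Series in its place. The names rng and values and the axis label are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up, right-skewed data standing in for the lesson's dataset
rng = np.random.default_rng(seed=0)
values = pd.Series(rng.lognormal(mean=9, sigma=1, size=200), name="value")

# x (the data) is the only required argument;
# boxplot() returns a dict of the artists it has drawn
artists = plt.boxplot(values)

plt.ylabel("Value")
plt.show()
```

The returned dictionary holds the drawn elements under keys such as "boxes", "medians", "whiskers", and "fliers", which is handy if you want to restyle them afterwards.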

Box Plot Data

Creating the plot is simple: call the boxplot() function from the pyplot module and pass your data as its first (and only required) parameter, x. The data can be an array-like object (here a Series), a 2D array (a box plot is drawn for each column), or a sequence of 1D arrays (a box plot is drawn for each array).

Optional Parameters

There are also quite a few optional parameters for customizing the box plot, which are listed in the boxplot() documentation, yet in practice you will rarely need them.

The labels parameter is an exception (since Matplotlib 3.9 it is named tick_labels). It is useful not only for labeling a single box plot, but also for labeling each box plot when there is more than one array:
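A minimal sketch of labeling, assuming a made-up DataFrame with two columns in place of the lesson's data. Because Matplotlib 3.9 renamed labels to tick_labels, the snippet checks which keyword the installed version accepts; the names df and label_kw and the label texts are illustrative only.

```python
import inspect

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up two-column DataFrame; one box plot is drawn per column
rng = np.random.default_rng(seed=1)
df = pd.DataFrame({
    "group_a": rng.normal(loc=0, scale=1, size=100),
    "group_b": rng.normal(loc=2, scale=0.5, size=100),
})

# Matplotlib 3.9 renamed `labels` to `tick_labels`;
# use whichever keyword the installed version accepts
params = inspect.signature(plt.boxplot).parameters
label_kw = "tick_labels" if "tick_labels" in params else "labels"

# One label per box, in the same left-to-right order as the columns
artists = plt.boxplot(df, **{label_kw: ["Group A", "Group B"]})

plt.show()
```

If you know which Matplotlib version you are targeting, you can drop the check and pass the keyword directly, for example labels=["Group A", "Group B"] on versions before 3.9.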

Here we slightly modified our example by passing the entire DataFrame, which has 2 columns, and labeling each box plot appropriately.

Task

Your task is to create two box plots using two samples from the standard normal distribution:

  1. Use the correct function to create the box plots.
  2. Pass a list containing normal_sample_1 and normal_sample_2 (in this order, from left to right) as the data.
  3. Label the left box plot as First sample and the right one as Second sample by passing a list of labels.
