## Ultimate Visualization with Python

# Histogram

Let’s start with a histogram. **Histograms** represent the frequency or probability distribution of a variable (an approximation of its distribution) using vertical bins of equal width, also called bars.

The `pyplot` module has a special function called `hist` to create a histogram. Its first and only required parameter is the **data** (called `x`), which can be either an array or a sequence of arrays. If a sequence of arrays is passed, the bins for each array are painted in **different** colors. Here is a simple example for you:
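A minimal sketch of such a call is given here. The Seattle temperature dataset is not reproduced on this page, so the `Series` below is filled with synthetic values (an assumption made purely for illustration), and the resulting bars will not match the ones described in the next section.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data: 26 synthetic "average yearly temperatures".
# These are NOT the real Seattle values, only a placeholder sample.
rng = np.random.default_rng(seed=0)
temperatures = pd.Series(rng.normal(loc=52.0, scale=1.5, size=26))

# x (the data) is the only required argument; bins defaults to 10.
counts, edges, patches = plt.hist(temperatures)

plt.xlabel("Average yearly temperature")
plt.ylabel("Frequency")
plt.show()
```

`hist()` returns the bar heights, the bin edges, and the drawn patches, which is handy when you want to inspect the numbers behind the picture.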

## Intervals and Height

We passed a `Series` object, which contains average yearly temperatures in Seattle, to the `hist()` function. By default, the sample was divided into `10` equal intervals ranging from the **minimum** value to the **maximum** value. However, only `9` bins are visible, since no values fall into the second interval.

The height of each bin by default is equal to the **frequency** of the values in this interval (number of times they occur).
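To make the default binning concrete, here is a small sketch with hand-picked values (made up for illustration, not the temperature data). The values span 0 to 10, so the ten default intervals are each one unit wide, and nothing falls between 1 and 2.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up sample: no value lies in the second interval [1, 2).
values = np.array([0.0, 0.5, 2.2, 2.7, 3.1, 4.4, 5.5, 6.3, 7.8, 8.1, 9.2, 10.0])

# Default call: 10 equal-width intervals from min(values) to max(values).
counts, edges, patches = plt.hist(values)

print(edges)   # 11 edges delimit the 10 intervals
print(counts)  # frequency of values in each interval

# An interval with zero values produces a bar of height 0, so it is not visible.
visible_bars = np.count_nonzero(counts)
print(visible_bars)
plt.show()
```

Ten intervals are always computed; the number of visible bars is lower whenever some intervals contain no values.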

## Number of Bins

Another important, yet optional, parameter is `bins`, which takes the number of bins (an integer), a sequence of numbers specifying the bin edges, or a string. Most of the time, passing the number of bins is more than enough.

There are several methods for choosing the number of bins (and therefore their width), but here we will use **Sturges' formula** (written in Python): `bins = 1 + int(np.log2(n))`, where `n` is the sample size (the size of the array).

Let’s see it in action:

The number of rows in the `DataFrame` is 26 (the size of the `Series`), so the resulting number of bins is 5.
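The calculation above can be sketched as follows, assuming a synthetic sample of 26 values in place of the temperature data (the sample itself is hypothetical; only its size matters for the formula).

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample with the same size as the Series in the text (26 values).
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=52.0, scale=1.5, size=26)

# Sturges' formula as written in this lesson.
n = len(sample)
bins = 1 + int(np.log2(n))  # log2(26) is about 4.7, so int(...) gives 4
print(bins)

# Pass the computed number of bins to hist().
counts, edges, patches = plt.hist(sample, bins=bins)
plt.show()
```

Because the formula grows with the logarithm of `n`, doubling the sample size adds only one more bin.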

## Probability Density Approximation

That’s all fine, but what if we want to have a look at the **probability density** approximation? All we need is to set the parameter `density` to `True`.

Now the height of each bin will be the count of the values in the interval divided by the product of the **total number of values** (the size of the sample) and the **bin width**. As a result, the sum of the areas of the bins will be equal to **1**, which is exactly what we need from a **probability density function**.

Let’s now modify our example:

Now we have an approximation of the probability density function for our temperature data.
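As a rough check of the normalization described above, the sketch below again uses a synthetic 26-value sample (an assumption, since the original data is not available here) and compares the heights returned by `hist()` with the count divided by the product of the sample size and the bin width.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample standing in for the temperature data.
rng = np.random.default_rng(seed=2)
sample = rng.normal(loc=52.0, scale=1.5, size=26)

n = len(sample)
bins = 1 + int(np.log2(n))  # Sturges' formula from the previous section

# density=True rescales the bar heights so that the total area equals 1.
heights, edges, patches = plt.hist(sample, bins=bins, density=True)

# Recompute the heights by hand: count / (sample size * bin width).
counts, _ = np.histogram(sample, bins=bins)
widths = np.diff(edges)
manual_heights = counts / (n * widths)

total_area = np.sum(heights * widths)
print(total_area)
plt.show()
```

Note that the heights themselves can exceed 1 when the bins are narrow; it is the total area that is normalized.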

If you want to explore more about the `hist()` function parameters, you can refer to its documentation.

Task

Your task is to create an approximation of a probability density function using a sample from the standard normal distribution:

- Use the correct function for creating a histogram.
- Use `normal_sample` as the data for the histogram.
- Specify the number of bins as the second argument using the Sturges' formula.
- Make the histogram an approximation of a probability density function via correctly specifying the rightmost argument.
