Learn Histogram | More Statistical Plots
Ultimate Visualization with Python

Histogram

Definition

A histogram represents the frequency or probability distribution of a variable using vertical bars, one for each bin. Bins are consecutive intervals, usually of equal width.

The pyplot module provides the hist function to create histograms. The required parameter is the data (x), which can be an array or a sequence of arrays. If multiple arrays are passed, each is shown in a different color.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Loading the dataset with the average yearly temperatures in Boston and Seattle
    url = 'https://content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Creating a histogram
    plt.hist(weather_df['Seattle'])
    plt.show()
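To illustrate passing a sequence of arrays, here is a minimal sketch that uses two synthetic samples instead of the course dataset. The names city_a and city_b and their values are made up for illustration, and the Agg backend is selected only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two made-up temperature samples of 26 values each
city_a = rng.normal(loc=51.0, scale=1.5, size=26)
city_b = rng.normal(loc=53.0, scale=1.5, size=26)

# A list of arrays gives one set of bars per array, all sharing the same bin edges
counts, edges, patches = plt.hist([city_a, city_b], label=["City A", "City B"])
plt.legend()
# plt.show()  # uncomment when using an interactive backend
```

Each array gets its own color, and the legend labels make the two sets of bars easy to tell apart.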

Intervals and Height

A Series of yearly Seattle temperatures was passed to hist(). By default, the range between the minimum and maximum is split into 10 equal-width intervals. Only 9 bars are visible because one interval contains no values, so its bar has zero height.

Bin height shows the frequency: how many data points fall into each interval.
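The counts behind the bars can be inspected without drawing anything. The sketch below uses numpy.histogram, which hist() relies on for binning, on made-up values chosen so that one of the 10 default intervals stays empty.

```python
import numpy as np

# Made-up values chosen so that the interval [7, 8) contains no data
data = np.array([0.0, 0.5, 1.2, 2.3, 3.1, 4.4, 5.5, 6.6, 8.2, 9.1, 10.0])

# By default np.histogram uses 10 equal-width bins between min and max
counts, edges = np.histogram(data)

print(counts)                    # frequency per interval
print(edges)                     # 11 edges delimit the 10 intervals
print(np.count_nonzero(counts))  # number of bars with visible height
```

Here 10 intervals are computed, but only 9 of them have a nonzero count, which is the same situation as in the Seattle plot.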

Number of Bins

The optional bins parameter can be an integer (number of bins), a sequence of edges, or a string. Usually, specifying the number of bins is sufficient.
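The three accepted forms of bins can be compared side by side. This is a small sketch on a made-up sample, using numpy.histogram to inspect the resulting edges, on the assumption that hist() forwards bins to it unchanged.

```python
import numpy as np

data = np.arange(1.0, 17.0)  # made-up sample: 1.0, 2.0, ..., 16.0

# Integer: that many equal-width bins between min and max
counts_int, edges_int = np.histogram(data, bins=4)

# Sequence: explicit bin edges, which may be unequal in width
counts_seq, edges_seq = np.histogram(data, bins=[0, 5, 10, 20])

# String: a named rule that derives the edges from the data
counts_str, edges_str = np.histogram(data, bins="sturges")

print(counts_int, edges_int)
print(counts_seq, edges_seq)
print(counts_str, edges_str)
```

An explicit edge sequence is the only form that allows bins of different widths.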

One common rule for choosing the number of bins is Sturges' formula, based on sample size:

bins = 1 + int(np.log2(n))

where n is the dataset size.
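As a quick worked example, the formula can be wrapped in a helper and evaluated for a few sample sizes. The name sturges_bins is hypothetical, and the helper follows the truncating form shown above; some references round the logarithm up instead, which can give one more bin.

```python
import numpy as np

def sturges_bins(n: int) -> int:
    """Number of bins using the form of Sturges' formula given in this lesson."""
    return 1 + int(np.log2(n))

for n in (26, 100, 1000):
    print(n, sturges_bins(n))
```

The bin count grows slowly: going from 26 to 1000 values only doubles it.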

Study More

You can explore additional methods for bin calculation here.

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    url = 'https://content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Specifying the number of bins
    plt.hist(weather_df['Seattle'], bins=1 + int(np.log2(len(weather_df))))
    plt.show()

The DataFrame has 26 rows (the size of the Series). Since log2(26) is about 4.7, int() truncates it to 4, and the resulting number of bins is 1 + 4 = 5.

Probability Density Approximation

To approximate a probability density, set density=True in hist(). Then each bin height is:

$$\text{Height} = \frac{m}{n \cdot w}$$

where:

  • n: total number of values,
  • m: count in the bin,
  • w: bin width.

This makes the total area of the histogram equal to 1, matching the behavior of a PDF.
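The height formula and the unit area can be checked numerically. The sketch below uses a made-up normal sample and numpy.histogram, whose density argument is assumed to behave the same way as the one in hist().

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=200)  # made-up data for illustration

bins = 1 + int(np.log2(len(sample)))
counts, edges = np.histogram(sample, bins=bins)
heights, _ = np.histogram(sample, bins=bins, density=True)

widths = np.diff(edges)                   # w for each bin
manual = counts / (len(sample) * widths)  # m / (n * w)
area = np.sum(heights * widths)           # total area under the bars

print(np.allclose(heights, manual), area)
```

The heights themselves can exceed 1 when bins are narrow; only the area is constrained.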

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    url = 'https://content-media-cdn.codefinity.com/courses/47339f29-4722-4e72-a0d4-6112c70ff738/weather_data.csv'
    weather_df = pd.read_csv(url, index_col=0)

    # Making a histogram a probability density function approximation
    plt.hist(weather_df['Seattle'], bins=1 + int(np.log2(len(weather_df))), density=True)
    plt.show()

This provides an approximation of the probability density function for the temperature data.

Study More

To learn more about the hist() parameters, refer to the hist() documentation.

Task


Create an approximation of a probability density function using a sample from the standard normal distribution:

  1. Use the correct function for creating a histogram.
  2. Use normal_sample as the data for the histogram.
  3. Specify the number of bins as the second argument using Sturges' formula.
  4. Make the histogram approximate a probability density function by setting the rightmost argument correctly.

Solution
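One possible way to complete the task is sketched below; it is not the course's reference solution. The task is expected to supply normal_sample, so the sample generated here (its size and seed) is an assumption made for illustration, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Assumption: the task provides normal_sample; here it is generated for illustration
rng = np.random.default_rng(1)
normal_sample = rng.normal(size=1000)

# Sturges' formula in the form used in this lesson
num_bins = 1 + int(np.log2(len(normal_sample)))

# Bins passed as the second positional argument, density as the last keyword argument
heights, edges, _ = plt.hist(normal_sample, num_bins, density=True)
# plt.show()  # uncomment when using an interactive backend
```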


Section 4. Chapter 1