
Probability Theory

Normal Distribution

Hi there! It is time to move on to more complex distributions: continuous ones!

What is it?

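A continuous distribution is a distribution whose outcome can be any value in a range, so it has infinitely many possible outcomes. Therefore, we cannot list every outcome with its probability in a table the way we did for discrete distributions; in fact, the probability of any single exact value is 0. Instead, such a distribution is described by a density curve, which is usually shown as a graph, and probabilities are calculated for intervals of values.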
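A small sketch of why a table does not work. It assumes the standard normal distribution (mean 0, standard deviation 1) as an example and uses the norm object from scipy.stats, which this lesson introduces shortly. The probability of landing in a window around 0 shrinks toward zero as the window narrows. The helper window_probability is a hypothetical name used only for illustration.

```python
from scipy.stats import norm


def window_probability(width):
    """P(-width <= X <= width) for a standard normal X (mean 0, std 1)."""
    return norm.cdf(width) - norm.cdf(-width)


# As the window narrows, the probability shrinks toward 0:
# a single exact value gets probability 0, so no probability table is possible.
for width in [1.0, 0.1, 0.01, 0.001]:
    print(f"P(-{width} <= X <= {width}) = {window_probability(width):.6f}")
```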

Let's start with the most widely used and most interesting one: the normal distribution!

To work with this distribution, import the norm object from scipy.stats. It supports many of the functions you already know, such as sf and cdf, but not pmf. Its counterpart for continuous distributions is pdf (probability density function).
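A minimal check of this, assuming a recent SciPy version: norm has pdf, cdf and sf, but no pmf, because pmf belongs to discrete distributions only.

```python
from scipy.stats import norm

# Methods available on a continuous distribution...
has_pdf = hasattr(norm, "pdf")
has_cdf = hasattr(norm, "cdf")
has_sf = hasattr(norm, "sf")
# ...but pmf exists only on discrete distributions (for example scipy.stats.binom)
has_pmf = hasattr(norm, "pmf")
print(has_pdf, has_cdf, has_sf, has_pmf)  # True True True False

# pdf returns a density, the continuous counterpart of pmf:
# for the standard normal (mean 0, std 1) the density at 0 is about 0.3989
print(norm.pdf(0))
```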

Examples:

  1. Animals size.
  2. People's heights.
  3. Birth weights.

To understand the key characteristics, it is better to first look at the graph.

Distribution of emperor penguins' heights in meters.

Key characteristics:

The graph is bell-shaped: it peaks at the mean and falls off on both sides. It is symmetric about the mean. It has thin tails, which means that values far from the mean are very rare.
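The bell curve can be redrawn as a sketch with norm.pdf and matplotlib, using the parameters of this penguin example (mean 1.2 m, standard deviation 0.3 m). The plotting range of four standard deviations on each side of the mean and the 0.25 m offset used to check symmetry are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mean, std = 1.2, 0.3  # penguin heights example, in meters

# Density curve over mean +/- 4 standard deviations
x = np.linspace(mean - 4 * std, mean + 4 * std, 400)
y = norm.pdf(x, loc=mean, scale=std)

# Symmetry: equal density at the same distance left and right of the mean
offset = 0.25
left = norm.pdf(mean - offset, loc=mean, scale=std)
right = norm.pdf(mean + offset, loc=mean, scale=std)
print(left, right)

plt.plot(x, y)
plt.axvline(mean, linestyle="--")  # the peak sits at the mean
plt.xlabel("Height (m)")
plt.ylabel("Density")
plt.show()
```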

Graph explanation:

You probably remember the mean and the standard deviation. Here the mean equals 1.2 meters and the standard deviation equals 0.3 meters. The brightest yellow rectangle spans from mean - std (std is the standard deviation) on the left to mean + std on the right. The important fact is that about 68.3% of all values lie within this range. In other words, the interval from mean - std to mean + std covers 68.3% of the values, and 68.3% is called the confidence level of this interval.

The values between mean - 2 * std and mean + 2 * std make up about 95.4% of all values.

The values between mean - 3 * std and mean + 3 * std make up about 99.7% of all values.
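These percentages can be checked with norm.cdf: the share of values between mean - k * std and mean + k * std equals cdf(mean + k * std) - cdf(mean - k * std). Because this share does not depend on the particular mean and standard deviation, the sketch uses the standard normal distribution (loc 0, scale 1).

```python
from scipy.stats import norm

# Share of values within k standard deviations of the mean
shares = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"mean ± {k} * std: {share:.1%}")
# mean ± 1 * std: 68.3%
# mean ± 2 * std: 95.4%
# mean ± 3 * std: 99.7%
```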

Confidence interval:

In our case, with a mean of 1.2 meters and a standard deviation of 0.3 meters, we can say that:

  • about 68.3% of emperor penguins are between 1.2 - 0.3 = 0.9 and 1.2 + 0.3 = 1.5 meters tall;
  • about 95.4% of emperor penguins are between 1.2 - 2 * 0.3 = 0.6 and 1.2 + 2 * 0.3 = 1.8 meters tall;
  • about 99.7% of emperor penguins are between 1.2 - 3 * 0.3 = 0.3 and 1.2 + 3 * 0.3 = 2.1 meters tall.
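The same intervals for the penguin example can be computed in a short sketch; passing loc and scale to norm.cdf confirms the share of penguins inside each interval.

```python
from scipy.stats import norm

mean, std = 1.2, 0.3  # penguin heights example, in meters

intervals = []
for k in (1, 2, 3):
    low, high = mean - k * std, mean + k * std
    # share of penguins whose height lies between low and high
    share = norm.cdf(high, loc=mean, scale=std) - norm.cdf(low, loc=mean, scale=std)
    intervals.append((low, high, share))
    print(f"{low:.1f} m to {high:.1f} m: {share:.1%} of penguins")
# 0.9 m to 1.5 m: 68.3% of penguins
# 0.6 m to 1.8 m: 95.4% of penguins
# 0.3 m to 2.1 m: 99.7% of penguins
```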

Let's recall some functions, but for the normal distribution (they are a little different):

To generate a random sample: norm.rvs(loc, scale, size).

To calculate the probability density at x: norm.pdf(x, loc, scale). For a continuous distribution the probability of getting exactly x is 0, so pdf returns a density, not a probability.

To calculate the probability of getting a value greater than x: norm.sf(x, loc, scale).

To calculate the probability of getting a value of x or less: norm.cdf(x, loc, scale).

  • loc is the mean value of the distribution.
  • scale is the standard deviation value of the distribution.
  • size is the number of random values to generate.
  • x is the value at which the function is evaluated (for example, a height in meters).
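A short sketch calling each function with the penguin parameters (loc=1.2, scale=0.3). The evaluation point of 1.5 m, the sample size of 5, and random_state=42 (which only makes the random sample reproducible) are arbitrary choices.

```python
from scipy.stats import norm

mean, std = 1.2, 0.3  # penguin example: loc is the mean, scale is the std

# 5 random heights; random_state only makes the output reproducible
sample = norm.rvs(loc=mean, scale=std, size=5, random_state=42)

density = norm.pdf(1.5, loc=mean, scale=std)    # density at 1.5 m, not a probability
peak = norm.pdf(mean, loc=mean, scale=std)      # about 1.33: above 1, so clearly not a probability
p_at_most = norm.cdf(1.5, loc=mean, scale=std)  # P(height <= 1.5 m), about 0.841
p_above = norm.sf(1.5, loc=mean, scale=std)     # P(height > 1.5 m) = 1 - cdf, about 0.159

print(sample)
print(density, peak, p_at_most, p_above)
```

Note that cdf and sf always add up to 1, so either one can be derived from the other.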

Task

Now build a random distribution of cats' weights! Follow these steps:

  1. Import the norm object from scipy.stats.
  2. Import matplotlib.pyplot with the plt alias.
  3. Import seaborn with the sns alias.
  4. Generate a random sample from a normal distribution, store it in the dist variable, and use these parameters:
    • Mean equals 4.2.
    • Standard deviation equals 1.
  5. Create a histplot with these arguments:
    • Pass the dist variable to the data parameter.
    • Pass True to the kde parameter.
  6. Display the graph.


Section 5. Chapter 1