Understanding Probability Distributions | Probability & Statistics
Mathematics for Data Science

Probability distributions

A probability distribution tells you how likely different outcomes are. For discrete outcomes (like "how many defective rods"), we list a probability for each possible count. For continuous measurements (like length or weight), we describe a density across a range and obtain probabilities by integrating it. The general discrete and continuous formulas are:

$$\begin{gathered} P(X \in A) = \sum_{x \in A} p(x) \quad (\text{discrete}) \\[6pt] P(a \le X \le b) = \int_a^b f(x)\,dx \quad (\text{continuous}) \end{gathered}$$

Example (quick check): If a process guarantees that all lengths between 49.5 and 50.5 cm are equally likely, the probability that a rod lies in a 0.4 cm sub-range is the sub-range width divided by 1.0 cm, that is, 0.4 (this is the uniform idea, shown in detail below).
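The two general formulas can be sketched in a few lines of Python. This is only an illustration: the fair six-sided die used for the discrete case and the helper names `integrate` and `uniform_density` are assumptions, not part of the lesson. The continuous case reuses the 49.5 to 50.5 cm rod range from the quick check and approximates the integral with a midpoint Riemann sum.

```python
# Discrete case: sum the PMF over the outcomes in A.
# Assumed example: a fair six-sided die, A = "the roll is even".
pmf = {face: 1 / 6 for face in range(1, 7)}
A = {2, 4, 6}
p_discrete = sum(pmf[x] for x in A)  # P(X in A)


# Continuous case: integrate the density over [lo, hi].
def integrate(f, lo, hi, steps=10_000):
    """Midpoint Riemann sum approximation of the integral of f on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def uniform_density(x, a=49.5, b=50.5):
    """Density of a uniform distribution on [a, b]; zero outside the range."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


p_continuous = integrate(uniform_density, 49.8, 50.2)  # P(49.8 <= X <= 50.2)

print(p_discrete, p_continuous)
```

The discrete probability is a plain sum, while the continuous one is an area under the density curve; a single point has probability zero in the continuous case.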

Binomial distribution

The binomial models the number of successes (e.g., defective rods) in a fixed number of independent trials (e.g., 100 rods), when each trial has the same probability of success.

Formula:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Example:

In a batch of $n=100$ rods where each rod independently has probability $p=0.02$ of being defective, what is the probability of exactly $k=3$ defective rods?

Step 1: compute the combination:

$$\binom{100}{3} = \frac{100!}{3!\,97!} = 161700$$

Step 2: compute the powers:

$$\begin{gathered} p^3 = 0.02^3 = 0.000008 \\ (1-p)^{97} = 0.98^{97} \approx 0.1409059532 \end{gathered}$$

Step 3: multiply all parts:

$$P(X = 3) = 161700 \times 0.000008 \times 0.1409059532 \approx 0.182275941$$

What this means: There is about an 18.23% chance of exactly 3 defective rods in a 100-rod sample, so seeing 3 defects is a plausible outcome.
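The three steps can be checked with a short Python sketch. The function name `binomial_pmf` is an illustrative choice; `math.comb` from the standard library (Python 3.8+) computes the combination.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Rod example: n = 100 rods, p = 0.02 defect probability, k = 3 defects.
combination = comb(100, 3)           # Step 1: 161700
prob = binomial_pmf(3, 100, 0.02)    # Steps 2 and 3 combined

print(combination)
print(f"{prob:.4f}")  # 0.1823
```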

Note

If your computed probability is larger than 1 or negative, re-check the combination and the power calculations. If you want an "at most" or "at least" answer, use the CDF (a sum of PMF values) rather than a single PMF value.
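For "at most" and "at least" questions, one possible sketch sums PMF values to build the CDF. The helper names `binomial_pmf` and `binomial_cdf` are assumptions for illustration; the parameters are the rod example's $n=100$ and $p=0.02$.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k): sum the PMF from 0 through k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 100, 0.02
at_most_3 = binomial_cdf(3, n, p)         # P(X <= 3)
at_least_3 = 1.0 - binomial_cdf(2, n, p)  # P(X >= 3) = 1 - P(X <= 2)

print(at_most_3, at_least_3)
```

Note that "at least 3" uses the complement of "at most 2", not "at most 3"; an off-by-one here is a common slip.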

Uniform distribution

The uniform distribution models a continuous measurement where every value within a range [a,b] is equally likely (e.g., a tolerance range for rod length).

Formula:

$$f(x) = \frac{1}{b-a}, \quad a \le x \le b$$

Probability between two points:

$$P(l \le X \le u) = \frac{u - l}{b - a}$$
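The interval formula follows from integrating the constant density, assuming the sub-range lies inside $[a,b]$ (that is, $a \le l \le u \le b$):

```latex
\begin{aligned}
P(l \le X \le u) &= \int_l^u f(x)\,dx \\
                 &= \int_l^u \frac{1}{b-a}\,dx \\
                 &= \frac{u-l}{b-a}
\end{aligned}
```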

Example:

Parameters: $a=49.5$, $b=50.5$. What is the probability a rod length $X$ lies between 49.8 and 50.2?

Compute range width:

$$b - a = 50.5 - 49.5 = 1.0$$

Compute sub-interval:

$$u - l = 50.2 - 49.8 = 0.4$$

Probability:

$$P(49.8 \le X \le 50.2) = \frac{0.4}{1.0} = 0.4$$

Interpretation: There is a 40% chance a randomly measured rod will fall in this tighter tolerance.

Note

Make sure $a<b$ and that your sub-range lies inside $[a,b]$; otherwise, clip the endpoints to $[a,b]$, since any part of the range outside $[a,b]$ has probability 0.
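The check and the clipping can be written as a small Python function. This is a sketch under the assumptions stated in the note; the name `uniform_prob` and the choice to raise `ValueError` when $a \ge b$ are illustrative.

```python
def uniform_prob(l, u, a, b):
    """P(l <= X <= u) for X uniform on [a, b], clipping the sub-range to [a, b]."""
    if not a < b:
        raise ValueError("require a < b")
    lo = max(l, a)  # clip the lower endpoint
    hi = min(u, b)  # clip the upper endpoint
    if hi <= lo:
        return 0.0  # sub-range is empty or entirely outside [a, b]
    return (hi - lo) / (b - a)


inside = uniform_prob(49.8, 50.2, 49.5, 50.5)   # the worked example
clipped = uniform_prob(49.0, 50.0, 49.5, 50.5)  # lower endpoint clipped to 49.5
outside = uniform_prob(51.0, 52.0, 49.5, 50.5)  # entirely outside the range

print(inside, clipped, outside)
```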

Normal distribution

The normal distribution describes continuous measurements that cluster around a mean $\mu$ with spread measured by the standard deviation $\sigma$. Many measurement errors and natural variations follow this bell-shaped curve.

Formula:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Standardize with z-score:

$$z = \frac{x-\mu}{\sigma}$$

The probability between two values uses the cumulative distribution function (CDF), or symmetry for standard cases:

$$P(a \le X \le b) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$$

Here $\Phi$ is the standard normal CDF.

Example A:

Parameters: $\mu=200$, $\sigma=5$; find $P(195 \le X \le 205)$.

Z-scores:

$$\begin{gathered} z_1 = \frac{195 - 200}{5} = -1 \\[6pt] z_2 = \frac{205 - 200}{5} = 1 \end{gathered}$$

The bounds sit exactly one standard deviation below and above the mean, so the probability is the well-known value $\Phi(1) - \Phi(-1)$:

$$P(195 \le X \le 205) \approx 0.6826894921$$

Interpretation: About 68.27% of rod weights fall within ±1 standard deviation of the mean, the classic "68% rule".
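Example A can be reproduced numerically. The sketch below assumes the standard identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, which the lesson does not state; the function names `phi` and `normal_prob` are illustrative.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X normal with mean mu and standard deviation sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)


p = normal_prob(195, 205, mu=200, sigma=5)
print(f"{p:.4f}")  # 0.6827
```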

Note

When the bounds are symmetric around $\mu$ at 1, 2, or 3 standard deviations, use the known empirical rule (68–95–99.7). For other bounds, compute the z-scores, then use a table or calculator.
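For bounds that are not symmetric, a calculator step might look like the following sketch. It uses `statistics.NormalDist` from the Python standard library (3.8+); the bounds 190 and 205 are hypothetical values chosen for illustration, with the same $\mu=200$ and $\sigma=5$ as Example A.

```python
from statistics import NormalDist

mu, sigma = 200, 5
lower, upper = 190, 205  # hypothetical, non-symmetric bounds

# Z-scores of the two bounds.
z_lo = (lower - mu) / sigma  # -2.0
z_hi = (upper - mu) / sigma  # 1.0

# The standard normal CDF replaces the table lookup.
standard = NormalDist()  # mean 0, standard deviation 1
p = standard.cdf(z_hi) - standard.cdf(z_lo)

print(f"{p:.4f}")  # 0.8186
```

Calling `NormalDist(mu, sigma).cdf(upper) - NormalDist(mu, sigma).cdf(lower)` gives the same result without standardizing by hand.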

Question: What is the z-score for $X=195$, $\mu=200$, $\sigma=5$?


Section 5. Chapter 10
