Summary  
This chapter demonstrates how to compute the mean and the main measures of spread (population and sample variance, and standard deviation) using NumPy, and how to visualize the data distribution with a histogram and lines marking the mean and standard deviation.

General domain of usage  
Business sales data analysis


## Define the Dataset

Here, we store ten days of sales in a NumPy array named `data`, so that every calculation below works on the same dataset.

```python
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])
```

## Calculate Population Statistics

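First, we compute the mean together with the population variance and population standard deviation. `np.mean` takes the array as input and returns the average of all elements, which summarizes the central tendency of the dataset. `np.var` and `np.std` then measure how far the values spread around that average.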

```python
mean_val = np.mean(data)       # Mean
variance_val = np.var(data)    # Population variance (ddof=0 by default)
std_dev_val = np.std(data)     # Population standard deviation
```

* `np.mean(data)` computes the arithmetic mean (average);
* `np.var(data)` calculates the **population variance** (divides by $n$);
* `np.std(data)` calculates the **population standard deviation** (square root of variance).
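Written out for the ten-day `data` array defined above, the three quantities are as follows. The values of the array sum to 155, and the squared deviations from the mean sum to 136.5. The square root is rounded to three decimals.

$$
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{155}{10} = 15.5 \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 = \frac{136.5}{10} = 13.65 \\
\sigma &= \sqrt{\sigma^2} = \sqrt{13.65} \approx 3.695
\end{aligned}
$$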

```python
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

mean_val = np.mean(data)       # Mean
variance_val = np.var(data)    # Population variance (ddof=0 by default)
std_dev_val = np.std(data)     # Population standard deviation

print(f"Mean: {mean_val}")
print(f"Variance (Population): {variance_val}")
print(f"Standard Deviation (Population): {std_dev_val}")
```
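To see what the NumPy calls are doing, the population statistics can also be computed directly from their definitions and compared with NumPy's defaults. This is a minimal illustrative sketch. The `manual_*` names are hypothetical and chosen only for this comparison.

```python
import numpy as np

# Same ten days of sales as above
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

n = data.size
manual_mean = data.sum() / n                  # sum of values divided by n
squared_dev = (data - manual_mean) ** 2       # squared deviation of each day from the mean
manual_var = squared_dev.sum() / n            # population variance: divide by n
manual_std = np.sqrt(manual_var)              # standard deviation: square root of variance

# The hand-rolled values should agree with NumPy's defaults (ddof=0)
print(np.isclose(manual_mean, np.mean(data)))  # True
print(np.isclose(manual_var, np.var(data)))    # True
print(np.isclose(manual_std, np.std(data)))    # True
```

`np.isclose` is used instead of `==` because floating-point results from different summation orders can differ in the last few bits.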

## Calculate Sample Statistics

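To estimate the population variance from a **sample** without bias, we pass `ddof=1`.
This applies **Bessel's correction**: the sum of squared deviations is divided by $(n-1)$ instead of $n$. The sample standard deviation is the square root of this estimate; taking the square root reintroduces a small bias, but it is still the conventional choice.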

```python
sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)
```

* `np.var(data, ddof=1)` - sample variance;
* `np.std(data, ddof=1)` - sample standard deviation.

```python
import numpy as np

# Create a numpy array of daily sales
data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

sample_variance_val = np.var(data, ddof=1)
sample_std_dev_val = np.std(data, ddof=1)

print(f"Variance (Sample): {sample_variance_val}")
print(f"Standard Deviation (Sample): {sample_std_dev_val}")
```
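Because the two `ddof` settings differ only in the divisor, the sample variance can be recovered from the population variance by rescaling with $n/(n-1)$. The sketch below checks this on the same array. The variable names `pop_var`, `sample_var`, and `rescaled` are illustrative.

```python
import numpy as np

data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])
n = data.size

pop_var = np.var(data)              # sum of squared deviations / n
sample_var = np.var(data, ddof=1)   # sum of squared deviations / (n - 1)

# Bessel's correction is just a rescaling by n / (n - 1)
rescaled = pop_var * n / (n - 1)
print(np.isclose(rescaled, sample_var))  # True

# With n = 10 the ratio is 10 / 9, so the sample variance is about 11% larger
print(sample_var / pop_var)
```

The correction matters most for small samples. As $n$ grows, $n/(n-1)$ approaches 1 and the two estimates converge.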

Standard deviation is the square root of variance, giving a measure of spread in the **same units as the original data**, making it easier to interpret.
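Because the standard deviation is in the same units as the sales figures, it can be read directly as a band around the mean. The sketch below uses the population standard deviation and counts how many days fall within one standard deviation of the mean. The one-standard-deviation width is an illustrative choice, not a fixed rule.

```python
import numpy as np

data = np.array([10, 15, 12, 18, 20, 22, 14, 17, 11, 16])

mean_val = np.mean(data)
std_dev_val = np.std(data)  # population standard deviation, same units as sales

# Band of one standard deviation on either side of the mean
lower, upper = mean_val - std_dev_val, mean_val + std_dev_val
within = (data >= lower) & (data <= upper)   # boolean mask of days inside the band

print(f"Band: {lower:.2f} to {upper:.2f}")                                # Band: 11.81 to 19.19
print(f"Days within one standard deviation: {within.sum()} of {data.size}")  # 6 of 10
```

Here the days with 10, 11, 20, and 22 sales fall outside the band. These are the same boundaries a histogram would mark with lines at the mean plus or minus one standard deviation.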

