Summary  
This chapter demonstrates how to compute covariance between two numerical variables in Python using NumPy’s np.cov() function to assess the direction of their linear relationship.

General domain of usage  
Retail analytics

**Covariance** is a measure of the joint variability of two random variables.

Definition
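
For reference, these are the standard textbook formulas. Here $x_i$ and $y_i$ are paired observations, $\bar{x}$ and $\bar{y}$ are sample means over $n$ observations, and $\mu_x$ and $\mu_y$ are population means over $N$ observations. The symbols are notation chosen for this sketch and are not taken from the course material.

$$
\operatorname{cov}_{\text{sample}}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

$$
\operatorname{cov}_{\text{population}}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)
$$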

The formulas for **sample** and **population** covariance differ, but they will not be discussed in detail here. This chapter focuses on calculating the covariance for the following dataset:


- `Store_ID`: the unique id of the store;
- `Store_Area`: the area of the store;
- `Items_Available`: the number of items that are available in the store;
- `Daily_Customer_Count`: the daily number of customers in the store;
- `Store_Sales`: the number of sales in the store.

## Calculating Covariance with Python

To compute covariance in Python, use the `np.cov()` function from the **NumPy** library. Pass it the two data sequences whose covariance you want to calculate.
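
To see what the function returns, here is a minimal sketch using two short made-up arrays (hypothetical values, not the Stores dataset). It assumes NumPy's default behavior for `np.cov()`, which divides by `n - 1`. The `ddof=0` call is included only to illustrate the population variant.

```python
import numpy as np

# Hypothetical toy data (not from the Stores dataset): y tends to rise with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# For two 1-D inputs, np.cov returns a 2x2 covariance matrix.
matrix = np.cov(x, y)
print(matrix.shape)  # (2, 2)

# The off-diagonal entries hold the covariance of x and y.
cov_xy = matrix[0, 1]
print(cov_xy)

# The diagonal entries hold each variable's sample variance.
print(np.isclose(matrix[0, 0], np.var(x, ddof=1)))

# By default np.cov divides by n - 1 (sample covariance);
# ddof=0 divides by n instead (population covariance).
sample_cov = np.cov(x, y)[0, 1]
population_cov = np.cov(x, y, ddof=0)[0, 1]
print(sample_cov, population_cov)
```

With only five observations the sample and population values differ noticeably. The gap shrinks as the number of observations grows.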

`np.cov()` returns a covariance matrix, and the covariance of the two variables is the value at index `[0, 1]`. This course won't cover the other values in the output. Refer to the example:

```python
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/Stores.csv')

# Calculating covariance
cov = np.cov(df['Store_Area'], df['Items_Available'])[0, 1]

print(round(cov, 2))
```

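A positive value indicates that the two variables tend to move in the same direction. This makes sense because a larger store area typically holds a greater number of items. One significant drawback of covariance is that its value is unbounded and depends on the units of the variables, so its magnitude alone does not tell you how strong the relationship is.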
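
The following minimal sketch shows how the magnitude of covariance depends on the units of measurement. The store areas and item counts are made-up values chosen for illustration, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical store areas in square metres and item counts (made-up values).
area_m2 = np.array([120.0, 150.0, 180.0, 210.0, 240.0])
items = np.array([1400.0, 1800.0, 2100.0, 2500.0, 2900.0])

cov_m2 = np.cov(area_m2, items)[0, 1]

# Same stores, with area expressed in square centimetres (1 m^2 = 10,000 cm^2).
area_cm2 = area_m2 * 10_000
cov_cm2 = np.cov(area_cm2, items)[0, 1]

print(cov_m2)
print(cov_cm2)

# Rescaling one variable rescales the covariance by the same factor.
print(np.isclose(cov_cm2, cov_m2 * 10_000))
```

The stores and their relationship are identical in both calculations. Only the unit changed, yet the covariance grew by a factor of 10,000, so the sign of the value is informative but its size is hard to interpret on its own.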

