Learning Statistics with Python
Covariance
Covariance is a measure of the joint variability of two random variables.
The value of covariance | Meaning |
Positive | Two variables move in the same direction |
0 | Two variables have no linear relationship |
Negative | Two variables move in opposite directions |
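The sign convention in the table can be checked on small made-up samples. The sketch below is an illustration only: the lists `x`, `same`, `opposite`, and `symmetric` are hypothetical values, not taken from the course dataset, and `sample_cov` is a helper written for this example that assumes the sample formula (dividing by n - 1).

```python
import numpy as np


def sample_cov(a, b):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)


# Hypothetical data chosen for illustration
x = [1, 2, 3, 4, 5]
same = [2, 4, 6, 8, 10]      # grows when x grows
opposite = [10, 8, 6, 4, 2]  # shrinks when x grows
symmetric = [4, 1, 0, 1, 4]  # depends on x, but not linearly

print(sample_cov(x, same))       # 5.0  (positive)
print(sample_cov(x, opposite))   # -5.0 (negative)
print(sample_cov(x, symmetric))  # 0.0  (no linear relationship)
```

The last case shows why the table says "no linear relationship" rather than "no relationship": `symmetric` is fully determined by `x`, yet the covariance is 0.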
The formulas for sample covariance and population covariance differ, but we will not dive deeper into them. In this chapter, we will compute covariance for the following dataset:
(index) | Store_ID | Store_Area | Items_Available | Daily_Customer_Count | Store_Sales |
0 | 0 | 1659 | 1961 | 530 | 66490 |
1 | 1 | 1461 | 1752 | 210 | 39820 |
2 | 2 | 1340 | 1609 | 720 | 54010 |
3 | 3 | 1451 | 1748 | 620 | 53730 |
4 | 4 | 1770 | 2111 | 450 | 46620 |
Store_ID - The unique ID of the store;
Store_Area - The area of the store;
Items_Available - The number of items that are available in the store;
Daily_Customer_Count - The daily number of customers in the store;
Store_Sales - The number of sales in the store.
Calculating Covariance with Python:
To compute covariance in Python, you can use the np.cov() function from the NumPy library. Pass it the two sequences of data for which you want to calculate the covariance.
The function returns a matrix, and the covariance of the two sequences is the value at index [0,1]. This course won't cover the other values in the output. Refer to the example:
```python
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/Stores.csv')

# Calculating covariance
cov = np.cov(df['Store_Area'], df['Items_Available'])[0,1]
print(round(cov, 2))
```
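The result is positive, which indicates that the two variables move in the same direction. This makes sense because a larger store area corresponds to a greater number of items. One significant drawback of covariance is that its value is unbounded and depends on the units of the variables, so the magnitude alone does not tell you how strong the relationship is.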
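One way to see why the raw covariance value is hard to interpret is to change the units of one variable. The sketch below is a minimal illustration. It uses only the five rows shown in the table above, so its numbers differ from the full-dataset result. The factor `SCALE` is an arbitrary assumption, since the lesson does not state the units of Store_Area.

```python
import numpy as np

# First five rows of the dataset, copied from the table in this chapter
store_area = np.array([1659, 1461, 1340, 1451, 1770])
items_available = np.array([1961, 1752, 1609, 1748, 2111])

cov_original = np.cov(store_area, items_available)[0, 1]

# Hypothetical unit change: express the same areas in a unit 100 times larger
SCALE = 0.01  # arbitrary factor chosen for illustration
cov_rescaled = np.cov(store_area * SCALE, items_available)[0, 1]

print(round(cov_original, 2))
print(round(cov_rescaled, 2))

# The sign survives the unit change, but the magnitude shrinks by the same factor
print(cov_original > 0 and cov_rescaled > 0)
print(np.isclose(cov_rescaled, cov_original * SCALE))
```

The data describe the same stores in both cases, yet the covariance changes by a factor of 100. Only the sign can be read directly. The size of the number reflects the measurement units as much as the relationship.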