Learning Statistics with Python
Covariance
Covariance is a measure of the joint variability of two random variables.
| Value of covariance | Meaning |
| Positive | The two variables tend to move in the same direction |
| 0 | The two variables show no linear tendency to vary together |
| Negative | The two variables tend to move in opposite directions |
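To make the sign of covariance concrete, here is a minimal sketch using NumPy's np.cov() (element [0, 1] of its result is the covariance of the two sequences). The sequences are toy data made up for illustration. The last one depends on x perfectly but not linearly, and still has zero covariance, which is why a zero value only rules out a linear relationship.

```python
import numpy as np

# Toy data, made up for illustration only
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves in the same direction as x
falling = [10, 8, 6, 4, 2]   # moves in the opposite direction
u_shape = [4, 1, 0, 1, 4]    # depends on x, but not linearly

results = {}
for name, y in [("rising", rising), ("falling", falling), ("u_shape", u_shape)]:
    # np.cov returns a 2x2 matrix; [0, 1] is the covariance of x and y
    results[name] = float(np.cov(x, y)[0, 1])

print(results)
```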
The formulas differ for a sample and for a population, but we will not go into them in detail. In this chapter, we will compute covariances for the following dataset:
| | Store_ID | Store_Area | Items_Available | Daily_Customer_Count | Store_Sales |
| 0 | 0 | 1659 | 1961 | 530 | 66490 |
| 1 | 1 | 1461 | 1752 | 210 | 39820 |
| 2 | 2 | 1340 | 1609 | 720 | 54010 |
| 3 | 3 | 1451 | 1748 | 620 | 53730 |
| 4 | 4 | 1770 | 2111 | 450 | 46620 |
Store_ID - The unique ID of the store.
Store_Area - The area of the store.
Items_Available - The number of items that are available in the store.
Daily_Customer_Count - The daily number of customers in the store.
Store_Sales - The number of sales in the store.
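The sample and population formulas mentioned above share the same numerator, the sum of products of each value's deviation from its mean, and differ only in the divisor: n - 1 for a sample, n for a population. A minimal sketch in plain Python, assuming we use only the five rows shown in the table (Store_Area and Items_Available); the helper name covariance is our own, not part of any library.

```python
# Store_Area and Items_Available for the five rows shown in the table above
store_area = [1659, 1461, 1340, 1451, 1770]
items_available = [1961, 1752, 1609, 1748, 2111]

def covariance(x, y, sample=True):
    """Hypothetical helper: covariance of two equal-length sequences.

    sample=True divides by n - 1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the two means
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1 if sample else n)

sample_cov = covariance(store_area, items_available)
population_cov = covariance(store_area, items_available, sample=False)
print(sample_cov, population_cov)
```

Both results are positive; they differ only by the factor (n - 1) / n.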
Covariance with Python:
To compute covariance in Python, we use the function np.cov() from the NumPy library, passing it two arguments: the sequences of data whose covariance we want to find.
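The output is a 2×2 covariance matrix. The covariance between the two sequences is the element at index [0, 1] (the diagonal elements are the variances of each sequence); we will not use the other values in this course. Look at the example: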
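The lesson's original example output is not available here. As a minimal sketch, assuming only the five rows shown in the table (the full dataset would give a different number), the covariance of Store_Area and Items_Available can be computed like this. By default np.cov() divides by n - 1 (sample covariance); passing bias=True divides by n (population covariance).

```python
import numpy as np

# Store_Area and Items_Available for the five rows shown in the table above
store_area = [1659, 1461, 1340, 1451, 1770]
items_available = [1961, 1752, 1609, 1748, 2111]

cov_matrix = np.cov(store_area, items_available)  # 2x2 covariance matrix
covariance = cov_matrix[0, 1]                     # covariance of the two sequences

# bias=True switches to the population formula (divide by n instead of n - 1)
population_covariance = np.cov(store_area, items_available, bias=True)[0, 1]

print(covariance, population_covariance)
```

If the data sit in a pandas DataFrame (a hypothetical df here), the same call works on its columns: np.cov(df['Store_Area'], df['Items_Available'])[0, 1].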
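The positive result means that the two variables tend to move in the same direction. This makes sense: the bigger the store area, the more items the store tends to hold. A significant disadvantage of covariance is that its value is unbounded: it can be any real number, and its magnitude depends on the units of the variables, so it does not tell us how strong the relationship is.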
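A quick way to see that the magnitude of covariance depends on units: rescale one variable and recompute. This is a minimal sketch on the five rows shown above; the factor of 10 is a hypothetical change of units chosen for illustration, since the lesson does not state the unit of Store_Area.

```python
import numpy as np

# Store_Area and Items_Available for the five rows shown in the table above
store_area = [1659, 1461, 1340, 1451, 1770]
items_available = [1961, 1752, 1609, 1748, 2111]

cov_original = np.cov(store_area, items_available)[0, 1]

# Hypothetical change of units: every area value multiplied by 10
area_rescaled = [a * 10 for a in store_area]
cov_rescaled = np.cov(area_rescaled, items_available)[0, 1]

# Same relationship between the variables, but a covariance ten times larger
print(cov_original, cov_rescaled)
```

The relationship between area and items has not changed, yet the covariance grew tenfold, so its raw value cannot be read as a measure of strength.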