Probability Theory Basics
What is Covariance?
Covariance is a numerical measure that quantifies the relationship between two variables.
It measures how changes in one variable correspond to changes in another variable. More specifically, covariance measures the joint variability of two variables and provides insights into the direction (positive or negative) of this variability.
Covariance calculation
- Conduct the first stochastic experiment several times and write the result of each run to an array. This will be the x array;
- Conduct the second stochastic experiment several times and write the results to the y array;
- Calculate the covariance using the numpy library: covariance = np.cov(x, y)[0, 1]. Here np.cov returns the 2x2 covariance matrix, and the element [0, 1] is the covariance between x and y.
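To make the last step less of a black box, here is a minimal sketch that computes the sample covariance by hand and compares it with the value from np.cov. The data, the seed, and the names rng, manual_cov and numpy_cov are assumptions chosen for illustration; they are not part of the lesson.

```python
import numpy as np

# Hypothetical data: a fixed seed keeps the run reproducible
rng = np.random.default_rng(0)
x = rng.random(100) * 10
y = x + rng.standard_normal(100)

# Sample covariance from the definition: sum of products of the
# deviations from the means, divided by n - 1
n = len(x)
manual_cov = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; [0, 1] is cov(x, y)
numpy_cov = np.cov(x, y)[0, 1]

print(manual_cov, numpy_cov)
```

Both values should agree up to floating-point rounding, because np.cov divides by n - 1 by default.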
Examples
    import numpy as np
    import matplotlib.pyplot as plt

    # Assume that results of some stochastic experiments are stored in x array
    x = np.random.rand(100) * 10
    # We provide another stochastic experiment by using the value of x and adding some noise
    y = x + np.random.randn(100)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)
    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))
    # Show the plot
    plt.show()
We see that as the value of x increases, the value of y also increases. The covariance is, therefore, positive. Let's run another experiment:
    import numpy as np
    import matplotlib.pyplot as plt

    # Assume that results of some stochastic experiments are stored in x array
    x = np.random.rand(100) * 10
    # We provide another stochastic experiment by using the value of -x and adding some noise
    y = -x + np.random.randn(100)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)
    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))
    # Show the plot
    plt.show()
Now, as the x value increases, the y value decreases, and the covariance is negative. Next, let's look at the covariance between the results of two independent experiments:
    import numpy as np
    import matplotlib.pyplot as plt

    # Generate random data for two variables with zero correlation
    np.random.seed(0)
    x = np.random.rand(200)
    y = np.random.rand(200)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)
    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))
    # Show the plot
    plt.show()
As a result, we can draw the following conclusions:
- If the covariance between two values is positive, then as the first value increases, the second value tends to increase as well;
- If the covariance between two values is negative, then as the first value increases, the second value tends to decrease;
- If the values are independent, then their covariance is zero (they are uncorrelated). A covariance estimated from a finite sample will only be close to zero, not exactly zero.
Pay attention to the last point: the correlation is zero if the values are independent. But the converse is not true: if the correlation is zero, this does not mean independence. Look at the example:
    import numpy as np
    import matplotlib.pyplot as plt

    # Set the number of vectors/points to generate
    num_points = 1000

    # Generate random angles uniformly distributed between 0 and 2*pi
    angles = np.random.uniform(0, 2*np.pi, num_points)

    # Convert angles to vectors in polar coordinates
    r = np.sqrt(np.random.uniform(0, 1, num_points))  # Square root to achieve uniform distribution within the circle
    x = r * np.cos(angles)
    y = r * np.sin(angles)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)
    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))
    # Show the plot
    plt.show()
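The points in the example above are uniformly distributed inside the unit circle. Their coordinates are dependent, because knowing x restricts the possible values of y (x^2 + y^2 <= 1), and yet they are uncorrelated.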
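One way to see this dependence numerically is to compare the range of y over all points with the range of y for points close to the left or right edge of the circle. The sketch below is only an illustration: the seed, the sample size, the threshold 0.9, and the names near_edge, bound, max_abs_y_near_edge and max_abs_y_overall are assumptions, not part of the lesson.

```python
import numpy as np

# Same construction as in the example above, with a fixed seed
rng = np.random.default_rng(0)
num_points = 5000
angles = rng.uniform(0, 2 * np.pi, num_points)
r = np.sqrt(rng.uniform(0, 1, num_points))
x = r * np.cos(angles)
y = r * np.sin(angles)

# The covariance is close to zero
covariance = np.cov(x, y)[0, 1]

# Select points near the left or right edge of the circle
near_edge = np.abs(x) > 0.9

# Inside the unit circle, |x| > 0.9 forces |y| < sqrt(1 - 0.9^2)
bound = np.sqrt(1 - 0.9**2)
max_abs_y_near_edge = np.abs(y[near_edge]).max()
max_abs_y_overall = np.abs(y).max()

print('covariance:', round(covariance, 3))
print('max |y| overall:', round(max_abs_y_overall, 3))
print('max |y| when |x| > 0.9:', round(max_abs_y_near_edge, 3), '(bound:', round(bound, 3), ')')
```

If x and y were independent, restricting x would not change the range of y. Here it clearly does, even though the covariance stays near zero.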
In general, covariance reliably detects only linear relationships between values. Thus, if two values are uncorrelated, we can conclude that there is no linear dependence between them, but they may still have other, more complex types of dependence.
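A simple case of such a non-linear dependence is y = x^2 with x distributed symmetrically around zero. The sketch below uses this as an assumed example (the data, the seed and the sample size are chosen for illustration only): y is completely determined by x, yet the covariance is close to zero.

```python
import numpy as np

# Hypothetical data: x is symmetric around zero
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 10000)

# y is a deterministic, non-linear function of x
y = x ** 2

# Positive and negative x contribute with opposite signs and cancel out
covariance = np.cov(x, y)[0, 1]

print('Covariance is', round(covariance, 3))
```

The dependence here is as strong as it can be, but it is not linear, so covariance does not pick it up.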