What is Covariance? | Covariance and Correlation
Probability Theory Basics
What is Covariance?

Covariance is a numerical measure of how two random variables vary together.
It describes how changes in one variable correspond to changes in the other. More specifically, covariance measures the joint variability of two variables, and its sign indicates the direction of the relationship (positive or negative).
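
For reference, the description above corresponds to the standard textbook definition, which the page itself does not write out. The symbols below (X, Y for the random variables, x_i and y_i for n paired observations, bars for sample means) are notation assumed here for illustration:

```latex
% Population covariance: expected product of deviations from the means
\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])\,(Y - \mathbb{E}[Y])\big]

% Sample covariance from n paired observations (the n - 1 normalization
% is the default used by np.cov)
\widehat{\operatorname{Cov}}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

When both deviations tend to have the same sign, the products are mostly positive and so is the covariance. When they tend to have opposite signs, the covariance is negative.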

Covariance calculation

  1. Run the first stochastic experiment several times and store the results in an array x;
  2. Run the second stochastic experiment the same number of times and store the results in an array y;
  3. Calculate the covariance using the NumPy library: covariance = np.cov(x, y)[0, 1].
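
To make the indexing in step 3 less mysterious, here is a minimal sketch of what np.cov(x, y) returns and how the [0, 1] entry can be reproduced by hand. The data values are made up purely for illustration, and the names cov_matrix, cov_numpy, and cov_manual are not from the course.

```python
import numpy as np

# Hypothetical paired results of two experiments (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# np.cov(x, y) returns the 2x2 covariance matrix:
# [[var(x),    cov(x, y)],
#  [cov(y, x), var(y)   ]]
cov_matrix = np.cov(x, y)
cov_numpy = cov_matrix[0, 1]

# The same quantity by hand: sum of products of deviations from the means,
# divided by n - 1 (the default normalization of np.cov)
n = len(x)
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print(cov_matrix)
print(cov_numpy, cov_manual)  # both equal 1.5 for this data
```

The off-diagonal entries [0, 1] and [1, 0] are equal, so either one can be used. The diagonal entries are the sample variances of x and y.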

Examples

    import numpy as np
    import matplotlib.pyplot as plt

    # Assume that results of some stochastic experiments are stored in x array
    x = np.random.rand(100) * 10

    # We provide another stochastic experiment by using the value of x and adding some noise
    y = x + np.random.randn(100)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)

    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))

    # Show the plot
    plt.show()

We see that as the value of x increases, the value of y also tends to increase. The covariance is, therefore, positive. Let's run another experiment:

    import numpy as np
    import matplotlib.pyplot as plt

    # Assume that results of some stochastic experiments are stored in x array
    x = np.random.rand(100) * 10

    # We provide another stochastic experiment by using the value of -x and adding some noise
    y = -x + np.random.randn(100)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)

    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))

    # Show the plot
    plt.show()

Now, as the x value increases, the y value tends to decrease, and the covariance is negative. Next, let's look at the covariance between the results of two independent experiments:

    import numpy as np
    import matplotlib.pyplot as plt

    # Generate random data for two variables with zero correlation
    np.random.seed(0)
    x = np.random.rand(200)
    y = np.random.rand(200)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)

    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))

    # Show the plot
    plt.show()

From these examples, we can conclude:

  1. If the covariance between two variables is positive, then as the first variable increases, the second tends to increase as well;
  2. If the covariance between two variables is negative, then as the first variable increases, the second tends to decrease;
  3. If two variables are independent, their covariance is zero (they are uncorrelated). A covariance estimated from a finite sample will usually be close to zero rather than exactly zero.
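
The third point concerns the theoretical covariance. A value estimated from data is only approximately zero. The sketch below, which assumes an arbitrary seed and arbitrary sample sizes not taken from the course, estimates the covariance of two independent uniform samples at several sizes to show how small it is in practice.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility

# Sample covariance of two independent uniform samples at different sizes
sample_covs = {}
for n in (100, 10_000, 1_000_000):
    x = rng.random(n)
    y = rng.random(n)  # drawn independently of x
    sample_covs[n] = np.cov(x, y)[0, 1]
    print(n, sample_covs[n])
```

The estimates fluctuate around zero, and the typical size of the fluctuation shrinks as the number of observations grows.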

Pay attention to the last point: the covariance (and hence the correlation) is zero if the variables are independent. But the converse is not true: zero covariance does not imply independence. Look at the example:

    import numpy as np
    import matplotlib.pyplot as plt

    # Set the number of vectors/points to generate
    num_points = 1000

    # Generate random angles uniformly distributed between 0 and 2*pi
    angles = np.random.uniform(0, 2*np.pi, num_points)

    # Convert angles to vectors in polar coordinates
    r = np.sqrt(np.random.uniform(0, 1, num_points))  # Square root to achieve uniform distribution within the circle
    x = r * np.cos(angles)
    y = r * np.sin(angles)

    # Calculate the covariance
    covariance = np.cov(x, y)[0, 1]

    plt.scatter(x, y)

    # Add labels and title
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Covariance is ' + str(round(covariance, 3)))

    # Show the plot
    plt.show()

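The points in the example above are uniformly distributed inside the unit circle. The covariance of x and y is close to zero, yet the variables are dependent: knowing x restricts the possible values of y, because x² + y² ≤ 1. They are therefore dependent but uncorrelated.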
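
One way to see the dependence numerically is sketched below, using the same construction as the example with a larger sample. The seed, the sample size, the 0.9 threshold, and the names cov_sq, spread_all, and spread_edge are assumptions made for this illustration. The covariance of x and y is near zero, but the covariance of their squares is clearly negative, and the spread of y shrinks when x is near the edge of the circle.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
num_points = 100_000

# Points uniformly distributed inside the unit circle (same construction as above)
angles = rng.uniform(0, 2 * np.pi, num_points)
r = np.sqrt(rng.uniform(0, 1, num_points))
x = r * np.cos(angles)
y = r * np.sin(angles)

# Covariance of x and y: close to zero
cov_xy = np.cov(x, y)[0, 1]

# Covariance of the squares: negative, because a large x**2 forces a small y**2
cov_sq = np.cov(x**2, y**2)[0, 1]

# Range of y over all points vs. only points with |x| > 0.9
spread_all = y.max() - y.min()
edge = np.abs(x) > 0.9
spread_edge = y[edge].max() - y[edge].min()

print(cov_xy, cov_sq)
print(spread_all, spread_edge)
```

If x and y were independent, x² and y² would be uncorrelated too, and restricting x would not change the range of y.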
In general, covariance identifies only linear relationships between variables well. Thus, if two variables are uncorrelated, we can conclude that they have no linear dependence, but they may still have other, more complex types of dependence.
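
A simple illustration of a non-linear dependence that covariance misses is sketched here. It is not part of the original lesson: y is fully determined by x through y = x², with x assumed to be drawn symmetrically around zero, and the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# x is symmetric around zero; y depends on x deterministically, but not linearly
x = rng.uniform(-1, 1, 10_000)
y = x**2

# The sample covariance is close to zero despite the exact functional dependence
covariance = np.cov(x, y)[0, 1]
print(round(covariance, 3))
```

The positive and negative halves of x contribute products of opposite sign that cancel out, so the covariance stays near zero even though y can be computed exactly from x.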
