Learn Covariance, Correlation, and Decorrelation | Whitening and Decorrelation
Feature Scaling and Normalization Deep Dive

Covariance, Correlation, and Decorrelation

Understanding how features relate to each other is crucial when preparing data for machine learning. Two fundamental concepts in this context are covariance and correlation. Both measure how two variables change together, but they do so in different ways.

The sample covariance between two variables, $X$ and $Y$, computed from $n$ paired observations, is defined as:

$$\text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $x_i$ and $y_i$ are the individual sample values, and $\bar{x}$ and $\bar{y}$ are the sample means. Covariance indicates the direction of the linear relationship between the variables. A positive value means that as one variable increases, the other tends to increase as well; a negative value means that one tends to decrease as the other increases. However, the magnitude of covariance depends on the units and scale of the variables, which makes it hard to compare across different variable pairs.
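As a rough illustration of the formula above, the following sketch computes the sample covariance by hand and compares it with `np.cov`. The helper name `sample_covariance` and the data values are made up for this example.

```python
import numpy as np


def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sum of products of deviations from each sample mean
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)


# Hypothetical sample values chosen only for illustration
x = np.array([2.0, 4.0, 6.0])
y = np.array([8.0, 10.0, 14.0])

manual = sample_covariance(x, y)
# np.cov returns a 2x2 matrix; the off-diagonal entry is Cov(x, y)
library = np.cov(x, y)[0, 1]

print(manual, library)
```

Both values should agree, since `np.cov` also uses the $n-1$ denominator by default.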

The correlation coefficient, often called Pearson's correlation, standardizes this measure:

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$. Correlation values always lie between -1 and 1, which makes them easier to interpret. A value of 1 means a perfect positive linear relationship, -1 means a perfect negative linear relationship, and 0 means no linear relationship (a nonlinear relationship may still exist).
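To make the standardization concrete, here is a minimal sketch that builds Pearson's correlation from the covariance and the sample standard deviations, then rescales one feature to show that covariance changes while correlation does not. The data values and the factor of 100 are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical sample values chosen only for illustration
x = np.array([2.0, 4.0, 6.0])
y = np.array([8.0, 10.0, 14.0])

# Pearson correlation: covariance divided by the product of
# the sample standard deviations (ddof=1 matches np.cov's default)
cov_xy = np.cov(x, y)[0, 1]
corr_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Rescaling a feature (for example, a unit change) scales the covariance
# but leaves the correlation unchanged
x_scaled = 100.0 * x
cov_scaled = np.cov(x_scaled, y)[0, 1]
corr_scaled = np.corrcoef(x_scaled, y)[0, 1]

print(cov_xy, cov_scaled)
print(corr_xy, corr_scaled)
```

This is why correlation is usually the more convenient quantity when features are measured in different units.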

Suppose you have two features, height and weight, measured for a group of people. If taller individuals tend to be heavier, the covariance and correlation between height and weight will both be positive. If you instead compare height with an unrelated feature, the covariance and correlation would be close to zero. A feature that never varies is a special case: in a dataset where all shoes are the same size, the covariance between height and shoe size is exactly zero, and the correlation is undefined because the standard deviation of shoe size is zero.
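The height and weight scenario, along with the constant shoe size case, can be checked with a small sketch. All measurements below are invented for illustration. Note that a constant feature has zero standard deviation, so the correlation formula divides zero by zero.

```python
import numpy as np

# Hypothetical measurements, made up for illustration
height_cm = np.array([160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([55.0, 68.0, 77.0, 92.0])
shoe_size = np.array([42.0, 42.0, 42.0, 42.0])  # identical for everyone

cov_hw = np.cov(height_cm, weight_kg)[0, 1]
corr_hw = np.corrcoef(height_cm, weight_kg)[0, 1]

# A constant feature never deviates from its mean, so covariance is exactly 0
cov_hs = np.cov(height_cm, shoe_size)[0, 1]
# Its standard deviation is 0 as well, so the correlation ratio is 0 / 0
std_shoe = np.std(shoe_size, ddof=1)

print(cov_hw > 0, corr_hw > 0)
print(cov_hs, std_shoe)
```

In practice, constant features are usually dropped before computing a correlation matrix.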

Note

When two features are highly correlated, they contain overlapping information. This redundancy can hurt machine learning models, especially those that assume features are independent. Decorrelation transforms the features so that they are uncorrelated; this is a weaker property than statistical independence, although the two coincide for jointly Gaussian data. Decorrelating features can improve model performance, reduce overfitting, and speed up learning.
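One common way to decorrelate features, shown here only as a sketch and not necessarily the method this course uses later, is to rotate the centered data onto the eigenvectors of its covariance matrix (a PCA-style rotation). The dataset is hypothetical.

```python
import numpy as np

# Hypothetical dataset: rows are samples, columns are two correlated features
data = np.array([
    [2.0, 8.0],
    [4.0, 10.0],
    [6.0, 14.0],
    [8.0, 15.0],
])

# Center each feature, then compute the covariance matrix
centered = data - data.mean(axis=0)
cov_matrix = np.cov(centered, rowvar=False)

# eigh is intended for symmetric matrices such as a covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Rotate the data onto the eigenvector axes
decorrelated = centered @ eigenvectors

# Off-diagonal entries of the new covariance matrix should be ~0
new_cov = np.cov(decorrelated, rowvar=False)
print(np.round(new_cov, 6))
```

The rotated features are uncorrelated, and their variances equal the eigenvalues of the original covariance matrix. This does not guarantee statistical independence.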

import numpy as np

# Small dataset: 3 samples, 2 features
data = np.array([
    [2.0, 8.0],
    [4.0, 10.0],
    [6.0, 14.0]
])

# Compute covariance matrix (rowvar=False: features are in columns)
cov_matrix = np.cov(data, rowvar=False)
print("Covariance matrix:")
print(cov_matrix)

# Compute correlation matrix
corr_matrix = np.corrcoef(data, rowvar=False)
print("\nCorrelation matrix:")
print(corr_matrix)



Section 3. Chapter 1

