
Derivation of Whitening Transformation

To transform a dataset so that its features are uncorrelated and each has unit variance, you use a process called whitening. Whitening is especially useful in machine learning and statistics, as many algorithms assume that input features are uncorrelated and standardized. The whitening transformation can be derived step by step using eigenvalue decomposition of the data's covariance matrix.

Suppose your data matrix X has shape (n_samples, n_features) and is already mean-centered (each feature has mean zero). The first step is to compute the covariance matrix Σ (sigma) of the data. This is given by:

\Sigma = \frac{1}{n} X^\top X

where n is the number of samples.
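
As a quick illustration, here is a minimal sketch of computing Σ directly from this formula, using the same small mean-centered dataset as the full example at the end of this page. Note that NumPy's np.cov defaults to the unbiased 1/(n-1) estimator, so it differs from the 1/n formula only by a constant factor:

import numpy as np

# Small mean-centered dataset: 4 samples, 2 features
X = np.array([
    [2.0, 0.0],
    [0.0, 2.0],
    [-2.0, 0.0],
    [0.0, -2.0]
])
n = X.shape[0]

# Covariance matrix from the formula: Sigma = (1/n) X^T X
sigma = (X.T @ X) / n
print(sigma)                      # [[2. 0.] [0. 2.]]
print(np.cov(X, rowvar=False))    # same matrix scaled by n / (n - 1)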

Next, you perform eigenvalue decomposition on the covariance matrix Σ. This means you find a matrix of eigenvectors U and a diagonal matrix of eigenvalues Λ (lambda) such that:

\Sigma = U \Lambda U^\top

The matrix U contains the eigenvectors as columns, and Λ is a diagonal matrix with the corresponding eigenvalues.
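
Continuing the sketch above (sigma, eigvals, and eigvecs are illustrative variable names), np.linalg.eigh performs this decomposition for a symmetric matrix such as Σ:

# Eigendecomposition: columns of eigvecs are U, eigvals fill the diagonal of Lambda
eigvals, eigvecs = np.linalg.eigh(sigma)

# Sanity check: U Lambda U^T reconstructs Sigma
print(np.allclose(eigvecs @ np.diag(eigvals) @ eigvecs.T, sigma))  # True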

The whitening transformation seeks a matrix W such that the transformed data X_white = XW has a covariance matrix equal to the identity matrix (i.e., uncorrelated features with unit variance). The transformation matrix W is derived as follows:

  1. Rotate the data using the eigenvectors: XU;
  2. Scale each component by the inverse square root of its eigenvalue: XU Λ^{-1/2};
  3. Rotate back to the original feature space using the transposed eigenvectors: XU Λ^{-1/2} U^⊤.

Thus, the full whitening transformation is:

X_{white} = X U \Lambda^{-1/2} U^\top

Here, Λ^{-1/2} means taking the reciprocal of the square root of each eigenvalue on the diagonal of Λ. This operation scales each principal component to have unit variance.
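
As a minimal sketch continuing the variables above, the rotation, scaling, and final rotation back can be applied step by step, and the whitened covariance (using the same 1/n convention as the derivation) checked against the identity:

# Rotate into the eigenbasis, then scale each component by Lambda^{-1/2}
X_rotated = X @ eigvecs
X_scaled = X_rotated @ np.diag(1.0 / np.sqrt(eigvals))

# Rotate back to the original feature space (the final U^T factor)
X_whitened = X_scaled @ eigvecs.T

# Covariance of the whitened data (1/n convention) is the identity matrix
print((X_whitened.T @ X_whitened) / n)  # [[1. 0.] [0. 1.]]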

In summary, the whitening transformation is constructed by first diagonalizing the covariance matrix via eigenvalue decomposition, then scaling the principal components, and finally rotating the data back to the original feature space. This ensures that the resulting features are both uncorrelated and standardized.

Note (definition): Whitening transforms data so that all features are uncorrelated (covariances are zero) and each has unit variance (variance equals one). This is achieved by scaling and rotating the data based on the eigenvalues and eigenvectors of its covariance matrix.
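
The complete procedure, from covariance estimation to the final transformation, is implemented below. This example uses np.cov, so the covariance follows the 1/(n-1) convention rather than 1/n; the covariance of the whitened data is still the identity under that same convention.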

import numpy as np

# Generate a simple 2D mean-centered dataset
X = np.array([
    [2.0, 0.0],
    [0.0, 2.0],
    [-2.0, 0.0],
    [0.0, -2.0]
])

# Step 1: Compute the covariance matrix
cov = np.cov(X, rowvar=False)

# Step 2: Eigenvalue decomposition
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: Construct the whitening matrix
# Lambda^{-1/2}: inverse square root of eigenvalues
D_inv_sqrt = np.diag(1.0 / np.sqrt(eigvals))

# Whitening matrix: eigvecs @ D_inv_sqrt @ eigvecs.T
whitening_matrix = eigvecs @ D_inv_sqrt @ eigvecs.T

# Step 4: Apply the whitening transformation
X_white = X @ whitening_matrix

print("Whitened data:\n", X_white)
print("Covariance of whitened data:\n", np.cov(X_white, rowvar=False))