Correlation and Covariance

Understanding how variables relate to each other is essential in statistics. Correlation and covariance are two fundamental measures that help you quantify these relationships. Both indicate the direction of a linear relationship between two variables, but only correlation is standardized, so it also expresses the strength of that relationship on a fixed scale.

Covariance

  • Indicates whether two variables move together;
  • Positive covariance: both variables increase or decrease together;
  • Negative covariance: one variable increases while the other decreases;
  • Magnitude depends on the scale of the variables, making comparison across datasets difficult.
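
The scale dependence described above can be checked with a small sketch. The height and weight values below are hypothetical toy data, and `sample_covariance` is an illustrative helper (not a library function) that uses the usual n - 1 denominator.

```python
import numpy as np

# Hypothetical toy data: heights in metres and weights in kilograms.
height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = np.array([55.0, 65.0, 70.0, 80.0, 90.0])


def sample_covariance(x, y):
    """Sample covariance: mean product of deviations, with n - 1 in the denominator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1))


cov_m = sample_covariance(height_m, weight_kg)

# Same heights expressed in centimetres: the relationship is unchanged,
# but the covariance is multiplied by 100.
cov_cm = sample_covariance(height_m * 100, weight_kg)

print("Covariance (m, kg): ", cov_m)
print("Covariance (cm, kg):", cov_cm)
```

The sign stays positive in both cases, but the magnitude changes with the unit, which is why raw covariances are hard to compare across datasets.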

Correlation

  • Standardizes the relationship by dividing the covariance by the product of the variables' standard deviations;
  • Results in a value between -1 and 1;
  • 1 means a perfect positive linear relationship;
  • -1 means a perfect negative linear relationship;
  • 0 means no linear relationship;
  • Unitless, so it is easier to interpret and compare.
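
As a minimal sketch of the standardization step, the snippet below divides the covariance by the product of the two standard deviations and compares the result with `np.corrcoef`. The five data points are hypothetical, and the sample versions (n - 1 denominator) are assumed throughout.

```python
import numpy as np

# Hypothetical toy data.
hours_studied = np.array([2.0, 3.0, 5.0, 7.0, 9.0])
exam_score = np.array([55.0, 60.0, 75.0, 85.0, 95.0])

# np.cov returns a 2x2 matrix; the off-diagonal entry is the covariance of the pair.
cov_xy = np.cov(hours_studied, exam_score)[0, 1]

# Sample standard deviations (ddof=1) to match the covariance denominator.
std_x = hours_studied.std(ddof=1)
std_y = exam_score.std(ddof=1)

r = cov_xy / (std_x * std_y)

# Changing the unit (hours -> minutes) changes the covariance but not the correlation.
minutes_studied = hours_studied * 60
r_minutes = np.corrcoef(minutes_studied, exam_score)[0, 1]

print("Correlation from covariance:", r)
print("Correlation via np.corrcoef:", np.corrcoef(hours_studied, exam_score)[0, 1])
print("Correlation with minutes:   ", r_minutes)
```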

Calculating Correlation and Covariance in Python

You can use libraries like numpy and pandas to calculate both covariance and correlation:

  • The DataFrame.cov() method in pandas computes the pairwise sample covariance matrix;
  • The DataFrame.corr() method gives you the correlation matrix (Pearson correlation by default).

These matrices show the pairwise relationships between all variables in your dataset.
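
For comparison, here is a hedged sketch of the numpy route next to the pandas one. The three columns are hypothetical; `rowvar=False` tells numpy that each column, not each row, is a variable, which matches how a DataFrame is laid out.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with three numeric columns.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# pandas: methods on the DataFrame return labelled matrices.
pd_cov = df.cov()
pd_corr = df.corr()

# numpy: functions on a 2-D array; columns are variables, so rowvar=False.
values = df.to_numpy()
np_cov = np.cov(values, rowvar=False)
np_corr = np.corrcoef(values, rowvar=False)

# A single pair can also be computed directly from two Series.
xy_corr = df["x"].corr(df["y"])

print(pd_corr)
print(np_corr)
print("x-y correlation:", xy_corr)
```

Both routes should give the same numbers; the pandas output keeps the column labels, which makes larger matrices easier to read.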

Interpreting Results

  • A high correlation does not imply causation;
  • Outliers can impact both measures;
  • Correlation measures only linear relationships and may miss nonlinear associations.

```python
import numpy as np
import pandas as pd

# Realistic dataset: Student statistics
data = {
    "hours_studied": [2, 3, 5, 7, 9, 1, 4, 8, 6, 10],
    "exam_score": [55, 60, 75, 85, 95, 50, 70, 90, 80, 98],
    "social_media": [6, 5, 4, 2, 1, 7, 5, 2, 3, 0],
    "hours_slept": [7, 6, 8, 7, 6, 8, 7, 6, 8, 7]
}

df = pd.DataFrame(data)

# Compute covariance matrix
cov_matrix = df.cov()
print("Covariance matrix:")
print(cov_matrix)

# Compute correlation matrix
corr_matrix = df.corr()
print("\nCorrelation matrix:")
print(corr_matrix)
```
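
Two of the caveats listed under Interpreting Results can be illustrated numerically. The sketch below uses made-up data: a perfectly dependent but nonlinear pair whose correlation is essentially zero, and a nearly uncorrelated pair whose correlation jumps once a single extreme point is added.

```python
import numpy as np

# Caveat 1: a nonlinear relationship can have (near) zero correlation.
# y is fully determined by x, but the relationship is a parabola, not a line.
x = np.arange(-5, 6, dtype=float)  # symmetric around zero
y = x ** 2
r_nonlinear = np.corrcoef(x, y)[0, 1]

# Caveat 2: a single outlier can dominate the result.
# Hypothetical values with almost no linear relationship.
a = np.arange(1, 11, dtype=float)
b = np.array([5, 3, 6, 2, 7, 4, 6, 3, 5, 4], dtype=float)
r_before = np.corrcoef(a, b)[0, 1]

# Append one extreme point far from the rest of the data.
a_out = np.append(a, 50.0)
b_out = np.append(b, 50.0)
r_after = np.corrcoef(a_out, b_out)[0, 1]

print("Parabola correlation:     ", r_nonlinear)
print("Correlation before outlier:", r_before)
print("Correlation after outlier: ", r_after)
```

Plotting the data before trusting a coefficient is a simple way to catch both situations.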
Section 1. Chapter 7