
Correlation and Covariance


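Understanding how variables relate to each other is essential in statistics. Correlation and covariance are two fundamental measures that help you quantify these relationships. Both indicate the direction of a linear relationship between two variables, but they differ in scale and interpretation: covariance is expressed in the units of the variables, while correlation is standardized and therefore also conveys the strength of the relationship.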

Covariance

  • Indicates whether two variables move together;
  • Positive covariance: the variables tend to increase or decrease together;
  • Negative covariance: one variable tends to increase while the other decreases;
  • Magnitude depends on the scale of the variables, making comparison across datasets difficult.
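
To make the scale dependence concrete, here is a minimal sketch that computes the sample covariance by hand (sum of products of deviations from the mean, divided by n - 1). The arrays `hours` and `score` and the helper `sample_covariance` are hypothetical, made up for illustration only.

```python
import numpy as np

# Hypothetical illustrative data (not taken from the lesson's dataset)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 65.0, 71.0, 79.0])


def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1))


cov_hours = sample_covariance(hours, score)
# Same data, but study time expressed in minutes instead of hours
cov_minutes = sample_covariance(hours * 60, score)

print("Covariance (hours):  ", cov_hours)
print("Covariance (minutes):", cov_minutes)
```

Changing the unit from hours to minutes multiplies the covariance by 60 even though the underlying relationship is unchanged, which is why raw covariance values are hard to compare across datasets.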

Correlation

  • Standardizes the relationship by dividing the covariance by the product of the variables' standard deviations;
  • Results in a value between -1 and 1;
  • 1 means a perfect positive linear relationship;
  • -1 means a perfect negative linear relationship;
  • 0 means no linear relationship;
  • Unitless, so it is easier to interpret and compare.
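
The standardization described above can be sketched in a few lines. The data and the helper name `pearson_r` are hypothetical; the sketch assumes the sample covariance and sample standard deviations (both with `ddof=1`) so the two are consistent.

```python
import numpy as np

# Hypothetical illustrative data (not taken from the lesson's dataset)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 65.0, 71.0, 79.0])


def pearson_r(x, y):
    """Correlation = covariance / (std of x * std of y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y)[0, 1]  # sample covariance (np.cov uses ddof=1 by default)
    return float(cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1)))


r_hours = pearson_r(hours, score)
r_minutes = pearson_r(hours * 60, score)  # rescaling a variable leaves r unchanged
r_negated = pearson_r(hours, -score)      # flipping one variable flips the sign

print("r (hours):  ", r_hours)
print("r (minutes):", r_minutes)
print("r (negated):", r_negated)
```

Unlike covariance, the result does not change when a variable is rescaled, and it always stays within the range from -1 to 1.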

Calculating Correlation and Covariance in Python

You can use libraries like numpy and pandas to calculate both covariance and correlation. With a pandas DataFrame:

  • The cov() method computes the pairwise sample covariance matrix of the columns;
  • The corr() method returns the correlation matrix (Pearson correlation by default).

These matrices show the pairwise relationships between all variables in your dataset.
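
For the numpy side, a rough equivalent might look like the sketch below. It uses two columns from the lesson's example dataset; passing `rowvar=False` tells numpy that each column (not each row) is a variable, which matches the pandas layout.

```python
import numpy as np

# Two columns from the lesson's student dataset
hours_studied = np.array([2, 3, 5, 7, 9, 1, 4, 8, 6, 10])
exam_score = np.array([55, 60, 75, 85, 95, 50, 70, 90, 80, 98])

# Stack into a 2-D array with one column per variable
data = np.column_stack([hours_studied, exam_score])

# rowvar=False: treat columns as variables and rows as observations
cov_matrix = np.cov(data, rowvar=False)
corr_matrix = np.corrcoef(data, rowvar=False)

print("Covariance matrix:")
print(cov_matrix)
print("\nCorrelation matrix:")
print(corr_matrix)
```

The diagonal of the covariance matrix holds each variable's sample variance, and the diagonal of the correlation matrix is always 1 because every variable is perfectly correlated with itself.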

Interpreting Results

  • A high correlation does not imply causation;
  • Outliers can impact both measures;
  • Correlation measures only linear relationships and may miss nonlinear associations.
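
Two of these caveats are easy to demonstrate. The sketch below uses small made-up arrays (all names and values are hypothetical): a perfect quadratic relationship that correlation fails to detect, and a single outlier that reverses the sign of the coefficient.

```python
import numpy as np

# Nonlinear association: y is fully determined by x, yet r is about 0
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2
r_quadratic = np.corrcoef(x, y)[0, 1]

# Outlier sensitivity: a perfectly negative relationship...
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
r_clean = np.corrcoef(a, b)[0, 1]

# ...becomes strongly positive after adding one extreme point (20, 20)
a_out = np.append(a, 20.0)
b_out = np.append(b, 20.0)
r_outlier = np.corrcoef(a_out, b_out)[0, 1]

print("r for y = x^2:     ", r_quadratic)
print("r without outlier: ", r_clean)
print("r with one outlier:", r_outlier)
```

Plotting the data before trusting a single coefficient is a simple way to catch both situations.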

```python
import numpy as np
import pandas as pd

# Realistic dataset: Student statistics
data = {
    "hours_studied": [2, 3, 5, 7, 9, 1, 4, 8, 6, 10],
    "exam_score": [55, 60, 75, 85, 95, 50, 70, 90, 80, 98],
    "social_media": [6, 5, 4, 2, 1, 7, 5, 2, 3, 0],
    "hours_slept": [7, 6, 8, 7, 6, 8, 7, 6, 8, 7]
}
df = pd.DataFrame(data)

# Compute covariance matrix
cov_matrix = df.cov()
print("Covariance matrix:")
print(cov_matrix)

# Compute correlation matrix
corr_matrix = df.corr()
print("\nCorrelation matrix:")
print(corr_matrix)
```

