
Correlation and Covariance


Understanding how variables relate to each other is essential in statistics. Correlation and covariance are two fundamental measures that help you quantify these relationships. Both describe the direction of a linear relationship between two variables, but they differ in scale and interpretation: covariance is expressed in the units of the variables, while correlation also expresses the strength of the relationship on a fixed scale.

Covariance

  • Indicates whether two variables move together;
  • Positive covariance: the variables tend to increase or decrease together;
  • Negative covariance: one variable tends to increase while the other decreases;
  • Magnitude depends on the scale of the variables, making comparison across datasets difficult.
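
To make the scale dependence concrete, here is a minimal sketch that computes the sample covariance by hand. The data values and the helper name sample_covariance are illustrative assumptions, not part of the lesson's dataset.

```python
import numpy as np

# Illustrative data (hypothetical values, not the lesson's dataset)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 65.0, 71.0, 79.0])

def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    return float((x_dev * y_dev).sum() / (len(x) - 1))

# Covariance with study time measured in hours
cov_hours = sample_covariance(hours, score)

# Same study time expressed in minutes instead of hours
cov_minutes = sample_covariance(hours * 60, score)

print("Covariance (hours):  ", cov_hours)
print("Covariance (minutes):", cov_minutes)
```

The relationship between the two variables is unchanged, yet the second value is 60 times the first, which is why raw covariances are hard to compare across datasets.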

Correlation

  • Standardizes the relationship by dividing the covariance by the product of the variables' standard deviations;
  • Results in a value between -1 and 1;
  • 1 means a perfect positive linear relationship;
  • -1 means a perfect negative linear relationship;
  • 0 means no linear relationship;
  • Unitless, so it is easier to interpret and compare.
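
The standardization described above can be sketched as follows. The data and the helper name pearson_correlation are hypothetical and used only for illustration; the sketch assumes sample statistics (ddof=1) for both the covariance and the standard deviations.

```python
import numpy as np

# Illustrative data (hypothetical values, not the lesson's dataset)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 65.0, 71.0, 79.0])

def pearson_correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    cov_xy = np.cov(x, y)[0, 1]  # np.cov uses the sample covariance by default
    return float(cov_xy / (x.std(ddof=1) * y.std(ddof=1)))

r_hours = pearson_correlation(hours, score)
# Rescaling a variable (hours to minutes) leaves the correlation unchanged
r_minutes = pearson_correlation(hours * 60, score)

print("Correlation (hours):  ", r_hours)
print("Correlation (minutes):", r_minutes)
```

Both values agree up to floating-point rounding, because the change of units cancels out in the division.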

Calculating Correlation and Covariance in Python

You can use libraries such as numpy and pandas to calculate both covariance and correlation. In pandas, both are DataFrame methods:

  • DataFrame.cov() computes the pairwise sample covariance matrix of the columns;
  • DataFrame.corr() computes the correlation matrix, using the Pearson coefficient by default.

These matrices show the pairwise relationships between all variables in your dataset.
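
For comparison, a minimal numpy sketch of the same idea, using numpy.cov and numpy.corrcoef on two columns whose values are copied from the lesson's student dataset. Treat it as an illustration of the matrix layout rather than a replacement for the pandas approach.

```python
import numpy as np

# Two variables taken from the lesson's student dataset
hours_studied = np.array([2, 3, 5, 7, 9, 1, 4, 8, 6, 10])
exam_score = np.array([55, 60, 75, 85, 95, 50, 70, 90, 80, 98])

# 2x2 covariance matrix: variances on the diagonal,
# the covariance of the pair off the diagonal
cov_matrix = np.cov(hours_studied, exam_score)

# 2x2 correlation matrix: ones on the diagonal,
# the Pearson correlation of the pair off the diagonal
corr_matrix = np.corrcoef(hours_studied, exam_score)

print("Covariance matrix:")
print(cov_matrix)
print("\nCorrelation matrix:")
print(corr_matrix)
```

Both matrices are symmetric, so the entry at [0, 1] equals the entry at [1, 0].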

Interpreting Results

  • A high correlation does not imply causation;
  • Outliers can impact both measures;
  • Correlation measures only linear relationships and may miss nonlinear associations.
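
Two of the caveats above can be demonstrated with small synthetic examples. All values below are made up for illustration: a perfect quadratic relationship that correlation misses, and a perfectly linear dataset distorted by a single outlier.

```python
import numpy as np

# Nonlinear association: y is fully determined by x, but not linearly
x = np.arange(-5, 6, dtype=float)  # -5, -4, ..., 5
y = x ** 2
r_nonlinear = np.corrcoef(x, y)[0, 1]

# Perfectly linear data
x_line = np.arange(1, 11, dtype=float)  # 1, 2, ..., 10
y_line = 2 * x_line + 1
r_clean = np.corrcoef(x_line, y_line)[0, 1]

# The same data with one added outlier at (11, 0)
x_out = np.append(x_line, 11.0)
y_out = np.append(y_line, 0.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print("Quadratic relationship:", r_nonlinear)
print("Linear, no outlier:    ", r_clean)
print("Linear, one outlier:   ", r_outlier)
```

The quadratic case yields a correlation of essentially zero despite a perfect dependence, and a single extreme point is enough to pull a perfect correlation down substantially.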

```python
import numpy as np
import pandas as pd

# Realistic dataset: Student statistics
data = {
    "hours_studied": [2, 3, 5, 7, 9, 1, 4, 8, 6, 10],
    "exam_score": [55, 60, 75, 85, 95, 50, 70, 90, 80, 98],
    "social_media": [6, 5, 4, 2, 1, 7, 5, 2, 3, 0],
    "hours_slept": [7, 6, 8, 7, 6, 8, 7, 6, 8, 7]
}
df = pd.DataFrame(data)

# Compute covariance matrix
cov_matrix = df.cov()
print("Covariance matrix:")
print(cov_matrix)

# Compute correlation matrix
corr_matrix = df.corr()
print("\nCorrelation matrix:")
print(corr_matrix)
```

