Exploratory Data Analysis with Python
Bivariate and Correlation Analysis

Correlation Analysis: Pearson and Spearman

Understanding the relationships between numerical features is crucial for uncovering patterns in retail data.

Pearson vs. Spearman Correlation

  • Pearson correlation coefficient:

    • Measures the strength and direction of a linear relationship between two continuous variables;
    • Is sensitive to outliers, and its standard significance test assumes the variables are approximately normally distributed;
    • Most appropriate when the relationship is linear and variables are continuous.
  • Spearman rank correlation coefficient:

    • Assesses monotonic relationships, whether linear or not;
    • Works on the ranks of the data rather than the raw values (it is Pearson's coefficient computed on ranks);
    • More robust to outliers and non-normality;
    • Ideal for ordinal data, non-linear but monotonic trends, or when Pearson's assumptions are violated.
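To make the two definitions concrete, here is a minimal from-scratch sketch in plain Python (no pandas). It assumes the textbook formulas: Pearson's r is the covariance of the two variables divided by the product of their spreads, and Spearman's ρ is Pearson's r computed on ranks, with tied values receiving the average of the ranks they span (the usual convention, and the one `df.rank()` applies by default). The helper names `pearson`, `average_ranks` and `spearman` are illustrative, not library functions.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values that starts at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# The lesson's example retail columns
sales = [200, 220, 250, 270, 300, 320, 350, 370, 400, 420]
foot_traffic = [50, 55, 60, 65, 68, 70, 75, 80, 85, 90]
discount = [5, 10, 0, 0, 15, 20, 25, 10, 5, 0]

print(f"Pearson  r(sales, foot_traffic)   = {pearson(sales, foot_traffic):.3f}")
print(f"Spearman rho(sales, foot_traffic) = {spearman(sales, foot_traffic):.3f}")
print(f"Spearman rho(sales, discount)     = {spearman(sales, discount):.3f}")
```

Because `sales` and `foot_traffic` are both strictly increasing in this sample, their ranks match one to one and Spearman's ρ is exactly 1, while Pearson's r is slightly below 1 because the increase is not perfectly linear.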

Use Pearson when:

  • Both variables are continuous;
  • You expect a linear association.

Use Spearman when:

  • You have ordinal data;
  • The relationship is monotonic but not linear;
  • Your data contain outliers or violate Pearson's assumptions.
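The outlier case can be illustrated with a small hypothetical dataset (the numbers are invented for illustration): `y_outlier` follows `x` exactly except that its last value is an extreme 100. The ordering of the points is unchanged, so Spearman stays at 1, while that single extreme value pulls Pearson's r well below 1. The sketch uses plain Python; `pearson`, `simple_ranks` and `spearman` are illustrative helpers, and the ranking assumes there are no ties.

```python
import math


def pearson(x, y):
    """Pearson's r computed from mean-centred values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def simple_ranks(values):
    """Rank values 1..n; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's r on ranks (valid here because there are no ties)."""
    return pearson(simple_ranks(x), simple_ranks(y))


# Hypothetical data: a perfect linear trend, then the same trend with one extreme last value
x = list(range(1, 11))
y_clean = list(range(1, 11))
y_outlier = list(range(1, 10)) + [100]

print(f"Clean:   Pearson = {pearson(x, y_clean):.3f}, Spearman = {spearman(x, y_clean):.3f}")
print(f"Outlier: Pearson = {pearson(x, y_outlier):.3f}, Spearman = {spearman(x, y_outlier):.3f}")
```

With these numbers Pearson drops from 1.0 to roughly 0.59, while Spearman remains 1.0. Spearman is not immune to outliers, though: a point that breaks the ordering would lower it as well.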
```python
import pandas as pd

# Example retail dataset
data = {
    "sales": [200, 220, 250, 270, 300, 320, 350, 370, 400, 420],
    "foot_traffic": [50, 55, 60, 65, 68, 70, 75, 80, 85, 90],
    "discount": [5, 10, 0, 0, 15, 20, 25, 10, 5, 0]
}
df = pd.DataFrame(data)

# Compute Pearson correlation
pearson_corr = df.corr(method="pearson")
print("Pearson correlation matrix:")
print(pearson_corr)

# Compute Spearman correlation
spearman_corr = df.corr(method="spearman")
print("\nSpearman correlation matrix:")
print(spearman_corr)
```
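Building on the example dataset, a single coefficient can be read out of a correlation matrix with `.loc`, or a single Pearson pair can be computed directly with `Series.corr` (Pearson is its default method). The sketch below also checks that the Spearman matrix matches Pearson's r computed on `df.rank()`, which gives tied values their average rank by default; this is a quick way to confirm that Spearman works on ranks. The variable names are illustrative.

```python
import pandas as pd

# Same example retail dataset as in the lesson
data = {
    "sales": [200, 220, 250, 270, 300, 320, 350, 370, 400, 420],
    "foot_traffic": [50, 55, 60, 65, 68, 70, 75, 80, 85, 90],
    "discount": [5, 10, 0, 0, 15, 20, 25, 10, 5, 0],
}
df = pd.DataFrame(data)

# Read one coefficient out of the full Spearman matrix with .loc
spearman_corr = df.corr(method="spearman")
rho_sales_discount = spearman_corr.loc["sales", "discount"]

# Compute a single Pearson pair directly, without building the whole matrix
r_sales_traffic = df["sales"].corr(df["foot_traffic"])

# Spearman equals Pearson applied to ranks (df.rank() averages tied ranks)
rank_pearson = df.rank().corr(method="pearson")
max_diff = (spearman_corr - rank_pearson).abs().to_numpy().max()

print(f"Pearson r(sales, foot_traffic): {r_sales_traffic:.3f}")
print(f"Spearman rho(sales, discount): {rho_sales_discount:.3f}")
print(f"Largest gap between the two Spearman computations: {max_diff:.1e}")
```

On this small sample, the rank correlation between sales and discount is close to zero (about 0.07), so the discounts here show no consistent monotonic link with sales.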

When interpreting correlation results, pay attention to both the strength and the nature of the relationship:

  • A Pearson correlation value close to 1 or -1 signals a strong linear relationship;
  • Values near 0 suggest little or no linear association.
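Two quick checks make these interpretation rules concrete, using made-up numbers and an illustrative `pearson` helper in plain Python: a perfectly decreasing straight line gives r = -1, while a symmetric U-shaped pattern gives r = 0 even though y is completely determined by x. A value near 0 therefore rules out a linear association, not every kind of relationship.

```python
import math


def pearson(x, y):
    """Pearson's r computed from mean-centred values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical values chosen to make the arithmetic exact
x = [-2, -1, 0, 1, 2]
y_line_down = [-2 * v + 5 for v in x]  # exact decreasing line: 9, 7, 5, 3, 1
y_u_shape = [v * v for v in x]         # exact U shape: 4, 1, 0, 1, 4

print(f"Decreasing line: r = {pearson(x, y_line_down):.2f}")  # strong negative linear relationship
print(f"U shape:         r = {pearson(x, y_u_shape):.2f}")    # no linear association at all
```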

For example:

  • If sales and foot_traffic show a Pearson correlation of 0.98, this means that as foot traffic increases, sales tend to increase in a nearly linear way.

However:

  • If your data contain outliers or the relationship is not linear, the Pearson coefficient may be misleading: a few extreme values can distort it, and it can understate a strong but curved relationship.

In these cases, use the Spearman correlation, which captures monotonic trends:

  • Monotonic means one variable consistently increases (or consistently decreases) as the other increases, regardless of the exact shape of the relationship;
  • For instance, a Spearman correlation of -0.70 between discount and sales would indicate that observations with higher discounts tend to rank lower in sales, even if the pattern is not strictly linear.
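A negative monotonic trend like the discount example can be sketched with invented numbers, assuming a hypothetical relationship in which sales fall along a curve as the discount grows (`sales = 1000 / (discount + 10)`, chosen only for illustration). Because every increase in discount comes with a decrease in sales, Spearman is exactly -1, while Pearson is noticeably weaker because the curve is not a straight line. `pearson` and `spearman` are illustrative helpers, and the simple ranking assumes no ties.

```python
import math


def pearson(x, y):
    """Pearson's r computed from mean-centred values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def spearman(x, y):
    """Pearson's r on ranks; the simple ranking below assumes there are no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        out = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            out[idx] = rank
        return out
    return pearson(ranks(x), ranks(y))


# Hypothetical data: sales decrease along a curve as the discount grows
discount = [0, 5, 10, 15, 20, 25]
sales = [1000 / (d + 10) for d in discount]

print(f"Pearson  r   = {pearson(discount, sales):.3f}")   # negative, but not -1
print(f"Spearman rho = {spearman(discount, sales):.3f}")  # exactly -1: perfectly monotonic
```

Since Spearman depends only on the ordering, replacing the curve with any other strictly decreasing relationship would leave it at -1.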

Always consider both the context and the type of relationship in your data when deciding which correlation metric to use.


Which scenario best calls for using the Spearman correlation coefficient rather than Pearson in retail analytics?



Section 3, Chapter 4
