Learn Correlation Analysis | Basic Statistical Analysis
Data Analysis with R

Correlation Analysis

Correlation analysis is a statistical technique used to measure the strength and direction of a relationship between two numeric variables. It helps us understand how changes in one variable are associated with changes in another.

What Is Correlation?

A correlation coefficient (usually denoted r) ranges from -1 to 1. Its value is interpreted as follows:

  • 1: perfect positive correlation;
  • 0: no linear correlation;
  • -1: perfect negative correlation.
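
The three reference values can be reproduced with small vectors. This is a minimal sketch: the vectors x, y_pos, y_neg, x_sym, and y_sq are invented for illustration and are not part of the lesson's dataset.

```r
# Toy vectors, invented purely for illustration
x     <- c(1, 2, 3, 4, 5)
y_pos <- 2 * x + 1    # rises exactly linearly with x
y_neg <- 10 - 3 * x   # falls exactly linearly with x

r_pos <- cor(x, y_pos)   # perfect positive correlation
r_neg <- cor(x, y_neg)   # perfect negative correlation

# A symmetric, non-linear relationship: y depends on x, yet r is 0
x_sym  <- c(-2, -1, 0, 1, 2)
y_sq   <- x_sym^2
r_zero <- cor(x_sym, y_sq)

print(c(r_pos, r_neg, r_zero))
```

The last case shows why a coefficient of 0 is best read as "no linear relationship" rather than "no relationship at all".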

There are several correlation methods, but Pearson correlation is the most commonly used for continuous numeric data, and it is the default method of cor() in R.
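
As a rough illustration of how the method can be chosen, cor() accepts a method argument ("pearson", "kendall", or "spearman"). The vectors below are made up for the example.

```r
# Toy data, invented for illustration: y grows with x, but not linearly
x <- c(1, 2, 3, 4, 5)
y <- x^3

pearson_r  <- cor(x, y)                       # default method is "pearson"
spearman_r <- cor(x, y, method = "spearman")  # rank-based alternative

print(pearson_r)
print(spearman_r)
```

Because Spearman correlation works on ranks, it reaches 1 for any strictly increasing relationship, while Pearson stays below 1 unless the relationship is exactly linear.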

Correlation Between Two Variables

Use the cor() function to compute the correlation coefficient between two variables by passing the two columns as arguments.

cor(df$selling_price, df$km_driven)

As a result, the function returns a value between -1 and 1.
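
One caveat worth sketching: with its default settings, cor() returns NA when either column contains a missing value. The data frame below is a hypothetical stand-in for df with made-up values. The use = "complete.obs" argument drops incomplete pairs before the calculation.

```r
# Hypothetical stand-in for df; the values are made up
df <- data.frame(
  selling_price = c(450000, 370000, 158000, 225000, 130000),
  km_driven     = c(145500, 120000, NA, 127000, 175000)
)

# Default behaviour: a single missing value makes the result NA
r_default <- cor(df$selling_price, df$km_driven)

# Keep only the rows where both values are present
r_complete <- cor(df$selling_price, df$km_driven, use = "complete.obs")

print(r_default)
print(r_complete)
```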

Correlation Matrix (Multiple Variables)

The same function can be used to examine relationships between multiple variables: given a data frame of numeric columns, cor() computes the correlation for every pair of columns.

# Select only numeric columns
numeric_df <- df[, c("selling_price", "km_driven", "max_power", "mileage", "engine", "seats")]
# Compute correlation matrix
cor_matrix <- cor(numeric_df, use = "complete.obs")  # Ignores any rows with missing data

The result is stored as a matrix that shows pairwise correlation values between all selected numeric variables.
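
A minimal sketch of how the resulting matrix might be inspected, assuming a small hypothetical data frame with made-up values in place of the real df. Selecting numeric columns with sapply() is one possible alternative to listing them by name.

```r
# Hypothetical stand-in for df; the values are made up
df <- data.frame(
  selling_price = c(450000, 370000, 158000, 225000, 130000),
  km_driven     = c(145500, 120000, 140000, 127000, 175000),
  max_power     = c(74, 103.5, 78, 90, 88.2),
  fuel          = c("Diesel", "Diesel", "Petrol", "Diesel", "Petrol")
)

# Keep only the numeric columns (drops fuel)
numeric_df <- df[, sapply(df, is.numeric)]

cor_matrix <- cor(numeric_df, use = "complete.obs")

# Round for readability when printing
print(round(cor_matrix, 2))

# Look up a single pair by row and column name
price_vs_km <- cor_matrix["selling_price", "km_driven"]
print(price_vs_km)
```

The matrix is symmetric, and its diagonal is 1 because every variable is perfectly correlated with itself.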


Section 3. Chapter 5