Correlation Analysis
Correlation analysis is a statistical technique used to measure the strength and direction of a relationship between two numeric variables. It helps us understand how changes in one variable are associated with changes in another.
What Is Correlation?
A correlation coefficient (usually denoted r) ranges from -1 to 1, where:
- r = 1: perfect positive correlation;
- r = 0: no linear correlation;
- r = -1: perfect negative correlation.
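To make the two extremes concrete, here is a minimal sketch using made-up vectors (x, y_up, and y_down are illustrative names, not part of the lesson's dataset). When one variable is an exact linear function of the other, the coefficient should reach the end of the range.

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y_up <- 2 * x + 3     # rises exactly in step with x
y_down <- -2 * x + 3  # falls exactly in step with x

r_up <- cor(x, y_up)      # perfect positive correlation
r_down <- cor(x, y_down)  # perfect negative correlation

print(r_up)
print(r_down)
```

Real data rarely reaches either extreme; values closer to 0 indicate a weaker linear relationship.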
There are several correlation methods, but Pearson correlation is the most commonly used for continuous numeric data, and it is the default method of the cor() function in R.
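As a rough illustration of how the methods differ, the sketch below compares Pearson with Spearman (a rank-based alternative) through the method argument of cor(). The data is made up: y grows with x, but along a curve rather than a straight line.

```r
# Hypothetical toy data: monotonic but non-linear relationship
x <- 1:10
y <- x^3

# Pearson measures linear association (this is the default method)
r_pearson <- cor(x, y, method = "pearson")

# Spearman measures monotonic association using ranks
r_spearman <- cor(x, y, method = "spearman")

print(r_pearson)
print(r_spearman)
```

Because y always increases when x does, the rank-based coefficient is at its maximum, while Pearson comes out lower since the points do not fall on a straight line.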
Correlation Between Two Variables
You can use the cor() function to compute the correlation coefficient between two variables. All you need to do is pass the two columns as arguments.
cor(df$selling_price, df$km_driven)
The function returns a single value between -1 and 1. If either column contains missing values and no use argument is given, it returns NA instead.
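One practical detail is missing data. The sketch below assumes a small stand-in data frame with invented values (the real df from the lesson is not reproduced here) and shows how the use argument changes the result when a column contains NA.

```r
# Hypothetical stand-in for df; the values are invented for illustration
df <- data.frame(
  selling_price = c(450000, 370000, 158000, 225000, 130000),
  km_driven = c(145500, 120000, 140000, NA, 175000)
)

# With the default settings, a single NA makes the result NA
r_default <- cor(df$selling_price, df$km_driven)

# Using only the rows where both values are present
r_complete <- cor(df$selling_price, df$km_driven, use = "complete.obs")

print(r_default)
print(r_complete)
```

Dropping incomplete rows is a simple choice, but it is worth checking how many rows are lost before relying on the result.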
Correlation Matrix (Multiple Variables)
The same function can be used to examine relationships between multiple variables.
# Select the numeric columns to compare
numeric_df <- df[, c("selling_price", "km_driven", "max_power", "mileage", "engine", "seats")]
# Compute the correlation matrix
cor_matrix <- cor(numeric_df, use = "complete.obs") # Drops any row with a missing value in the selected columns
The result is stored as a matrix that shows pairwise correlation values between all selected numeric variables.
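To show how such a matrix can be read, here is a minimal sketch on a small invented data frame (three of the columns named above, with made-up values). It rounds the matrix for display and pulls out a single pair by row and column name.

```r
# Hypothetical stand-in data; values are invented for illustration
numeric_df <- data.frame(
  selling_price = c(450000, 370000, 158000, 225000, 130000, 440000),
  km_driven = c(145500, 120000, 140000, 127000, 120000, 45000),
  max_power = c(74, 103.5, 78, NA, 88.2, 81.9)
)

cor_matrix <- cor(numeric_df, use = "complete.obs")

# Round for easier reading
print(round(cor_matrix, 2))

# Look up one pair by row and column name
price_vs_km <- cor_matrix["selling_price", "km_driven"]
print(price_vs_km)
```

The diagonal is always 1 because each variable is perfectly correlated with itself, and the matrix is symmetric, so the value at row A, column B equals the value at row B, column A.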