Explore the Linear Regression Using Python

Correlation Matrix

Let’s go back to our dataset. To explore the relationships between all the columns at once, we can use a correlation matrix. It holds the pairwise correlation coefficient for every pair of columns. Because the correlation of A with B equals the correlation of B with A, the matrix is symmetric. To build it and show the correlation coefficients between all variables, call the DataFrame method dataframe.corr().

Use this code to see the matrix for our wine dataset:

matrix = data.corr().round(2)
print(matrix)
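The symmetry mentioned above can be checked directly. The following is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are illustrative assumptions, not part of the wine dataset); it relies on `DataFrame.corr()` computing Pearson correlation by default.

```python
import numpy as np
import pandas as pd

# Small made-up dataset (hypothetical values, not the wine data)
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Pairwise Pearson correlation between all columns
toy_matrix = toy.corr()

# corr(a, b) equals corr(b, a), so the matrix equals its transpose
is_symmetric = np.allclose(toy_matrix.values, toy_matrix.values.T)

# Every column is perfectly correlated with itself, so the diagonal is 1
diagonal_is_one = np.allclose(np.diag(toy_matrix.values), 1.0)

print(toy_matrix.round(2))
print("symmetric:", is_symmetric)
print("diagonal all ones:", diagonal_is_one)
```

Because of this symmetry, only the values on one side of the diagonal carry information; the other side repeats them.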

To visualize this matrix, import the seaborn library and call the sns.heatmap function:

import seaborn as sns
sns.heatmap(matrix, annot=True)
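Since the matrix is symmetric, half of the heatmap is redundant. As one possible refinement (a sketch, not the course's required approach), the upper triangle can be hidden with the `mask` parameter of `sns.heatmap`, and the color scale can be fixed to the range -1 to 1 so that colors are comparable between plots. The figure size, colormap, and title below are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_wine

# Rebuild the wine correlation matrix so the sketch is self-contained
wine = load_wine()
data = pd.DataFrame(data=wine["data"], columns=wine["feature_names"])
matrix = data.corr().round(2)

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(matrix.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(
    matrix,
    mask=mask,        # hide the redundant upper triangle
    annot=True,       # write the coefficient in each visible cell
    vmin=-1, vmax=1,  # correlation coefficients always lie in [-1, 1]
    cmap="coolwarm",  # diverging colormap: one color per sign
    center=0,         # put the neutral color at zero correlation
    ax=ax,
)
ax.set_title("Wine feature correlations (lower triangle)")
```

Call plt.show() afterwards to display the figure in a script; in a notebook it is usually rendered automatically.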


We can see that alcohol is most positively correlated with proline (0.64), which means that the alcohol content tends to increase as the amount of proline increases. Hue is negatively correlated with color intensity (-0.52), which means that the greater the color intensity of the wine, the lower the hue tends to be.
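Reading such values off a heatmap works, but they can also be looked up in the matrix itself. The sketch below uses a hypothetical helper, `strongest_correlations` (a name introduced here for illustration, not a pandas function), which drops the column's correlation with itself and then takes the largest and smallest remaining coefficients.

```python
import pandas as pd
from sklearn.datasets import load_wine

# Rebuild the wine correlation matrix so the sketch is self-contained
wine = load_wine()
data = pd.DataFrame(data=wine["data"], columns=wine["feature_names"])
matrix = data.corr().round(2)


def strongest_correlations(matrix, column):
    """Return (name, value) pairs for the most positive and most negative partner of `column`."""
    # Drop the column's correlation with itself, which is always 1
    others = matrix[column].drop(column)
    most_positive = others.idxmax()
    most_negative = others.idxmin()
    return (most_positive, others[most_positive]), (most_negative, others[most_negative])


(pos_name, pos_value), (neg_name, neg_value) = strongest_correlations(matrix, "alcohol")
print("most positive partner of alcohol:", pos_name, pos_value)
print("most negative partner of alcohol:", neg_name, neg_value)
```

The same call with a different column name, for example "hue", gives the strongest partners of that column.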

Task


Later, we will try to predict characteristics of wine from the amount of flavanoids in it. Flavanoids are plant pigments, and their most prominent role is to give crops their bright colors.

  1. [Lines #3-4] Import the pandas and seaborn libraries.
  2. [Line #17] Define the correlation matrix, rounding its values to two decimal places.
  3. [Lines #20-24] Find the columns with which flavanoids have the highest positive correlation and the most negative correlation. From the heatmap, these are total_phenols (0.86) and nonflavanoid_phenols (-0.54) respectively. Assign the column names to positive_correlation and negative_correlation, and the values to positive_cor_value and negative_cor_value (positive_cor_value = 0.86 and negative_cor_value = -0.54).

Solution

# Import the libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_wine

# Load the dataset
wine = load_wine()

# Configure pandas to show all features
pd.set_option('display.max_rows', None, 'display.max_columns', None)

# Convert the data to a dataframe to view properly
data = pd.DataFrame(data = wine['data'], columns = wine['feature_names'])

# Defining the matrix
matrix = data.corr().round(2)

# Define results
positive_correlation = 'total_phenols'
positive_cor_value = 0.86

negative_correlation = 'nonflavanoid_phenols'
negative_cor_value = -0.54

# Print results
print('the greatest positive correlation coefficient with ', positive_correlation)
print('the value of correlation is = ', positive_cor_value)

print('the greatest negative correlation coefficient with ', negative_correlation)
print('the value of correlation is = ', negative_cor_value)

# Set the figure size and visualize the data
sns.set(rc = {'figure.figsize':(20,10)})
sns.heatmap(matrix, annot=True)
plt.show()


Section 2. Chapter 2
# Import the libraries
import matplotlib.pyplot as plt
import pandas as ___
import ___ as sns
from sklearn.datasets import load_wine

# Load the dataset
wine = load_wine()

# Configure pandas to show all features
pd.set_option('display.max_rows', None, 'display.max_columns', None)

# Convert the data to a dataframe to view properly
data = pd.DataFrame(data = wine['data'], columns = wine['feature_names'])

# Defining the matrix
matrix = ___

# Define results
positive_correlation = ___
positive_cor_value = ___

negative_correlation = ___
negative_cor_value = ___

# Print results
print('the greatest positive correlation coefficient with ', positive_correlation)
print('the value of correlation is = ', positive_cor_value)

print('the greatest negative correlation coefficient with ', negative_correlation)
print('the value of correlation is = ', negative_cor_value)

# Set the figure size and visualize the data
sns.set(rc = {'figure.figsize':(20,10)})
sns.heatmap(matrix, annot=True)
plt.show()
