Calculating the Pearson Coefficient Using NumPy and Pandas
Let's look at how to calculate the correlation coefficient when the data is stored in NumPy arrays (np.array). NumPy has many statistics routines that simplify the calculations. We will use the function np.corrcoef(). It takes two arrays of the same length:
```python
# Import the libraries
import numpy as np

# Define np.arrays
x = np.array([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = np.array([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Find correlation
r = np.corrcoef(x, y)
```
This function returns the correlation matrix, a 2×2 array of correlation coefficients in which the first row and column correspond to x and the second to y. The upper-right value is the correlation coefficient between x and y, and the lower-left value is the coefficient between y and x; the two are always equal, and this is the value we need. The diagonal values are the correlations of x with x and of y with y, which are always equal to one.
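To make the layout of the matrix concrete, the sketch below prints it for the same x and y arrays and checks the properties described above. The name `r` follows the earlier snippet; the checks with np.allclose and np.isclose are additions for illustration only.

```python
import numpy as np

x = np.array([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = np.array([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Full correlation matrix: row/column 0 is x, row/column 1 is y
r = np.corrcoef(x, y)
print(r.shape)  # (2, 2)
print(r)

# Diagonal entries are corr(x, x) and corr(y, y)
diagonal_is_one = np.allclose(np.diag(r), 1.0)
print(diagonal_is_one)

# Off-diagonal entries are corr(x, y) and corr(y, x), which coincide
is_symmetric = np.isclose(r[0, 1], r[1, 0])
print(is_symmetric)
```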
If you want just the Pearson coefficient between x and y, take a single off-diagonal element of the matrix:
print(np.corrcoef(x, y)[0,1])
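As a sanity check, the same number can be computed directly from the definition of the Pearson coefficient: the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The sketch below is only an illustration; the names `dx`, `dy`, `r_manual` and `r_numpy` are introduced here and are not part of the lesson.

```python
import numpy as np

x = np.array([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = np.array([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Deviations of each observation from its sample mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Value taken from the correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual)
print(r_numpy)
print(np.isclose(r_manual, r_numpy))
```

Both approaches should agree up to floating-point rounding, which is why the comparison uses np.isclose rather than ==.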
Pandas also has a method for calculating the correlation coefficient between two Series objects of the same length. You can use the .corr() method:
```python
# Import the libraries
import pandas as pd

# Define series
x = pd.Series([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = pd.Series([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Print correlation coefficients
print(x.corr(y))
print(y.corr(x))
```
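The short sketch below compares the two libraries on the same data. It relies on Series.corr() using the Pearson method by default (the `method` argument is spelled out once to make that explicit); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 5, 7, 8, 10, 11, 13, 15])
y = pd.Series([2, 4, 7, 8, 10, 15, 20, 21, 23, 30])

# Default call and the same call with the method named explicitly
r_pandas = x.corr(y)
r_explicit = x.corr(y, method="pearson")

# NumPy result on the underlying arrays
r_numpy = np.corrcoef(x.to_numpy(), y.to_numpy())[0, 1]

print(r_pandas, r_explicit, r_numpy)
print(np.isclose(r_pandas, r_numpy))
```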
You have an initial dataset of Abyssinian cats' weight and height (the x and y arrays, respectively). Find the correlation coefficient between x and y using all the functions we discussed in this chapter.
- [Lines #2-3] Import the pandas and numpy libraries.
- [Lines #10-11] Convert the arrays to np.array and find the correlation coefficient between x and y.
- [Line #12] Print the correlation coefficient found this way.
- [Lines #15-16] Convert the arrays to pandas Series and find the correlation coefficient between x and y.
- [Line #17] Print the correlation coefficient found this way.
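One possible shape for these steps is sketched below. The weight and height values are made-up placeholders, not the exercise's dataset, and the names `x_arr`, `y_arr`, `x_ser`, `y_ser`, `r_np` and `r_pd` are hypothetical; the layout does not follow the exercise's line numbering.

```python
import numpy as np
import pandas as pd

# Placeholder measurements for illustration only (NOT the real dataset)
x = [3.2, 3.6, 4.1, 4.4, 4.9, 5.3]        # weight
y = [20.5, 21.0, 22.4, 23.1, 24.0, 25.2]  # height

# NumPy: convert the lists to arrays and read r from the matrix
x_arr = np.array(x)
y_arr = np.array(y)
r_np = np.corrcoef(x_arr, y_arr)[0, 1]
print(r_np)

# Pandas: convert the lists to Series and call .corr()
x_ser = pd.Series(x)
y_ser = pd.Series(y)
r_pd = x_ser.corr(y_ser)
print(r_pd)
```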