Find the Correlation
Finally, let's move on to the last method of this section, .corr(). It is very useful for finding relationships between numerical columns. Imagine that you have a dataset on houses:
Price USD | Number of Rooms | Distance from the City Center in km |
---|---|---|
329000 | 4 | 25 |
8739000 | 6 | 3 |
1268000 | 6 | 2 |
987000 | 4 | 10 |
103000 | 2 | 30 |
Let's examine the output of data.corr() in our case:
| | Price USD | Number of Rooms | Distance from the City Center in km |
---|---|---|---|
Price USD | 1.000000 | 0.625651 | -0.589396 |
Number of Rooms | 0.625651 | 1.000000 | -0.908600 |
Distance from the City Center in km | -0.589396 | -0.908600 | 1.000000 |
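The table above can be reproduced with a short sketch. It assumes the house dataset is stored in a DataFrame named data, as in the text; the variable name corr_matrix is only illustrative.

```python
import pandas as pd

# The house dataset from the table above, stored in `data` as in the text.
data = pd.DataFrame({
    "Price USD": [329000, 8739000, 1268000, 987000, 103000],
    "Number of Rooms": [4, 6, 6, 4, 2],
    "Distance from the City Center in km": [25, 3, 2, 10, 30],
})

# .corr() computes the pairwise correlation between the numeric columns
# (Pearson correlation by default) and returns a square DataFrame.
corr_matrix = data.corr()
print(corr_matrix)

# A single pair can be read with .loc[row_label, column_label].
print(corr_matrix.loc["Price USD", "Number of Rooms"])
```

Because the matrix is symmetric, looking up the same pair with the labels swapped returns the same number.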
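Let's go through it step by step. Every column appears both as a row and as a column, and the cell where a row and a column meet holds the correlation coefficient for that pair. Each coefficient lies between -1 and 1:
- 1 means a perfect positive linear relationship (if one value increases, the other increases too);
- -1 means a perfect negative linear relationship (if one value increases, the other decreases);
- 0 means there is no linear relationship between the two values.
The diagonal is always 1 because every column correlates perfectly with itself. In our case, the value -0.908600 between Number of Rooms and Distance from the City Center in km suggests that houses farther from the center tend to have fewer rooms.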
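The three boundary cases can be illustrated with a small sketch on made-up numbers. The Series below (x, doubled, negated, squared_dist) are hypothetical and not part of the house dataset; the last one shows that a coefficient of 0 only rules out a linear relationship, since squared_dist still depends entirely on x.

```python
import pandas as pd

# Hypothetical toy values, chosen only to illustrate the boundary cases.
x = pd.Series([1, 2, 3, 4, 5])
doubled = 2 * x              # grows whenever x grows
negated = -x                 # shrinks whenever x grows
squared_dist = (x - 3) ** 2  # depends on x, but not linearly

# Series.corr(other) returns the correlation between two Series.
print(x.corr(doubled))       # approximately 1
print(x.corr(negated))       # approximately -1
print(x.corr(squared_dist))  # approximately 0
```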
Task
You'll end this section with an easy task: apply the .corr() method to the dataset. Then, try to analyze the numbers you get.
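One possible way to start the analysis is to list each pair of columns once and rank the pairs by the strength of their correlation. This is only a sketch on the house dataset from this chapter, not the expected solution; the names mask, pairs, and ranked are illustrative, and the actual task dataset may differ.

```python
import numpy as np
import pandas as pd

# The house dataset from this chapter (the task dataset may differ).
data = pd.DataFrame({
    "Price USD": [329000, 8739000, 1268000, 987000, 103000],
    "Number of Rooms": [4, 6, 6, 4, 2],
    "Distance from the City Center in km": [25, 3, 2, 10, 30],
})

corr_matrix = data.corr()

# Keep only the cells above the diagonal so each pair appears once
# and the trivial self-correlations are excluded.
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
pairs = corr_matrix.where(mask).stack().dropna()

# Rank by absolute value: a strong negative correlation matters
# as much as a strong positive one.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Ranking by absolute value puts the Number of Rooms and Distance from the City Center in km pair first, since its coefficient is the farthest from 0.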