Learning Statistics with Python
Calculate Variance with Python
Calculating Variance with NumPy
In NumPy, pass the sequence of values (in our case, a column of the dataset) to the np.var() function, like this: np.var(df['work_year']).
Calculating Variance with pandas
In pandas, call the .var() method on the sequence of values (in our case, a column of the dataset), like this: df['work_year'].var().
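In both cases, the results are almost the same. The difference comes from the denominator each library uses by default: np.var() divides by N (population variance, ddof=0), while the pandas .var() method divides by N-1 (sample variance, ddof=1). Check it now!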
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/ds_salaries_statistics', index_col = 0)

# Calculate the variance using the function from the NumPy library
var_1 = np.var(df['salary_in_usd'])
# Calculate the variance using the method from the pandas library
var_2 = df['salary_in_usd'].var()

print('The variance using NumPy library is', var_1)
print('The variance using pandas library is', var_2)
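To see where the gap between the two results comes from, here is a minimal sketch that computes both denominators by hand and then aligns the libraries with the ddof parameter. The values in it are made up for illustration and are not taken from the course dataset.

```python
import numpy as np
import pandas as pd

# Small made-up sample, used only for illustration (not the course dataset)
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
arr = values.to_numpy()

n = len(values)
# Sum of squared deviations from the mean
squared_deviations = ((values - values.mean()) ** 2).sum()

population_var = squared_deviations / n        # denominator N
sample_var = squared_deviations / (n - 1)      # denominator N - 1

# Library defaults: NumPy uses ddof=0, pandas uses ddof=1
print('NumPy default: ', np.var(arr), 'by hand:', population_var)
print('pandas default:', values.var(), 'by hand:', sample_var)

# Setting ddof explicitly makes the two libraries agree
print('Both with N-1:', np.var(arr, ddof=1), values.var())
print('Both with N:  ', np.var(arr), values.var(ddof=0))
```

On a large column the two results are close because N and N-1 barely differ. On a small sample like this one the gap is easy to see.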