Learning Statistics with Python

# Calculate Variance with Python

**Calculating Variance with NumPy**

In **NumPy**, pass the sequence of values (in our case, a column of the dataset) to the `np.var()` function, like this: `np.var(df['work_year'])`.

**Calculating Variance with pandas**

In **pandas**, call the `.var()` method on the sequence of values (in our case, a column of the dataset), like this: `df['work_year'].var()`.

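In both cases, the results are almost the same. The small difference comes from the default denominators: `np.var()` divides by N (population variance), while the pandas `.var()` method divides by N-1 (sample variance). Run the code below to check it!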
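In formula terms, for values $x_1, \dots, x_N$ with mean $\bar{x}$, the two defaults correspond to the population and sample variance. The symbols are standard notation introduced here for illustration, not taken from the lesson itself:

$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 \quad \text{(NumPy default)}, \qquad s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2 \quad \text{(pandas default)}
$$

As N grows, the two values get closer, which is why the results on a large dataset are almost the same.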

```python
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/ds_salaries_statistics', index_col=0)

# Calculate the variance using the function from the NumPy library
var_1 = np.var(df['salary_in_usd'])
# Calculate the variance using the method from the pandas library
var_2 = df['salary_in_usd'].var()

print('The variance using NumPy library is', var_1)
print('The variance using pandas library is', var_2)
```
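To see the effect of the denominator on numbers you can verify by hand, here is a minimal sketch on a small made-up series (the values are illustrative and are not from the course dataset). It also shows that both libraries accept a `ddof` argument, so either one can be switched to the other convention.

```python
import numpy as np
import pandas as pd

# Small made-up sample (illustrative values, not from the course dataset)
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# NumPy default is ddof=0: the sum of squared deviations is divided by N
var_population = np.var(values)

# pandas default is ddof=1: the sum of squared deviations is divided by N - 1
var_sample = values.var()

# Passing ddof explicitly makes each library use the other convention
var_numpy_sample = np.var(values, ddof=1)
var_pandas_population = values.var(ddof=0)

print('NumPy default (divide by N):', var_population)
print('pandas default (divide by N - 1):', var_sample)
print('NumPy with ddof=1:', var_numpy_sample)
print('pandas with ddof=0:', var_pandas_population)
```

Here the mean is 5 and the squared deviations sum to 32, so dividing by N = 8 gives 4, while dividing by N - 1 = 7 gives roughly 4.57. With only eight values the gap is easy to see; on a dataset with many rows it becomes very small.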
