Calculate Variance with Python | Variance and Standard Deviation

Calculating Variance with NumPy

In NumPy, pass the sequence of values (such as a column from the dataset) to the np.var() function, for example: np.var(df['work_year']).

Calculating Variance with pandas

In pandas, apply the .var() method directly to the column, like this: df['work_year'].var().

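The two methods produce similar but not identical results because they use different denominators by default: np.var() divides by N (population variance, ddof=0), while the pandas .var() method divides by N-1 (sample variance, ddof=1). The gap shrinks as the number of observations grows.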
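The two denominators can be written out as formulas. The notation here is an assumption, not something the lesson defines: x_i are the observations, x-bar is their mean, and N is the number of observations.

```latex
% Population variance (NumPy default, ddof = 0)
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2

% Sample variance (pandas default, ddof = 1)
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2
```

Both formulas share the same sum of squared deviations, so the sample variance is always the population variance multiplied by N / (N-1).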


import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/ds_salaries_statistics', index_col = 0)

# Calculate the variance using the function from the NumPy library
var_1 = np.var(df['salary_in_usd'])
# Calculate the variance using the function from the pandas library
var_2 = df['salary_in_usd'].var()

print('The variance using NumPy library is', var_1)
print('The variance using pandas library is', var_2)
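If the two numbers need to agree, both libraries accept a ddof argument that sets the denominator to N - ddof. The sketch below uses a small made-up series rather than the salaries dataset, so the values and variable names are illustrative assumptions only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the salaries dataset)
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Library defaults: NumPy divides by N, pandas divides by N-1
pop_var_np = np.var(values.to_numpy())     # ddof=0 by default
sample_var_pd = values.var()               # ddof=1 by default

# Overriding ddof makes each library reproduce the other's result
sample_var_np = np.var(values.to_numpy(), ddof=1)
pop_var_pd = values.var(ddof=0)

print('Population variance (NumPy default):', pop_var_np)
print('Sample variance (pandas default):', sample_var_pd)
print('Sample variance (NumPy, ddof=1):', sample_var_np)
print('Population variance (pandas, ddof=0):', pop_var_pd)
```

Which denominator is appropriate depends on whether the data is treated as the whole population or as a sample drawn from it. Passing ddof explicitly avoids relying on each library's default.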

