Calculate Variance with Python
Calculating Variance with NumPy
In numpy, pass the sequence of values (such as a column from the dataset) into the np.var() function, for example: np.var(df['work_year']).
Calculating Variance with pandas
In pandas, apply the .var() method directly to the column, like this: df['work_year'].var().
Both methods produce similar results, with slight differences due to different default denominators: np.var() divides by N (population variance, ddof=0), while the pandas .var() method divides by N-1 (sample variance, ddof=1).
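Written out, the two denominators correspond to the standard population and sample variance formulas. The symbols here are introduced only for illustration: x_1, ..., x_N are the values in the column and \bar{x} is their mean.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2
\qquad\text{(population variance, NumPy default)}

s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2
\qquad\text{(sample variance, pandas default)}
```

Since the sums are identical, the two results differ only by the factor N/(N-1), which approaches 1 as the number of rows grows.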
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/ds_salaries_statistics', index_col=0)

# Calculate the variance using the function from the NumPy library
var_1 = np.var(df['salary_in_usd'])
# Calculate the variance using the method from the pandas library
var_2 = df['salary_in_usd'].var()

print('The variance using NumPy library is', var_1)
print('The variance using pandas library is', var_2)
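The difference in denominators can also be checked on a small series where the arithmetic is easy to follow by hand. The sketch below uses made-up values (not the salaries dataset) and the ddof parameter, which both np.var() and the pandas .var() method accept, to make each library reproduce the other's result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the mean is exactly 5
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Library defaults: NumPy divides by N, pandas divides by N-1
pop_var_np = np.var(values)        # sum of squared deviations 32, divided by 8
sample_var_pd = values.var()       # the same sum 32, divided by 7

# Overriding ddof makes each library match the other's default
sample_var_np = np.var(values, ddof=1)
pop_var_pd = values.var(ddof=0)

print('NumPy default (population):', pop_var_np)
print('pandas default (sample):', sample_var_pd)
print('NumPy with ddof=1:', sample_var_np)
print('pandas with ddof=0:', pop_var_pd)
```

On a dataset with many rows the gap between the two values is small, but setting ddof explicitly avoids any ambiguity about which variance is being reported.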