Learning Statistics with Python
Calculate Variance with Python
Calculating Variance with NumPy
In numpy, pass the sequence of values (such as a column from the dataset) into the np.var() function, for example: np.var(df['work_year']).
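To see what np.var() computes, it can help to compare it with a manual calculation on a few numbers. The sketch below uses a small made-up array (the values are an assumption for illustration, not taken from the course dataset) and checks that the built-in result matches the average squared deviation from the mean.

```python
import numpy as np

# Hypothetical values, chosen so the arithmetic is easy to follow by hand
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Manual calculation: sum of squared deviations from the mean, divided by N
mean = values.mean()
manual_var = ((values - mean) ** 2).sum() / len(values)

# NumPy's built-in function, which divides by N by default
numpy_var = np.var(values)

print('Manual variance:', manual_var)
print('np.var() result:', numpy_var)
```

Both printed values should agree, since np.var() with its default settings performs the same division by N.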
Calculating Variance with pandas
In pandas, apply the .var() method directly to the column, like this: df['work_year'].var().
Both methods produce similar results. The slight difference comes from the denominator each one uses by default: np.var() divides by N (population variance), while the pandas .var() method divides by N-1 (sample variance).
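Written out, the two definitions differ only in the denominator. The formulas below are the standard textbook forms; the symbols (x_i for the values, N for their count, mu and x-bar for the mean) are notation assumed here, not taken from the lesson.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2
```

For a large N the two values are nearly identical, which is why the results differ only slightly on a full dataset column.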
import pandas as pd
import numpy as np

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a849660e-ddfa-4033-80a6-94a1b7772e23/update/ds_salaries_statistics', index_col=0)

# Calculate the variance using the function from the NumPy library
var_1 = np.var(df['salary_in_usd'])
# Calculate the variance using the method from the pandas library
var_2 = df['salary_in_usd'].var()

print('The variance using NumPy library is', var_1)
print('The variance using pandas library is', var_2)
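If the two libraries need to return the same number, the denominator can be set explicitly with the ddof ("delta degrees of freedom") argument, which both np.var() and the pandas .var() method accept. The following is a minimal sketch on a small made-up column (df_demo and its values are hypothetical, used instead of the course dataset so the example runs offline).

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for a dataset column
df_demo = pd.DataFrame({'value': [2, 4, 4, 4, 5, 5, 7, 9]})
values = df_demo['value'].to_numpy()
n = len(values)

# Default behaviour: NumPy divides by N, pandas divides by N-1
pop_np = np.var(values)
sample_pd = df_demo['value'].var()

# Setting ddof explicitly makes each library use the other's denominator
sample_np = np.var(values, ddof=1)       # divide by N-1
pop_pd = df_demo['value'].var(ddof=0)    # divide by N

print('Population variance (NumPy, pandas):', pop_np, pop_pd)
print('Sample variance (NumPy, pandas):', sample_np, sample_pd)

# The sample variance equals the population variance scaled by N / (N-1)
print(np.isclose(sample_pd, pop_np * n / (n - 1)))
```

Which denominator is appropriate depends on whether the data is treated as the whole population or as a sample drawn from it.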