Probability Theory Mastering

## Comparing Means of Two Different Datasets

Comparing the mathematical expectations of two independent numerical datasets is an important applied task. In the general case the problem is non-trivial, but under certain conditions it can be solved relatively simply. Consider the following conditions: we have two independent numerical datasets, X and Y, both with Gaussian distributions and equal variances (we may not know the true value of the variance, but we must be sure that the variances are equal). We want to test the following hypothesis:

Main hypothesis: the expectations of these datasets are equal.

Alternative hypothesis: the expectation of the X dataset is greater than that of the Y dataset.

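If the conditions described above are met, the standard criterion for this hypothesis is the two-sample Student's t-test with a pooled variance estimate. For samples of sizes $n$ and $m$ with sample means $\bar{X}$, $\bar{Y}$ and unbiased sample variances $S_X^2$, $S_Y^2$, the statistic is

$$
t = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{n} + \frac{1}{m}}}, \qquad S_p^2 = \frac{(n - 1) S_X^2 + (m - 1) S_Y^2}{n + m - 2}.
$$

Under the main hypothesis, $t$ follows Student's distribution with $n + m - 2$ degrees of freedom, and for the one-sided alternative above the critical region is the right tail.

To try this out, we generate two independent datasets with different mean values and check the hypothesis. The value of the criterion falls into the right critical region, so we conclude that the mathematical expectation of the first dataset is greater than that of the second.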
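The experiment described above can be sketched roughly as follows. This is a minimal illustration, not the lesson's original code: the seed, sample sizes, means, standard deviation, and significance level are all assumed values. It computes the pooled-variance t statistic by hand, compares it with the right-tail critical value, and cross-checks the result against `stats.ttest_ind` with `equal_var=True`.

```python
import numpy as np
from scipy import stats

# Assumed illustration parameters (not taken from the original lesson)
rng = np.random.default_rng(42)
n, m = 100, 120
x = rng.normal(loc=1.0, scale=2.0, size=n)  # X: mean 1.0, std 2.0
y = rng.normal(loc=0.0, scale=2.0, size=m)  # Y: mean 0.0, same std 2.0

# Pooled (unbiased) variance estimate under the equal-variance assumption
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)

# Two-sample t statistic
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))

# Right-tail critical value for the one-sided alternative E[X] > E[Y]
alpha = 0.05  # assumed significance level
t_crit = stats.t.ppf(1 - alpha, df=n + m - 2)

# Reject the main hypothesis if the statistic lands in the right critical region
reject_h0 = bool(t_stat > t_crit)

# Cross-check with SciPy's built-in test (one-sided, equal variances)
res = stats.ttest_ind(x, y, equal_var=True, alternative="greater")

print(f"t statistic: {t_stat:.4f}, critical value: {t_crit:.4f}")
print(f"SciPy statistic: {res.statistic:.4f}, p-value: {res.pvalue:.6f}")
print("Reject main hypothesis:", reject_h0)
```

Comparing the statistic with the critical value and comparing the p-value with `alpha` are equivalent decisions, so either form can be used. The `alternative` argument requires a reasonably recent SciPy version.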

There is also a generalization of this criterion for the case where the variances of the datasets are different, known as Welch's t-test. In SciPy, passing `equal_var=False` as an argument to the `stats.ttest_ind` function performs the hypothesis test for datasets with different variances.
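A hedged sketch of the unequal-variance case might look like the following. The seed, sample sizes, means, and standard deviations are assumptions chosen for illustration. The manual computation uses the Welch statistic with the Welch-Satterthwaite approximation for the degrees of freedom and is compared against `stats.ttest_ind(..., equal_var=False)`.

```python
import numpy as np
from scipy import stats

# Assumed illustration parameters: different means AND different variances
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=80)
y = rng.normal(loc=0.0, scale=3.0, size=150)

# Welch's t-test via SciPy, one-sided alternative E[X] > E[Y]
welch = stats.ttest_ind(x, y, equal_var=False, alternative="greater")

# Manual computation: per-sample variance of the mean
vx = x.var(ddof=1) / x.size
vy = y.var(ddof=1) / y.size
t_manual = (x.mean() - y.mean()) / np.sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))

# One-sided p-value from the right tail of Student's distribution
p_manual = stats.t.sf(t_manual, df)

print(f"SciPy:  t = {welch.statistic:.4f}, p = {welch.pvalue:.6f}")
print(f"Manual: t = {t_manual:.4f}, p = {p_manual:.6f}, df = {df:.2f}")
```

Unlike the pooled test, the degrees of freedom here are generally not an integer, which is why they are estimated from the data rather than set to `n + m - 2`.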