Probability Theory Mastering

# Comparing Means of Two Different Datasets

An important applied task is to **compare the mathematical expectations** (means) of two independent numerical datasets.

In the general case, this task is non-trivial, but under certain conditions it can be solved relatively simply.
Consider the following conditions:

We have two independent numerical datasets drawn from Gaussian distributions with equal variances (we may not know the actual value of the variance, but we must be sure that the variances are equal). We want to test the following **hypotheses**:

**Main hypothesis**: the expectations of these datasets are equal.

**Alternative hypothesis**: the expectation of the X dataset is greater than that of the Y dataset.
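In symbols (the notation below is introduced here for convenience): let $n$ and $m$ be the sizes of the X and Y datasets, $\mu_X$ and $\mu_Y$ their expectations, and $\sigma^2$ the common, possibly unknown, variance. All observations are assumed independent.

$$
X_1, \dots, X_n \sim \mathcal{N}(\mu_X, \sigma^2), \qquad Y_1, \dots, Y_m \sim \mathcal{N}(\mu_Y, \sigma^2),
$$

$$
H_0:\ \mu_X = \mu_Y \qquad \text{vs.} \qquad H_1:\ \mu_X > \mu_Y.
$$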

## Statistical criterion

If the conditions described above are met, we can test this hypothesis with the two-sample Student's t-test, which uses a pooled estimate of the common variance.
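A sketch of the standard pooled-variance statistic for this setting (the same statistic `stats.ttest_ind` computes with its default `equal_var=True`). Here $\bar{X}$, $\bar{Y}$ are the sample means, $S_X^2$, $S_Y^2$ the unbiased sample variances, $n$, $m$ the sample sizes and $\alpha$ the significance level; this notation is introduced here and is not taken from the original lesson.

$$
T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad
S_p^2 = \frac{(n - 1) S_X^2 + (m - 1) S_Y^2}{n + m - 2}.
$$

Under the main hypothesis, $T$ follows Student's t-distribution with $n + m - 2$ degrees of freedom. For the alternative $\mu_X > \mu_Y$ the critical region is on the right:

$$
T > t_{1 - \alpha}(n + m - 2),
$$

where $t_{1 - \alpha}(k)$ is the $(1 - \alpha)$-quantile of the t-distribution with $k$ degrees of freedom.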

## Python implementation

Let's generate two independent datasets with different mean values and try to check the hypothesis:
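A minimal sketch of such a check, assuming NumPy and SciPy 1.6 or newer (needed for the `alternative` argument of `stats.ttest_ind`). The means, common standard deviation, sample sizes, seed and significance level `alpha` are illustrative choices, not values from the original lesson. The sketch also recomputes the pooled statistic by hand to show that it matches SciPy's result.

```python
import numpy as np
from scipy import stats

# Illustrative parameters (assumptions, not from the lesson): X has a larger mean,
# and both datasets share the same standard deviation, as the criterion requires
rng = np.random.default_rng(42)
x = rng.normal(loc=6.0, scale=2.0, size=400)  # dataset X
y = rng.normal(loc=5.0, scale=2.0, size=500)  # dataset Y

alpha = 0.05  # significance level

# Student's two-sample t-test with pooled variance;
# alternative='greater' corresponds to H1: E[X] > E[Y]
t_stat, p_value = stats.ttest_ind(x, y, equal_var=True, alternative='greater')

# The same statistic computed by hand from the pooled variance
n, m = len(x), len(y)
s2_pooled = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
t_manual = (x.mean() - y.mean()) / np.sqrt(s2_pooled * (1 / n + 1 / m))

# Boundary of the right critical region: (1 - alpha)-quantile of t(n + m - 2)
df = n + m - 2
t_crit = stats.t.ppf(1 - alpha, df)

print(f"t = {t_stat:.3f} (manual: {t_manual:.3f}), critical value = {t_crit:.3f}")
print(f"p-value = {p_value:.2e}")
if t_stat > t_crit:
    print("t falls into the right critical region: reject H0 in favor of E[X] > E[Y]")
else:
    print("t is outside the critical region: no grounds to reject H0")
```

Comparing `t_stat` with `t_crit` and comparing `p_value` with `alpha` are equivalent decision rules; the p-value form simply avoids computing the quantile explicitly.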

We see that the value of the criterion fell into the **right critical region**, so we reject the main hypothesis and conclude that the mathematical expectation of the first dataset is **greater than** that of the second.

### Datasets with different variances

There is also a generalization of this criterion for the case when **the variances of the datasets are different** (Welch's t-test). Let's look at an example of how this can be implemented in code:
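A minimal sketch of Welch's version, again assuming NumPy and SciPy 1.6 or newer. The means, standard deviations, sample sizes and seed are hypothetical values chosen so that the variances clearly differ. The manual part computes the Welch statistic and the Welch-Satterthwaite degrees of freedom, which are generally not an integer; `stats.t.ppf` and `stats.t.sf` accept non-integer degrees of freedom.

```python
import numpy as np
from scipy import stats

# Illustrative parameters (assumptions, not from the lesson): different spreads
rng = np.random.default_rng(0)
x = rng.normal(loc=6.0, scale=1.0, size=300)  # dataset X, small variance
y = rng.normal(loc=5.0, scale=3.0, size=400)  # dataset Y, large variance

alpha = 0.05  # significance level

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_stat, p_value = stats.ttest_ind(x, y, equal_var=False, alternative='greater')

# Manual Welch statistic: each sample variance is scaled by its own sample size
n, m = len(x), len(y)
vx = x.var(ddof=1) / n
vy = y.var(ddof=1) / m
t_manual = (x.mean() - y.mean()) / np.sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
t_crit = stats.t.ppf(1 - alpha, df)

print(f"t = {t_stat:.3f}, df = {df:.1f}, critical value = {t_crit:.3f}, p-value = {p_value:.2e}")
```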

In the code above, we passed `equal_var=False` as an argument to the `stats.ttest_ind` function to perform hypothesis testing for datasets with different variances.
