Comparing Means of Two Different Datasets | Testing of Statistical Hypotheses

# Comparing Means of Two Different Datasets

An important applied task is comparing the mathematical expectations (means) of two independent numerical datasets.
In the general case this problem is non-trivial, but under certain conditions it can be solved relatively simply. Let's consider the following conditions:

We have two independent numerical datasets, X and Y, both following Gaussian distributions with equal variances (we may not know the actual value of the variance, but we have to be sure that the variances are equal). We want to test the following hypotheses:

Main hypothesis: the expectations of these datasets are equal.

Alternative hypothesis: the expectation of the X dataset is greater than that of the Y dataset.

## Statistical criterion

If the conditions described above are met, we can use the following criterion to test this hypothesis:
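Under these assumptions, the standard choice is the pooled two-sample Student's t-statistic. Here is a sketch in the usual notation. The sample sizes $n$ and $m$, the sample means $\bar{X}$ and $\bar{Y}$, and the unbiased sample variances $s_X^2$ and $s_Y^2$ are symbols introduced here for illustration:

$$
t = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}},
\qquad
s_p^2 = \frac{(n - 1)\, s_X^2 + (m - 1)\, s_Y^2}{n + m - 2}
$$

If the main hypothesis holds, $t$ follows Student's t-distribution with $n + m - 2$ degrees of freedom. For the one-sided alternative stated above, the main hypothesis is rejected at significance level $\alpha$ when $t > t_{1-\alpha}(n + m - 2)$, that is, when $t$ falls into the right critical region.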

## Python implementation

Let's generate two independent datasets with different mean values and try to check the hypothesis:
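A minimal sketch of such a check, assuming NumPy and SciPy (version 1.6 or later, for the `alternative` argument) are available. The seed, sample sizes, means, and significance level are illustrative choices, not values from the course:

```python
import numpy as np
from scipy import stats

# Illustrative data: equal variances, different means (assumed values)
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=200)
y = rng.normal(loc=0.0, scale=1.0, size=200)

n, m = len(x), len(y)

# Pooled variance estimate from the two unbiased sample variances
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)

# Pooled two-sample t-statistic
t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))

# Right critical value for a one-sided test at level alpha
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha, df=n + m - 2)
reject = bool(t_stat > t_crit)

# Cross-check with SciPy's built-in test (one-sided alternative: mean(x) > mean(y))
result = stats.ttest_ind(x, y, equal_var=True, alternative="greater")

print(f"t statistic:    {t_stat:.4f}")
print(f"critical value: {t_crit:.4f}")
print(f"SciPy p-value:  {result.pvalue:.4g}")
print("Reject the main hypothesis" if reject else "Do not reject the main hypothesis")
```

The manual statistic and `stats.ttest_ind(..., equal_var=True)` compute the same quantity, so comparing `t_stat` with the critical value is equivalent to comparing the p-value with `alpha`.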

We see that the value of the criterion fell into the right critical region, so we reject the main hypothesis in favor of the alternative: the mathematical expectation of the first dataset is greater than that of the second.

### Datasets with different variances

This criterion can also be generalized to the case where the variances of the datasets differ. Let's look at an example of how this can be implemented in code:
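A minimal sketch of the unequal-variance case, again assuming NumPy and SciPy 1.6 or later. The seed, sample sizes, means, and standard deviations are illustrative assumptions:

```python
import numpy as np
from scipy import stats

# Illustrative data: different means AND different variances (assumed values)
rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=1.0, size=150)
y = rng.normal(loc=0.0, scale=3.0, size=250)

alpha = 0.05

# equal_var=False drops the equal-variance assumption;
# alternative="greater" tests mean(x) > mean(y)
result = stats.ttest_ind(x, y, equal_var=False, alternative="greater")

# Manual statistic for comparison: each sample contributes its own variance
t_manual = (x.mean() - y.mean()) / np.sqrt(
    x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y)
)

print(f"t statistic: {result.statistic:.4f}")
print(f"p-value:     {result.pvalue:.4g}")
if result.pvalue < alpha:
    print("Reject the main hypothesis")
else:
    print("Do not reject the main hypothesis")
```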

In the code above, we passed `equal_var=False` to the `stats.ttest_ind` function, which makes it perform Welch's t-test, a variant of the t-test that does not assume equal variances.

Can we use Student's t-tests with non-Gaussian data?