Probability Theory Mastering

## Challenge: Using the CLT to Compare Mean Values of Non-Gaussian Datasets

In the last chapter, we considered how to compare the mathematical expectations of two Gaussian datasets. But what if the datasets are not Gaussian? Can we still compare them?

We can use the CLT to compare mean values of non-Gaussian datasets:

1. If each dataset is large, use the CLT to construct a new feature: instead of analyzing the raw observations, split the data into groups and analyze the mean of each group. By the CLT, the mean of a sufficiently large group of independent observations with finite variance is approximately normally distributed, whatever the original distribution.
2. Apply Student's t-test, described in the previous chapter, to the resulting means to test the hypothesis.
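
For reference, the CLT statement behind step 1 can be written out as follows. This is the standard textbook form, assuming independent, identically distributed observations $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$; the symbols are introduced here for illustration only.

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
$$

so for large $n$ the group mean behaves approximately like $\mathcal{N}\!\left(\mu, \sigma^2 / n\right)$. Because each group mean has the same expectation $\mu$ as the raw data, comparing the group means of two datasets compares the expectations of the datasets themselves.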

Note

The number of observations that must be averaged to reach approximate normality depends on the underlying distribution. It is usually chosen experimentally by applying a normality test, for example the Shapiro-Wilk test (`shapiro` in `scipy.stats`), to the computed means.
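
One way to run such an experiment is sketched below. It is only an illustration: the exponential data, the candidate window sizes, and the helper `block_means` are assumptions made for this example and are not part of the course code.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
# Hypothetical non-Gaussian data: a strongly skewed exponential sample
X = rng.exponential(scale=2.0, size=20_000)

def block_means(data, window):
    """Means of consecutive, non-overlapping groups of `window` observations."""
    n_blocks = len(data) // window
    return data[: n_blocks * window].reshape(n_blocks, window).mean(axis=1)

results = {}
for window in (1, 5, 20, 100):
    # Keep 200 means for every window size so the p-values are comparable
    means = block_means(X, window)[:200]
    _, p_value = shapiro(means)
    results[window] = p_value
    print(f"window={window:>3}: Shapiro-Wilk p-value = {p_value:.4g}")
```

A small p-value means the test rejects normality for that window size. The p-value typically grows as the window widens, so one can pick the smallest window for which normality is no longer rejected.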

1. Import the `ttest_ind` function from the `scipy.stats` module to perform the t-test.
2. Use the `.mean()` method to calculate the mean over the sliding window in the `sliding_mean` function.
3. Use the `shapiro()` function to check the normality of the `X_mean` array.
4. Specify the condition in the `if` statement to test the hypothesis.
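
The four steps above might fit together roughly as follows. This is a sketch under stated assumptions, not the course's reference solution: the signature of `sliding_mean` (in particular the `step` parameter), the generated datasets, the window size, and the significance level `alpha` are all hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind, shapiro  # step 1: import the t-test

rng = np.random.default_rng(42)
# Hypothetical non-Gaussian datasets with different expectations (2.0 and 2.5)
X = rng.exponential(scale=2.0, size=3000)
Y = rng.exponential(scale=2.5, size=3000)

def sliding_mean(data, window, step):
    """Mean over a window of length `window` moved along `data` by `step`."""
    starts = range(0, len(data) - window + 1, step)
    # Step 2: .mean() computes the average inside each window
    return np.array([data[s : s + window].mean() for s in starts])

window = 100
# step == window gives non-overlapping windows, so the means stay independent
X_mean = sliding_mean(X, window, step=window)
Y_mean = sliding_mean(Y, window, step=window)

# Step 3: check normality of the window means
_, shapiro_p = shapiro(X_mean)
print(f"Shapiro-Wilk p-value for X_mean: {shapiro_p:.4g}")

# Step 4: compare the means with the t-test and decide
alpha = 0.05  # assumed significance level
_, p_value = ttest_ind(X_mean, Y_mean)
if p_value < alpha:
    print("Reject H0: the expectations differ")
else:
    print("Fail to reject H0: no significant difference detected")
```

Note that a window moved by a step smaller than its length produces overlapping, and therefore correlated, means, while the t-test assumes independent observations. That is why this sketch sets `step` equal to `window`.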