# Probability Theory Mastering

## Challenge: Using CLT to Compare Mean Values of Non-Gaussian Datasets

In the previous chapter, we compared the mathematical expectations of **two Gaussian datasets**. But what if the datasets are not Gaussian? Can we still compare their means?

## Using the Central Limit Theorem to Compare Mean Values

We can use the CLT to compare mean values of non-Gaussian datasets:

- If we have many samples, we can use the CLT to construct **new features**: instead of analyzing the samples themselves, we analyze the **mean values of groups of samples**. By the CLT, a mean calculated over a sufficiently large number of samples is approximately normally distributed;
- Use the **Student criterion** (t-test) described in the previous chapter to test the hypothesis.
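The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference solution: the helper `block_means`, the seed, the sample sizes, the `scale` parameters, and the window of 200 are all chosen here for illustration. It also averages non-overlapping blocks rather than a sliding window, because the t-test assumes independent observations and overlapping windows would produce correlated means.

```python
import numpy as np
from scipy.stats import shapiro, ttest_ind


def block_means(data, window):
    """Average consecutive, non-overlapping blocks of `window` values.

    Hypothetical helper for this sketch; trailing values that do not
    fill a whole block are dropped.
    """
    n_blocks = len(data) // window
    trimmed = np.asarray(data[: n_blocks * window])
    return trimmed.reshape(n_blocks, window).mean(axis=1)


# Two exponential (clearly non-Gaussian) datasets; parameters are assumptions.
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=20_000)
Y = rng.exponential(scale=2.0, size=20_000)

# Step 1: build new features, the means of groups of samples.
X_mean = block_means(X, window=200)
Y_mean = block_means(Y, window=200)

# Check that the new features look normal before trusting the t-test.
shapiro_x = shapiro(X_mean)
shapiro_y = shapiro(Y_mean)
print("Shapiro p-values:", shapiro_x.pvalue, shapiro_y.pvalue)

# Step 2: apply the Student criterion to the means.
result = ttest_ind(X_mean, Y_mean)

alpha = 0.05  # significance level, chosen for illustration
if result.pvalue > alpha:
    print("No evidence against equal means")
else:
    print("The means differ")
```

Because every block has the same size, the average of the block means equals the average of the original data, so comparing the block means still compares the means of `X` and `Y`.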

Note

For different distributions, you need to select a different number of samples over which the average is calculated to achieve normality. This is usually done experimentally using various tests for normality, for example, the `shapiro` normality test.
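One way to run such an experiment is sketched below. It is only an illustration: the helper `block_means`, the candidate window sizes, the seed, and the choice to keep 200 means per window are assumptions made here, not part of the lesson.

```python
import numpy as np
from scipy.stats import shapiro


def block_means(data, window):
    """Average consecutive, non-overlapping blocks of `window` values
    (hypothetical helper for this sketch)."""
    n_blocks = len(data) // window
    trimmed = np.asarray(data[: n_blocks * window])
    return trimmed.reshape(n_blocks, window).mean(axis=1)


rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=50_000)

# Try several window sizes and record the Shapiro-Wilk p-value for each.
# The same number of means (200) is kept for every window so that the
# test sees equally sized inputs and the p-values are comparable.
p_values = {}
for window in (1, 5, 25, 100, 250):
    means = block_means(data, window)[:200]
    p_values[window] = shapiro(means).pvalue
    print(f"window={window:>3}  shapiro p-value={p_values[window]:.4g}")
```

A window of 1 leaves the raw exponential values, which the test rejects as non-normal. As the window grows, the p-values generally stop indicating a departure from normality, and the smallest window for which this holds is a reasonable candidate. A large p-value does not prove normality; it only means the test found no evidence against it.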

# Task

Now we will check the hypothesis that two exponential datasets have equal mean values using the Central Limit Theorem. Your task is:

- Import the `ttest_ind` function from the `scipy.stats` module to perform the t-test.
- Use the `.mean()` method to calculate the mean over the sliding window in the `sliding_mean` function.
- Use the `shapiro()` function to check the normality of the `X_mean` array.
- Specify the condition in the `if` statement to test the hypothesis.
