# Challenge: Resampling Approach to Compare Mean Values of the Datasets

We can also use the **resampling approach** to test hypotheses on **non-Gaussian** datasets. Resampling is a technique that repeatedly draws from the available data set to generate additional samples, each of which is treated as representative of the underlying population.

## Approach description

Let's describe the simplest resampling method for checking the **main hypothesis that two datasets X and Y have equal mean values**:

- **Concatenate** both arrays (`X` and `Y`) into one big array;
- **Shuffle** that entire array so that observations from each group are spread randomly throughout it instead of being separated at the breaking point;
- Arbitrarily **split the array** at the breaking point (`X_length`, the length of `X`): assign observations below index `X_length` to Group A and the rest to Group B;
- **Subtract** the mean of this new Group A from the mean of the new Group B. This gives us **one permutation test statistic**;
- **Repeat** those steps `N` times to simulate the distribution of the statistic under the main hypothesis;
- Calculate the **test statistic** on the initial sets `X` and `Y`;
- Determine the **critical values** of the main hypothesis distribution;
- Check whether the test statistic calculated on the initial sets **falls into the critical area** of the main hypothesis distribution. If it does, reject the main hypothesis.
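
In symbols, the decision rule described above can be sketched as follows. The significance level $\alpha$, the two-sided form of the critical area, and the notation $T_{\text{obs}}$, $T_i$, $q_p$ are assumptions introduced here for illustration; the description itself does not fix them.

$$
T_{\text{obs}} = \bar{Y} - \bar{X}, \qquad
T_i = \bar{B}_i - \bar{A}_i, \quad i = 1, \dots, N,
$$

$$
\text{reject the main hypothesis if } \; T_{\text{obs}} < q_{\alpha/2} \; \text{ or } \; T_{\text{obs}} > q_{1-\alpha/2},
$$

where $\bar{A}_i$ and $\bar{B}_i$ are the means of Group A and Group B after the $i$-th shuffle, and $q_p$ is the empirical $p$-quantile of $T_1, \dots, T_N$.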

Let's apply this approach in code:

Task

Your task is to implement the resampling algorithm described above and to check the corresponding hypothesis on two datasets:

- Use the `np.concatenate()` function to merge the `X` and `Y` arrays.
- Use the `.shuffle()` method of the `np.random` module to shuffle the data in the merged array.
- Use the `np.quantile()` function to calculate the left critical value.
- Use the created `resampling_test()` function to check the hypothesis on the generated data.
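
One way the pieces could fit together is sketched below. This is not the course's reference solution: the signature of `resampling_test()`, the default `alpha=0.05`, the two-sided critical area, the `seed` parameter, and the exponential test data are all assumptions made for illustration.

```python
import numpy as np


def resampling_test(X, Y, alpha=0.05, N=10_000, seed=None):
    """Two-sided permutation test of the hypothesis mean(X) == mean(Y).

    Returns (observed statistic, left critical value,
             right critical value, reject flag).
    """
    if seed is not None:
        # Assumption: seed NumPy's global generator for reproducibility.
        np.random.seed(seed)

    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X_length = len(X)

    # Concatenate both arrays; the result is a new array, so shuffling
    # it in place does not modify X or Y.
    merged = np.concatenate([X, Y])

    stats = np.empty(N)
    for i in range(N):
        # Shuffle, split at X_length, and subtract the mean of
        # Group A from the mean of Group B.
        np.random.shuffle(merged)
        stats[i] = merged[X_length:].mean() - merged[:X_length].mean()

    # Test statistic on the initial sets.
    observed = Y.mean() - X.mean()

    # Critical values of the simulated main hypothesis distribution.
    left = np.quantile(stats, alpha / 2)
    right = np.quantile(stats, 1 - alpha / 2)

    reject = bool(observed < left or observed > right)
    return observed, left, right, reject


# Hypothetical non-Gaussian data: two exponential samples with different means.
np.random.seed(0)
X = np.random.exponential(scale=1.0, size=500)
Y = np.random.exponential(scale=2.0, size=500)

observed, left, right, reject = resampling_test(X, Y, seed=1)
print(f"observed = {observed:.3f}, critical values = ({left:.3f}, {right:.3f})")
print("Reject main hypothesis" if reject else "Do not reject main hypothesis")
```

Shuffling the merged array in place avoids allocating a new array on every iteration. The right critical value is computed the same way as the left one, with `1 - alpha / 2` in place of `alpha / 2`.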
