Challenge: Resampling Approach to Compare Mean Values of the Datasets | Testing of Statistical Hypotheses
Probability Theory Mastering

# Challenge: Resampling Approach to Compare Mean Values of the Datasets

The resampling approach can also be used to test hypotheses on non-Gaussian datasets. Resampling is a technique that repeatedly draws samples from the available dataset to generate additional samples, each of which is treated as representative of the underlying population.

## Approach description

Let's describe the simplest resampling method for testing the main hypothesis that two datasets X and Y have equal mean values:

• Concatenate both arrays (X and Y) into one big array;
• Shuffle the entire array so that observations from both groups are spread randomly throughout it instead of being separated at the breaking point;
• Split the array at the breaking point (X_length, the number of observations in X): assign observations with index below X_length to Group A and the rest to Group B;
• Subtract the mean of the new Group A from the mean of the new Group B. This gives one permutation test statistic;
• Repeat these steps N times to simulate the distribution of the test statistic under the main hypothesis;
• Calculate the same test statistic on the initial sets X and Y;
• Determine the critical values of the simulated distribution for the chosen significance level;
• Check whether the test statistic calculated on the initial sets falls into the critical region of the simulated distribution. If it does, reject the main hypothesis.

Let's apply this approach in code:
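A minimal sketch of the steps listed above, not the course's reference solution. The signature of `resampling_test`, the parameter names `n_resamples`, `alpha` and `seed`, the two-sided critical region, and the exponential test data are all assumptions made for illustration. It uses NumPy's `Generator.shuffle` rather than the legacy `np.random.shuffle`, which behaves the same way for this purpose.

```python
import numpy as np


def resampling_test(x, y, n_resamples=10_000, alpha=0.05, seed=0):
    """Two-sided permutation test for equal means (illustrative sketch).

    Returns the observed statistic, both critical values and the decision.
    All parameter names are assumptions, not taken from the course.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: concatenate both arrays (np.concatenate returns a new array).
    merged = np.concatenate([x, y])
    split = len(x)  # the "breaking point"

    # Steps 2-5: shuffle, split, take the difference of means, repeat N times.
    null_stats = np.empty(n_resamples)
    for i in range(n_resamples):
        rng.shuffle(merged)  # shuffles in place
        group_a = merged[:split]
        group_b = merged[split:]
        null_stats[i] = group_b.mean() - group_a.mean()

    # Step 6: the same statistic on the original datasets.
    observed = y.mean() - x.mean()

    # Step 7: critical values of the simulated distribution.
    left_crit = np.quantile(null_stats, alpha / 2)
    right_crit = np.quantile(null_stats, 1 - alpha / 2)

    # Step 8: reject if the observed statistic lies in the critical region.
    reject = bool(observed < left_crit or observed > right_crit)
    return observed, left_crit, right_crit, reject


# Hypothetical non-Gaussian data: exponential samples with different means.
data_rng = np.random.default_rng(42)
x = data_rng.exponential(scale=1.0, size=200)
y = data_rng.exponential(scale=2.0, size=200)

observed, left_crit, right_crit, reject = resampling_test(x, y)
print(f"observed = {observed:.3f}, critical region: "
      f"below {left_crit:.3f} or above {right_crit:.3f}, reject = {reject}")
```

Shuffling the merged array is valid under the main hypothesis because, if both groups come from the same distribution, the group labels carry no information and any reassignment is equally likely.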

Your task is to implement the resampling algorithm described above and to test the corresponding hypothesis on two datasets:

1. Use the np.concatenate() function to merge the X and Y arrays.
2. Use the .shuffle() function of the np.random module to shuffle the data in the merged array.
3. Use the np.quantile() function to calculate the left critical value.
4. Use the resulting resampling_test() function to test the hypothesis on the generated data.
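The three NumPy calls named in the steps above can be tried in isolation on small arrays. The values below, and the 0.025 quantile level, are made up purely to show how each call behaves.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0])

# np.concatenate takes a sequence of arrays and returns a new array,
# so x and y themselves are left untouched by the shuffle below.
merged = np.concatenate([x, y])

# np.random.shuffle reorders the array in place and returns None.
np.random.seed(0)
np.random.shuffle(merged)

# Split at the breaking point len(x).
group_a, group_b = merged[:len(x)], merged[len(x):]

# np.quantile(a, q) returns the q-th quantile of the data; for a two-sided
# test at the 5% level the left critical value is the 0.025 quantile.
simulated_stats = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
left_critical = np.quantile(simulated_stats, 0.025)  # about -1.9 with the default linear interpolation

print(merged, group_a, group_b, left_critical)
```

Because the shuffle happens in place, the result of `np.random.shuffle(merged)` should not be assigned back to a variable; doing so would replace the array with `None`.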
