Challenge: Resampling Approach to Compare Mean Values of the Datasets | Testing of Statistical Hypotheses
Advanced Probability Theory
1. Additional Statements From The Probability Theory
2. The Limit Theorems of Probability Theory
3. Estimation of Population Parameters
4. Testing of Statistical Hypotheses

Challenge: Resampling Approach to Compare Mean Values of the Datasets

The resampling approach can also be used to test hypotheses on non-Gaussian datasets, because it does not rely on a normality assumption. Resampling is a technique that repeatedly draws from an available dataset to generate additional samples, each of which is treated as representative of the underlying population.

Approach description

Let's describe the simplest resampling method for checking the main (null) hypothesis that two datasets X and Y have equal mean values:

  • Concatenate both arrays (X and Y) into one big array;
  • Shuffle the entire array so that observations from both groups are spread randomly throughout it instead of being separated at the breaking point;
  • Split the shuffled array at the breaking point len(X): assign the observations with indices below len(X) to Group A and the rest to Group B;
  • Subtract the mean of the new Group A from the mean of the new Group B. This gives one permutation test statistic;
  • Repeat the shuffle, split, and subtract steps N times to simulate the distribution of the test statistic under the main hypothesis;
  • Calculate the same test statistic on the initial sets: the mean of Y minus the mean of X;
  • Determine the critical values of the simulated distribution (for a two-sided test at significance level α, its α/2 and 1 − α/2 quantiles);
  • Check whether the test statistic calculated on the initial sets falls into the critical region of the simulated distribution. If it does, reject the main hypothesis.
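The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the course's reference solution: the function name resampling_test_sketch, the parameters n_resamples, alpha, and seed, and the returned tuple are all assumptions made for this example. It also uses a seeded random generator and its permutation() method (which returns a shuffled copy) so that the run is reproducible.

```python
import numpy as np

def resampling_test_sketch(X, Y, n_resamples=10_000, alpha=0.05, seed=0):
    """Two-sided permutation test for equal means (illustrative sketch).

    All parameter names and the return format are assumptions for this example.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)

    # Step 1: concatenate both arrays into one big array.
    merged = np.concatenate([X, Y])

    # Steps 2-5: shuffle, split at len(X), take the difference of means, repeat.
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        shuffled = rng.permutation(merged)   # shuffled copy of the merged data
        group_a = shuffled[:len(X)]          # indices below len(X)
        group_b = shuffled[len(X):]          # the rest
        stats[i] = group_b.mean() - group_a.mean()

    # Step 6: the same statistic on the initial sets.
    observed = Y.mean() - X.mean()

    # Step 7: left and right critical values of the simulated distribution.
    left, right = np.quantile(stats, [alpha / 2, 1 - alpha / 2])

    # Step 8: reject if the observed statistic falls into the critical region.
    reject = bool(observed < left or observed > right)
    return observed, left, right, reject


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    X = data_rng.exponential(scale=1.0, size=200)   # non-Gaussian sample
    Y = data_rng.exponential(scale=3.0, size=200)   # different mean
    observed, left, right, reject = resampling_test_sketch(X, Y)
    print(f"observed={observed:.3f}, critical=({left:.3f}, {right:.3f}), reject={reject}")
```

Because the group labels are reassigned at random, the simulated statistics cluster around zero, which is the value expected when the two means are equal.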

Let's apply this approach in code:

Task

Your task is to implement the resampling algorithm described above and to check the corresponding hypothesis on two datasets:

  1. Use the np.concatenate() function to merge the X and Y arrays.
  2. Use the .shuffle() function of the np.random module to shuffle the data in the merged array.
  3. Use the np.quantile() function to calculate the left critical value.
  4. Use the created resampling_test() function to check the hypothesis on the generated data.
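As a quick reminder of how the three NumPy calls named in the task behave, here is a small sketch on toy data. The arrays and the variable names (merged, group_a, group_b, stats, left_critical) are made up for illustration and are not part of the task template.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([10.0, 20.0])

# np.concatenate() takes a sequence of arrays and returns a new array.
merged = np.concatenate([X, Y])

# np.random.shuffle() shuffles the array in place and returns None,
# so its result should not be assigned to a variable.
np.random.seed(0)
np.random.shuffle(merged)

# Split at the breaking point len(X).
group_a, group_b = merged[:len(X)], merged[len(X):]

# np.quantile() returns the value below which the given fraction of data lies.
# For a two-sided test at level 0.05, the left critical value is the 0.025 quantile.
stats = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # toy "simulated" statistics
left_critical = np.quantile(stats, 0.025)

print(merged, group_a, group_b, left_critical)
```

Note that shuffling in place changes merged itself, which is fine inside a resampling loop because every new shuffle of an already shuffled array is still a random permutation.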
