Learn Shapiro Test | Normality Check
The Art of A/B Testing

Shapiro Test

The Shapiro Test (also called the Shapiro-Wilk test) is a statistical test of the hypothesis that a sample comes from a normal distribution. It measures how closely the sample matches what a normal distribution would produce.

The null hypothesis assumes that the data are normally distributed. If the p-value is below the significance level (below 0.05), then the null hypothesis is rejected.

In such a case, we can argue that the data is not normally distributed (the alternative hypothesis is accepted).
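To make the decision rule concrete, here is a minimal sketch on synthetic data rather than the course dataset. The two samples, the seed, and the `ALPHA` constant are assumptions chosen purely for illustration; only `scipy.stats.shapiro` comes from this chapter.

```python
import numpy as np
from scipy.stats import shapiro

ALPHA = 0.05  # significance level used in this chapter

# Synthetic samples (assumed for illustration, not the course data)
rng = np.random.default_rng(seed=0)
samples = {
    "normal": rng.normal(loc=100, scale=15, size=500),
    "exponential": rng.exponential(scale=15, size=500),
}

decisions = {}
for name, values in samples.items():
    stat, p_value = shapiro(values)
    # Reject the null hypothesis of normality when the p-value is below ALPHA
    reject = p_value < ALPHA
    decisions[name] = (stat, p_value, reject)
    verdict = "reject H0 (not normal)" if reject else "fail to reject H0"
    print(f"{name}: W = {stat:.4f}, p-value = {p_value:.4f} -> {verdict}")
```

The strongly skewed exponential sample should be rejected, while a sample that really is normal will usually (though not always) pass.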

Let's run the Shapiro Test on the first column (Impression) of both the control and test groups:

    # Import libraries
    import pandas as pd
    from scipy.stats import shapiro

    # Read .csv files
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Do the Shapiro test for the control sample
    stat_control, p_control = shapiro(df_control['Impression'])
    print('Control group: ')
    print('Stat: %.4f, p-value: %.4f' % (stat_control, p_control))

    # Define the distribution form
    if p_control > 0.05:
        print('Control group is likely normally distributed')
    else:
        print('Control group is likely NOT normally distributed')

    # Do the Shapiro test for the test sample
    stat_test, p_test = shapiro(df_test['Impression'])
    print('Test group: ')
    print('Stat: %.4f, p-value: %.4f' % (stat_test, p_test))

    # Define the distribution form
    if p_test > 0.05:
        print('Test group is likely normally distributed')
    else:
        print('Test group is likely NOT normally distributed')

Great! We got two results.

The Statistic value (W) lies between 0 and 1; the closer it is to 1, the more closely the sample matches a normal distribution. The p-value in both groups is high (greater than 0.05), which means we fail to reject the null hypothesis.

We found no evidence against normality, so both columns can be treated as normally distributed.
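A high p-value is not proof of normality, because the test can miss non-normal data when the sample is small. The sketch below illustrates this with a small simulation. The exponential distribution, the sample sizes, the number of trials, and the helper name `rejection_rate` are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import shapiro

ALPHA = 0.05
rng = np.random.default_rng(seed=1)


def rejection_rate(sample_size, n_trials=200):
    """Share of skewed (exponential) samples for which the Shapiro test rejects normality."""
    rejections = 0
    for _ in range(n_trials):
        # Every sample is drawn from a clearly non-normal distribution
        sample = rng.exponential(scale=1.0, size=sample_size)
        _, p_value = shapiro(sample)
        if p_value < ALPHA:
            rejections += 1
    return rejections / n_trials


rate_small = rejection_rate(10)
rate_large = rejection_rate(200)
print(f"Rejected with n=10:  {rate_small:.0%}")
print(f"Rejected with n=200: {rate_large:.0%}")
```

With only 10 observations, many of the skewed samples pass the test, while with 200 observations almost all of them are rejected. It can therefore help to pair the test with a histogram or Q-Q plot.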

Note

If we have more than 5,000 observations, the p-value of the Shapiro test may be inaccurate, so it is better to use the Kolmogorov-Smirnov test. Its use is similar to the Shapiro test.
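As a rough sketch of what this could look like, the example below runs `scipy.stats.kstest` on a large synthetic sample. The data, the seed, and the standardization step are assumptions for illustration. Note that estimating the mean and standard deviation from the same sample makes the resulting p-value only approximate.

```python
import numpy as np
from scipy.stats import kstest

# Large synthetic, clearly non-normal sample (assumed for illustration)
rng = np.random.default_rng(seed=2)
values = rng.exponential(scale=2.0, size=10_000)

# Standardize so the sample can be compared with the standard normal N(0, 1)
z = (values - values.mean()) / values.std(ddof=1)

stat, p_value = kstest(z, "norm")
print(f"Stat: {stat:.4f}, p-value: {p_value:.4f}")

if p_value > 0.05:
    print("Sample is likely normally distributed")
else:
    print("Sample is likely NOT normally distributed")
```

The interpretation is the same as for the Shapiro test: a p-value below 0.05 leads us to reject the null hypothesis of normality.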


Can we be sure of a normal distribution by looking at the results of the Shapiro test?


Section 2. Chapter 7
