The Art of A/B Testing
The Third U-test
Let's compare the 'Average Purchase Value' metric for both samples.
We'll start with the distribution plot:
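The plot itself is not reproduced in this text. As a minimal sketch of how a comparable plot could be drawn, the snippet below overlays two histograms with matplotlib. The function name `plot_metric_distributions`, the bin count, and the toy values are assumptions made for illustration; they are not taken from the course dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt


def plot_metric_distributions(control, test, bins=20):
    """Overlay histograms of one metric for the control and test groups."""
    fig, ax = plt.subplots(figsize=(8, 4))
    # density=True rescales each histogram so groups of different size are comparable
    ax.hist(control, bins=bins, alpha=0.5, density=True, label="Control")
    ax.hist(test, bins=bins, alpha=0.5, density=True, label="Test")
    ax.set_xlabel("Average Purchase Value")
    ax.set_ylabel("Density")
    ax.legend()
    return fig, ax


# Toy values for illustration only (hypothetical, not the course data)
fig, ax = plot_metric_distributions(
    [3.1, 3.4, 2.9, 3.8, 3.3],
    [3.9, 4.2, 3.7, 4.5, 4.0],
    bins=5,
)
```

With the course data, the two arguments would be the 'Average Purchase Value' columns of the control and test dataframes, followed by `plt.show()` or `fig.savefig(...)`.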
The plot alone does not let us draw a firm conclusion about the shape of either distribution, although the median of the test group appears to be larger. Let's run the Shapiro test on each group:
# Import libraries
import pandas as pd
from scipy.stats import shapiro

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define the metric
df_test['Average Purchase Value'] = df_test['Earning'] / df_test['Purchase']
df_control['Average Purchase Value'] = df_control['Earning'] / df_control['Purchase']

# Test normality of the metric in the control group
stat_control, p_control = shapiro(df_control['Average Purchase Value'])

# Print the results of normality testing
print("Control group: ")
print("Stat: %.4f, p-value: %.4f" % (stat_control, p_control))

# Interpret the result at the 0.05 significance level
if p_control > 0.05:
    print('Control group is likely normally distributed')
else:
    print('Control group is NOT likely normally distributed')

# Test normality of the metric in the test group
stat_test, p_test = shapiro(df_test['Average Purchase Value'])

# Print the results of normality testing
print("Test group: ")
print("Stat: %.4f, p-value: %.4f" % (stat_test, p_test))

# Interpret the result at the 0.05 significance level
if p_test > 0.05:
    print('Test group is likely normally distributed')
else:
    print('Test group is NOT likely normally distributed')
The first Shapiro test rejected normality for the control group, while the second did not reject normality for the test group. Strictly speaking, a p-value above 0.05 does not prove that the data are normal; it only means the test found no evidence against normality. For the U-test this mix is not a problem: it works on ranks, so it can compare both normal and non-normal distributions. We also do not need to be concerned about the variance in this case.
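To illustrate why the shape of the distributions matters so little, here is a minimal sketch of the rank-sum calculation behind the U statistic, written in plain Python. The helper name `mann_whitney_u` and the toy samples are assumptions for illustration. It is not how `scipy.stats.mannwhitneyu` is implemented internally, and it does not compute a p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) computed from midranks of the pooled sample."""
    pooled = sorted(list(x) + list(y))

    # Assign each distinct value its midrank: the mean of the
    # 1-based positions it occupies in the sorted pooled sample.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j

    n_x, n_y = len(x), len(y)
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Toy samples (hypothetical): every value in y exceeds every value in x
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))    # (0.0, 9.0)
# Replacing 6 with an extreme value leaves the ranks, and so U, unchanged
print(mann_whitney_u([1, 2, 3], [4, 5, 600]))  # (0.0, 9.0)
```

Because only the ordering of the pooled values enters the calculation, an extreme observation shifts the statistic no more than any other value of the same rank.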
The hypotheses will be:
H₀: The medians of the 'Average Purchase Value' metric in the control and test groups are the same.
Hₐ: The medians of the 'Average Purchase Value' metric differ between the control and test groups.
# Import libraries
import pandas as pd
from scipy.stats import mannwhitneyu

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define the metric
df_test['Average Purchase Value'] = df_test['Earning'] / df_test['Purchase']
df_control['Average Purchase Value'] = df_control['Earning'] / df_control['Purchase']

# Run the U-test
statistic, p_value = mannwhitneyu(df_control['Average Purchase Value'], df_test['Average Purchase Value'])

# Print the results of the U-test
print('Statistic:', statistic)
print('p-value:', p_value)

# Decide whether the medians differ at the 0.05 significance level
if p_value > 0.05:
    print('The medians of the two groups are NOT statistically different')
else:
    print('The medians of the two groups are statistically different')
The low p-value indicates a statistically significant difference between the medians of the 'Average Purchase Value' metric in the control and test groups. We have sufficient evidence to reject the null hypothesis in favor of the alternative that the medians differ. The median in the test group is larger.
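A two-sided test only tells us that the groups differ, so the direction has to be read from the data. A minimal sketch of that check, using Python's standard `statistics.median`, might look like this. The helper name `median_summary` and the toy values are hypothetical and are not the course data.

```python
from statistics import median


def median_summary(control, test):
    """Return both group medians and their difference (test minus control)."""
    control_median = median(control)
    test_median = median(test)
    return {
        "control": control_median,
        "test": test_median,
        "difference": test_median - control_median,
    }


# Toy values for illustration only (hypothetical, not the course data)
summary = median_summary([30, 34, 29, 38, 33], [39, 42, 37, 45, 40])
print(summary)  # {'control': 33, 'test': 40, 'difference': 7}
```

A positive `difference` means the test group has the larger median. With the dataframes used above, calling `.median()` on each 'Average Purchase Value' column gives the same information.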