The Art of A/B Testing
A/A Test
Preparation for the Experiment
Before conducting a controlled experiment, we must be convinced that the data in the test group have been collected correctly. We must consider several factors:
- Day-of-the-week effect. People behave differently on weekdays and weekends, so groups observed on different days may differ. Therefore, the data are collected over a full week;
- Seasonality. During the holidays, users shop more actively, which can give a false impression of real sales. Therefore, the data are collected in a season without holidays;
- Growing number of users over time. More and more people join the experiment as it runs. Therefore, each group includes an equal number of users.
We conducted an online experiment with three groups of users. Each group was tested for a full week, and an equal number of users took part in each group. The experiment took place in the off-season, when no holidays could have driven up sales.
The metric of the success of the experiment is the conversion rate. It is time to check the adequacy of our results.
Let's get acquainted with the data. Both datasets have one hundred records and three columns. The first column, 'Male', is binary: 1 means the user is male, and 0 means the user is female. The second column, 'Page view', is the number of page views. The third column, 'Purchase', is the number of purchases.
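Before running any test, it can help to confirm that a file really has this layout. The following is a minimal sketch, not part of the course code: the `check_layout` helper and the four sample rows are hypothetical, and the column is assumed to be spelled 'Page view', as in the lesson's code.

```python
import pandas as pd

# Hypothetical rows that mimic the described layout; this is not the course data.
sample = pd.DataFrame({
    "Male": [1, 0, 0, 1],
    "Page view": [10, 4, 7, 12],
    "Purchase": [2, 0, 1, 3],
})


def check_layout(df):
    """Return a list of problems found; an empty list means the layout looks fine."""
    expected = ["Male", "Page view", "Purchase"]
    missing = [c for c in expected if c not in df.columns]
    if missing:
        # Without all three columns, the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    problems = []
    if not df["Male"].isin([0, 1]).all():
        problems.append("'Male' must contain only 0 or 1")
    if (df["Page view"] <= 0).any():
        problems.append("'Page view' must be positive to compute a conversion rate")
    if (df["Purchase"] < 0).any():
        problems.append("'Purchase' must not be negative")
    return problems


print(check_layout(sample))
```

The check on 'Page view' matters because the conversion rate divides by it, so a zero would produce an infinite or undefined value.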
Let's see what these tables look like:
# Import libraries
import pandas as pd

# Read .csv files
control_group_1 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_first.csv')
control_group_2 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_second.csv')

# Show the first rows of the first file
print(control_group_1.head())
The second file is similar to the first file. But can we say that there is no statistically significant difference between them?
We must make sure that no outside factors influenced our experiment. In other words, the metric values of control group 1 and control group 2 must not differ significantly.
Let's formulate hypotheses:
H₀: The metric values in the two samples come from the same distribution, so there is no difference between the groups.
Hₐ: The metric values in the two samples come from different distributions, so the groups differ.
Our first test:
# Import libraries
import pandas as pd
from scipy.stats import mannwhitneyu

# Read .csv files
control_group_1 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_first.csv')
control_group_2 = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/updated_second.csv')

# Define metric
control_group_1['Conversion'] = (control_group_1['Purchase'] / control_group_1['Page view']).round(2)
control_group_2['Conversion'] = (control_group_2['Purchase'] / control_group_2['Page view']).round(2)

# Do U-Test
stat, p = mannwhitneyu(control_group_1['Conversion'], control_group_2['Conversion'])

# Identify the test result
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('There is no statistically significant difference between the medians of the two samples')
else:
    print('There is a statistically significant difference between the medians of the two samples')
Since p > 0.05, we cannot reject the null hypothesis: the test finds no statistically significant difference between the two control groups.
Looks easy, right?
In this code, we loaded two groups of data, control_group_1 and control_group_2, computed the conversion metric for each, performed a U-test using the mannwhitneyu function, and printed the test results.
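The same steps can be tried without downloading the files. The sketch below is an illustration under stated assumptions, not the course's implementation: `make_group` and `aa_test` are hypothetical helpers, the data are randomly generated, and the 10% chance of a purchase per page view is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def make_group(rng, n=100):
    """Build a synthetic group with the same three columns (made-up data)."""
    page_views = rng.integers(1, 21, size=n)   # 1..20 views, never zero
    purchases = rng.binomial(page_views, 0.1)  # each view converts with probability 0.1
    return pd.DataFrame({
        "Male": rng.integers(0, 2, size=n),
        "Page view": page_views,
        "Purchase": purchases,
    })


def aa_test(group_1, group_2, alpha=0.05):
    """Run a two-sided Mann-Whitney U-test on the per-user conversion rate."""
    conv_1 = group_1["Purchase"] / group_1["Page view"]
    conv_2 = group_2["Purchase"] / group_2["Page view"]
    stat, p = mannwhitneyu(conv_1, conv_2, alternative="two-sided")
    # True means H0 is not rejected at the chosen significance level.
    return stat, p, bool(p > alpha)


rng = np.random.default_rng(seed=0)
stat, p, groups_look_alike = aa_test(make_group(rng), make_group(rng))
print(f"stat={stat:.3f}, p={p:.3f}, groups look alike: {groups_look_alike}")
```

Both synthetic groups are drawn from the same process, so a significant result here would be a false positive. At a significance level of 0.05, that should happen in roughly one run out of twenty.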
Why was this particular test chosen? We'll talk about this in the next chapters.