The Art of A/B Testing
Descriptive Statistics
Before moving on to visualizing the distribution, it makes sense to look at the descriptive statistics of each parameter in the dataset.
The key statistics we need are the following:
- Number of observations;
- Average value;
- Standard deviation;
- Median;
- Minimum value;
- Maximum value.
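To make these six statistics concrete, here is a minimal sketch that computes them for a small made-up sample using only Python's standard library. The values in `sample` are hypothetical and are not taken from the course dataset. Note that `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which is also the default behavior of the pandas `std` aggregation.

```python
import statistics

# Hypothetical sample, not taken from the course dataset
sample = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "count": len(sample),                        # number of observations
    "mean": statistics.mean(sample),             # average value
    "std": round(statistics.stdev(sample), 2),   # sample standard deviation (n - 1)
    "median": statistics.median(sample),         # middle value of the sorted sample
    "min": min(sample),                          # minimum value
    "max": max(sample),                          # maximum value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The median (4.5) is lower than the mean (5) here because the single large value 9 pulls the mean upward, which is one reason to look at both.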
Let's get started. We have the results of a controlled experiment for two groups of users.
Preliminary A/A testing showed that the experiment was set up adequately. Let's load the control group file and look at its first rows:
# Import pandas
import pandas as pd

# Read .csv file
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')

# Print head of the control dataframe
print(df_control.head())
In this table, we have 4 columns:
- 'Impression': the number of views of the product page;
- 'Click': the number of transitions to the product page;
- 'Purchase': the number of product purchases;
- 'Earning': profit from the sale of the product.
# Import pandas
import pandas as pd

# Read .csv file
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Print head of the test dataframe
print(df_test.head())
Now let's calculate the descriptive statistics and display them on the screen:
# Import pandas
import pandas as pd

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Calculate descriptive statistics using .agg method
control_descriptive = df_control['Impression'].agg(['count', 'mean', 'std', 'median', 'min', 'max']).round(2)
test_descriptive = df_test['Impression'].agg(['count', 'mean', 'std', 'median', 'min', 'max']).round(2)

# Concat the results of aggregations
result = pd.concat([control_descriptive, test_descriptive], axis=1)
result.columns = ['Control', 'Test']
print(result)
We use the .agg() method to calculate several descriptive statistics in a single call. This operation is called aggregation: a way of collapsing, summarizing, or grouping data.
We also use the pd.concat() function with axis=1 to place the two aggregation results side by side, so they are easy to compare on the screen.
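The following sketch shows what these two calls return, using tiny made-up DataFrames instead of the course files. The names `toy_control`, `toy_test`, and `stats`, as well as all the numbers, are hypothetical and chosen only so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data standing in for the control and test groups
toy_control = pd.DataFrame({"Impression": [100, 120, 140], "Click": [10, 12, 17]})
toy_test = pd.DataFrame({"Impression": [110, 130, 180], "Click": [11, 15, 19]})

stats = ["count", "mean", "std", "median", "min", "max"]

# Aggregating a single column returns a Series indexed by statistic name
control_summary = toy_control["Impression"].agg(stats).round(2)
test_summary = toy_test["Impression"].agg(stats).round(2)

# axis=1 places the two Series side by side as columns
comparison = pd.concat([control_summary, test_summary], axis=1)
comparison.columns = ["Control", "Test"]
print(comparison)

# Aggregating the whole DataFrame summarizes every column at once
print(toy_control.agg(stats).round(2))
```

Passing a list of statistic names to `.agg()` on a whole DataFrame, as in the last line, is one way to get the same summary for 'Click', 'Purchase', and 'Earning' without repeating the call per column.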
The averages seem pretty close. Or not?