Descriptive Statistics | Normality Check
The Art of A/B Testing

Descriptive Statistics

Before visualizing the distributions, it makes sense to look at the descriptive statistics of each metric in the dataset.

The key statistics we need are:

  • Number of observations;
  • Average value;
  • Standard deviation;
  • Median;
  • Minimum value;
  • Maximum value.
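
As a minimal sketch of what each item in the list above means in pandas, the snippet below computes them one by one. The sample values are hypothetical and chosen only for illustration; they are not taken from the course dataset. Note that pandas computes the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
values = pd.Series([10, 20, 30, 40])

stats = {
    "count": int(values.count()),      # number of non-missing observations
    "mean": float(values.mean()),      # average value
    "std": float(values.std()),        # sample standard deviation (ddof=1)
    "median": float(values.median()),  # middle value of the sorted data
    "min": int(values.min()),          # minimum value
    "max": int(values.max()),          # maximum value
}

for name, value in stats.items():
    print(f"{name}: {value}")
```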

Let's get started. We have the results of a controlled experiment for two groups of users: a control group and a test group.

Preliminary A/A testing showed that the experiment setup was adequate. Let's load the control group file and display its first rows:

    # Import pandas
    import pandas as pd

    # Read .csv file
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')

    # Print head of the control dataframe
    print(df_control.head())

In this table, we have 4 columns:

  • 'Impression' - the number of views of the product page;
  • 'Click' - the number of transitions to the product page;
  • 'Purchase' - the number of product purchases;
  • 'Earning' - profit from the sale of the product.
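
The delimiter=';' argument matters because the files are read as semicolon-separated rather than comma-separated. The sketch below illustrates the difference on an in-memory string. Only the column names come from the lesson; the two data rows and the variable names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical rows; only the column names come from the lesson
raw = (
    "Impression;Click;Purchase;Earning\n"
    "1000;50;5;120.5\n"
    "2000;80;9;210.0\n"
)

# With delimiter=';' each field becomes its own column
df_sample = pd.read_csv(io.StringIO(raw), delimiter=';')

# With the default comma separator the whole line ends up in one column
df_wrong = pd.read_csv(io.StringIO(raw))

print(df_sample.columns.tolist())
print(df_wrong.columns.tolist())
```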

    # Import pandas
    import pandas as pd

    # Read .csv file
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Print head of the test dataframe
    print(df_test.head())

Now let's calculate the descriptive statistics and display them on the screen:

    # Import pandas
    import pandas as pd

    # Read .csv files
    df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
    df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

    # Calculate descriptive statistics using .agg method
    control_descriptive = df_control['Impression'].agg(['count', 'mean', 'std', 'median', 'min', 'max']).round(2)
    test_descriptive = df_test['Impression'].agg(['count', 'mean', 'std', 'median', 'min', 'max']).round(2)

    # Concat the results of aggregations
    result = pd.concat([control_descriptive, test_descriptive], axis=1)
    result.columns = ['Control', 'Test']

    print(result)

We use the .agg() method to calculate several descriptive statistics in a single call. This operation is called aggregation: a way of collapsing, summarizing, or grouping data. We then use the pd.concat() function (a top-level pandas function rather than a method) with axis=1 to place the results for the two groups side by side.
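
To see how the two pieces fit together, here is a small sketch on toy data. The two Series are hypothetical stand-ins for the 'Impression' columns. Calling .agg() with a list returns a Series indexed by the statistic names, and pd.concat() with axis=1 aligns the two results on that shared index.

```python
import pandas as pd

# Hypothetical stand-ins for the 'Impression' column of each group
control = pd.Series([100, 120, 140])
test = pd.Series([110, 130, 150, 170])

statistics = ['count', 'mean', 'std', 'median', 'min', 'max']

# Each aggregation returns a Series indexed by the statistic names
control_stats = control.agg(statistics).round(2)
test_stats = test.agg(statistics).round(2)

# axis=1 places the two Series side by side, aligned on their index
toy_result = pd.concat([control_stats, test_stats], axis=1)
toy_result.columns = ['Control', 'Test']

print(toy_result)
```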

The averages seem pretty close. Or not?
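
One rough way to put a number on "pretty close" is the relative difference between the two means. The helper below is a hypothetical illustration, not part of the course code, and the input values are made up. A small or large relative difference does not by itself say whether the gap is statistically significant; that requires a proper test.

```python
def relative_difference(mean_control: float, mean_test: float) -> float:
    """Return the difference of the test mean from the control mean, in percent."""
    return (mean_test - mean_control) / mean_control * 100


# Hypothetical means, chosen only for illustration
diff_percent = relative_difference(100.0, 110.0)
print(f"Test mean differs from control mean by {diff_percent:.1f}%")
```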


Section 2. Chapter 2