Descriptive Statistics

Before moving on to visualizing the distribution, it makes sense to look at the descriptive statistics of each parameter in the dataset.

The key statistics we need for each parameter are:

Number of observations;
Average value;
Standard deviation;
Median;
Minimum value;
Maximum value.
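To make the list above concrete, here is a minimal sketch that computes each of the six statistics for a single pandas Series. The values are hypothetical toy numbers chosen for illustration, not the course data. Note that pandas computes the sample standard deviation by default (`ddof=1`).

```python
import pandas as pd

# Hypothetical toy values, used only to illustrate the six statistics
values = pd.Series([10, 20, 30, 40, 50])

summary = {
    'count': int(values.count()),          # number of observations
    'mean': float(values.mean()),          # average value
    'std': round(float(values.std()), 2),  # sample standard deviation (ddof=1)
    'median': float(values.median()),      # middle value
    'min': int(values.min()),              # minimum value
    'max': int(values.max()),              # maximum value
}

print(summary)
```

Calling each method separately like this is verbose, which is why a single aggregation call is usually more convenient on real data.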

Let's get started. We have the results of a controlled experiment for two groups of users: a control group and a test group.

Preliminary A/A testing showed that the experiment was set up correctly. Let's load the control group data and look at the first rows:


# Import pandas 
import pandas as pd

# Read .csv file 
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')

# Print head of the control dataframe
print(df_control.head())

In this table, we have 4 columns:

'Impression' - the number of views of the product page;
'Click' - the number of transitions to the product page;
'Purchase' - the number of product purchases;
'Earning' - profit from the sale of the product.

Now let's load the test group data in the same way:


# Import pandas
import pandas as pd

# Read .csv file
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Print head of the test dataframe
print(df_test.head())

Now let's calculate the descriptive statistics and display them on the screen:


# Import pandas
import pandas as pd

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Calculate descriptive statistics using .agg method
control_descriptive = df_control['Impression'].agg(['count', 'mean', 'std', 'median', 'min', 'max']).round(2)
test_descriptive = df_test['Impression'].agg(['count', 'mean', 'std', 'median', 'min', 'max']).round(2)

# Concat the results of aggregations
result = pd.concat([control_descriptive, test_descriptive], axis=1)
result.columns = ['Control', 'Test']

print(result)

We use the .agg() method to compute several descriptive statistics in a single call. This operation is called aggregation: a way of collapsing or summarizing data into a smaller set of values. We then use the pd.concat() function with axis=1 to place the control and test results side by side in one table.
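The same approach can be extended from one column to several at once. The sketch below is one possible way to do it, not the course's own solution: the two small DataFrames are hypothetical stand-ins for `df_control` and `df_test`, and the `keys` argument of `pd.concat` is used to label each group's block of columns.

```python
import pandas as pd

# Hypothetical toy frames standing in for df_control and df_test
df_control_toy = pd.DataFrame({'Impression': [100, 120, 140], 'Click': [10, 12, 14]})
df_test_toy = pd.DataFrame({'Impression': [110, 130, 150], 'Click': [11, 13, 15]})

stats = ['count', 'mean', 'std', 'median', 'min', 'max']

# DataFrame.agg with a list of function names returns
# one row per statistic and one column per original column
control_stats = df_control_toy.agg(stats).round(2)
test_stats = df_test_toy.agg(stats).round(2)

# keys= labels each block, producing a two-level column index
result_all = pd.concat([control_stats, test_stats], axis=1, keys=['Control', 'Test'])

print(result_all)
```

A single value can then be read with a two-part column label, for example `result_all.loc['mean', ('Control', 'Impression')]`.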

The averages seem pretty close. Or not?
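One simple way to put a number on "close" is to compute the absolute and relative difference between the two means. The sketch below uses hypothetical toy values rather than the course data. Keep in mind that a raw difference alone does not show whether the gap is statistically significant, since that also depends on the spread and the number of observations.

```python
import pandas as pd

# Hypothetical toy values, not the course data
control = pd.Series([100, 120, 140])
test = pd.Series([110, 130, 150])

# Absolute difference between the group means
abs_diff = test.mean() - control.mean()

# Relative difference, as a percentage of the control mean
rel_diff_pct = abs_diff / control.mean() * 100

print(round(abs_diff, 2), round(rel_diff_pct, 2))
```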
