The Third U-test

Let's compare the 'Average Purchase Value' metric for both samples.

We'll start with the distribution plot:

The plot alone does not allow a firm conclusion about the shape of either distribution, although the median of the test group appears to be larger. Let's run the Shapiro-Wilk test on each group:


# Import libraries
import pandas as pd
from scipy.stats import shapiro

# Read .csv files 
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define the metric
df_test['Average Purchase Value'] = df_test['Earning'] / df_test['Purchase']
df_control['Average Purchase Value'] = df_control['Earning'] / df_control['Purchase']

# Testing normality of the control dataframe metric
stat_control, p_control = shapiro(df_control['Average Purchase Value'])

# Print the results of normality testing
print("Control group: ")
print("Stat: %.4f, p-value: %.4f" % (stat_control, p_control))

# Identify normality
if p_control > 0.05:
  print('Control group is likely normally distributed')
else:
  print('Control group is NOT likely normally distributed')

# Testing normality of the test dataframe metric
stat_test, p_test = shapiro(df_test['Average Purchase Value'])

# Print the results of normality testing
print("Test group: ")
print("Stat: %.4f, p-value: %.4f" % (stat_test, p_test))

# Identify normality
if p_test > 0.05:
  print('Test group is likely normally distributed')
else:
  print('Test group is NOT likely normally distributed')
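
Because the same check runs on both groups, one way to keep the two results separate is a small helper function. The sketch below is only an illustration: `check_normality` is a hypothetical name, and the data is a synthetic, deliberately skewed sample rather than the course dataset.

```python
import numpy as np
from scipy.stats import shapiro


def check_normality(sample, alpha=0.05):
    """Run the Shapiro-Wilk test and return (statistic, p_value, looks_normal)."""
    stat, p = shapiro(sample)
    # looks_normal is True when the test fails to reject normality at level alpha
    return stat, p, bool(p > alpha)


# Synthetic stand-in data (an assumption, not the course dataset)
rng = np.random.default_rng(0)
skewed_sample = rng.exponential(scale=10.0, size=200)

stat, p, looks_normal = check_normality(skewed_sample)
print("Stat: %.4f, p-value: %.4f, looks normal: %s" % (stat, p, looks_normal))
```

With the course data, the same helper could be called once on `df_control['Average Purchase Value']` and once on `df_test['Average Purchase Value']`, so each group keeps its own statistic and p-value.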

The first Shapiro test rejected normality for the control group. The second did not reject normality for the test group, so that sample is consistent with a normal distribution (a non-significant result does not prove normality). This is not a problem for the U-test: it makes no normality assumption, so it can compare a normal sample with a non-normal one. We also do not need to check the variances in this case.

The hypotheses will be:

H₀: The medians of the 'Average Purchase Value' metric in the control and test groups are the same.

Hₐ: The medians of the 'Average Purchase Value' metric differ between the control and test groups.
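
These hypotheses describe a two-sided test: a difference in either direction counts against H₀. As a minimal sketch, assuming a SciPy version whose `mannwhitneyu` accepts the `alternative` argument, the choice can be made explicit. The two samples below are synthetic stand-ins with assumed parameters, not the course data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic stand-in samples (assumed values, not the course data)
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=1.0, size=100)
test = rng.normal(loc=12.0, scale=1.0, size=100)

# alternative='two-sided' matches the alternative hypothesis:
# the groups may differ in either direction
statistic, p_value = mannwhitneyu(control, test, alternative='two-sided')
reject_h0 = bool(p_value < 0.05)

print('Statistic:', statistic)
print('p-value:', p_value)
print('Reject H0:', reject_h0)
```

Stating the alternative explicitly keeps the code aligned with the written hypotheses, even where it matches the library default.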


# Import libraries
import pandas as pd
from scipy.stats import mannwhitneyu

# Read .csv
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define the metric
df_test['Average Purchase Value'] = df_test['Earning'] / df_test['Purchase']
df_control['Average Purchase Value'] = df_control['Earning'] / df_control['Purchase']

# Do the U-Test
statistic, p_value = mannwhitneyu(df_control['Average Purchase Value'], df_test['Average Purchase Value'])

# Result of the U-Test
print('Statistic:', statistic)
print('p-value:', p_value)

# Decide whether the medians differ at the 0.05 significance level
if p_value > 0.05:
  print('The medians of the two groups are NOT statistically different')
else:
  print('The medians of the two groups are statistically different')

The low p-value indicates a statistically significant difference between the control and test groups for the 'Average Purchase Value' metric. We have sufficient evidence to reject the null hypothesis in favor of the alternative: the medians differ, and the median in the test group is larger.
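
The two-sided test only says that the groups differ. The direction of the difference can be checked by comparing the sample medians directly. The sketch below uses a few hypothetical values in place of the real 'Average Purchase Value' columns, so the printed numbers are illustrative only.

```python
import pandas as pd

# Hypothetical values standing in for the 'Average Purchase Value' columns
control_values = pd.Series([8.0, 9.5, 10.0, 10.5, 12.0])
test_values = pd.Series([10.0, 11.5, 12.0, 13.5, 15.0])

# Compare the sample medians to see which group is larger
median_control = control_values.median()
median_test = test_values.median()
median_diff = median_test - median_control

print('Control median:', median_control)
print('Test median:', median_test)
print('Difference (test - control):', median_diff)
```

With the course data, the same comparison would use `df_control['Average Purchase Value'].median()` and `df_test['Average Purchase Value'].median()`.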
