The Art of A/B Testing
Violin and Swarm Plots
About violin plots
Let's talk about sample variances. The spread of a sample is well visualized by the violin plot.
It is used much like a box plot, but it also shows the shape of the distribution. Consider a real-life example. Let's compare annual incomes in the USA and Canada in 2020:
The graph tells us that the two distributions are quite close.
The white dot in the center of the graph indicates the median of the distribution.
The thicker part of the line runs from the first quartile (bottom) to the third quartile (top). Anything beyond the ends of the thin line is an outlier. Now let's compare annual incomes in the USA and Brazil in 2020:
In this graph, the distributions are clearly different: the violin for income in Brazil sits noticeably lower.
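The elements described above (median dot, quartile bar, thin line) can also be computed by hand. The following is a minimal sketch, assuming the thin line follows the usual 1.5 × IQR convention of box plots; the `values` array is made-up data for illustration, not the income data from the graphs.

```python
import numpy as np

# Hypothetical sample standing in for one group's values (not the course data)
rng = np.random.default_rng(0)
values = rng.normal(loc=50_000, scale=12_000, size=500)

# Median: the white dot inside the violin
median = np.median(values)

# First and third quartiles: the ends of the thick bar
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Limits under the 1.5 * IQR rule (an assumption about how the thin line is drawn)
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# The thin line ends at the most extreme observations inside the limits
whisker_low = values[values >= lower_limit].min()
whisker_high = values[values <= upper_limit].max()

# Observations outside the limits are treated as outliers
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(f'Median: {median:.0f}')
print(f'Q1: {q1:.0f}, Q3: {q3:.0f}, IQR: {iqr:.0f}')
print(f'Thin line from {whisker_low:.0f} to {whisker_high:.0f}')
print(f'Number of outliers: {outliers.size}')
```

The larger the IQR and the longer the thin line, the more spread out the sample is, which is what the violin shows visually.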
Let's build a violin plot of the 'Impression' columns for the test and control groups:
# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read the .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for the graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as belonging to the control or the test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concatenate the dataframes (ignore_index avoids duplicate row labels)
df_combined = pd.concat([df_control, df_test], ignore_index=True)

# Plot the violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Label the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
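Since this chapter is about sample variances, it can help to put a number next to each violin. The sketch below computes the sample variance and standard deviation per group with pandas. It runs on a small made-up dataframe (`df_demo`, with hypothetical values) so that it works without downloading the course files; with the real data, the same `groupby` call could be applied to `df_combined`.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same layout as df_combined (values are made up)
rng = np.random.default_rng(42)
df_demo = pd.DataFrame({
    'group': ['Control group'] * 30 + ['Test group'] * 30,
    'Impression': np.concatenate([
        rng.normal(110_000, 20_000, 30),  # narrower spread
        rng.normal(75_000, 30_000, 30),   # wider spread
    ]),
})

# Sample variance and standard deviation per group
# (pandas uses ddof=1 by default, i.e. the unbiased sample estimate)
spread = df_demo.groupby('group')['Impression'].agg(['var', 'std'])

print(spread)
```

A wider, longer violin should correspond to a larger variance in this table. The numbers alone still do not say whether the difference between the groups is statistically significant.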
About swarm plots
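The swarm plot goes well with the violin plot: it draws every observation as a separate point, shifted sideways so that the points do not overlap. Let's look at their combination: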
# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read the .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for the graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as belonging to the control or the test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concatenate the dataframes (ignore_index avoids duplicate row labels)
df_combined = pd.concat([df_control, df_test], ignore_index=True)

# Plot the violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Plot the swarm plots on top of the violins
sns.swarmplot(data=df_combined, x='group', y='Impression', color="r", alpha=0.8)

# Label the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
Now we have a visual representation of the data scatter. But are these variances equal? Alas, we cannot draw such a conclusion by looking only at the graphs. As you might have guessed, statistics has a tool for checking this. But first, practice time!