Violin and Swarm Plots | Variances in A/B Testing
The Art of A/B Testing
1. What is A/B testing?
2. Normality Check
3. Variances in A/B Testing
4. T-Test
5. U-Test

Violin and Swarm Plots

About violin plots

Let's talk about sample variances. A violin plot is a good way to visualize how widely the data are scattered.

It is used in much the same way as a box plot. Consider a real-life example. Let's compare annual income data for the USA and Canada in 2020:

The graph tells us that the two income distributions are quite close.

The white dot in the center of the graph indicates the median of the distribution.

The thicker part of the line spans from the first quartile (bottom) to the third quartile (top). The thin line extending from it plays the role of the whiskers in a box plot, and anything beyond that line is considered an outlier.

Now let's compare the data on annual incomes in the USA and Brazil in 2020:

In this graph, the distributions are clearly different: the violin for income in Brazil sits noticeably lower.

Let's build a violin plot of the 'Impression' columns for the test and control groups:

# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as control or test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concatenate the dataframes (ignore_index avoids duplicate row labels)
df_combined = pd.concat([df_control, df_test], ignore_index=True)

# Plot the violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Label the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
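To connect the markers inside a violin (the median dot and the quartile bar) with actual numbers, here is a minimal sketch that computes them for a single sample. The function name `violin_box_summary`, the 1.5 × IQR whisker convention, and the sample values are assumptions made for illustration; they are not taken from the course datasets.

```python
import numpy as np


def violin_box_summary(values):
    """Compute the statistics marked by the small box inside a violin.

    Hypothetical helper; assumes the common 1.5 * IQR whisker rule.
    """
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


# Made-up sample with one extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = violin_box_summary(sample)
print(summary)
```

Applying the same function to each group's 'Impression' column would give the numbers behind the white dot and the thick bar in the plot above.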

About swarm plots

A swarm plot draws every observation as a separate point, so it goes well with the violin plot. Let's look at their combination:

# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as control or test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concatenate the dataframes (ignore_index avoids duplicate row labels)
df_combined = pd.concat([df_control, df_test], ignore_index=True)

# Plot the violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Plot the swarm plots on top of the violins
sns.swarmplot(data=df_combined, x='group', y='Impression', color="r", alpha=0.8)

# Label the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()

Now we have a visual representation of the data scatter. But are these variances equal? Alas, we cannot draw such a conclusion by looking at the graphs alone. As you might have guessed, statistics has a tool to check this. But first, practice time!
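As a numeric companion to the plots, the sample variance of each group can be computed directly. The sketch below uses randomly generated stand-in data rather than the course CSV files; the column names 'group' and 'Impression' mirror the ones used above, while the means, spreads, and sample sizes are made-up assumptions. It only reports the two variances and does not test whether they are equal.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption: not the real ab_control / ab_test files)
rng = np.random.default_rng(seed=0)
df_combined = pd.DataFrame({
    'group': ['Control group'] * 30 + ['Test group'] * 30,
    'Impression': np.concatenate([
        rng.normal(loc=100_000, scale=20_000, size=30),
        rng.normal(loc=75_000, scale=18_000, size=30),
    ]),
})

# Sample variance per group (pandas uses ddof=1 by default)
group_variances = df_combined.groupby('group')['Impression'].var()

# Standard deviation is in the same units as 'Impression', so it is easier to read
group_stds = df_combined.groupby('group')['Impression'].std()

print(group_variances)
print(group_stds)
```

Two variances computed this way will almost never be exactly equal, which is why a visual or numeric comparison alone cannot settle the question.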


Section 3. Chapter 1