Learn Violin and Swarm Plots | Variances in A/B Testing

About violin plots

Let's talk about sample variances. The spread of the data is well visualized by a violin plot.

It is used much like a box plot. Consider a real-life example. Let's compare the data on annual incomes in the USA and Canada in 2020:

The graph tells us that the two income distributions are quite close.

The white dot in the center of the graph indicates the median of the distribution.

The thicker part of the line spans from the first quartile (bottom) to the third quartile (top). Anything beyond the ends of the thin line is an outlier.

Now let's compare the data on annual incomes in the USA and Brazil in 2020:

In this graph, the distributions are clearly different: the violin for income in Brazil sits lower.

Let's build a violin plot of the 'Impression' column for the test and control groups:


# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as belonging to the control or the test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concat the dataframes
df_combined = pd.concat([df_control, df_test])

# Plotting violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Label the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
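
The markers inside each violin are ordinary summary statistics, so they can be checked numerically. Below is a minimal sketch that computes the median, the first and third quartiles, and the interquartile range per group with pandas. The `df_toy` dataframe and its values are made up for illustration (they are not the course data); only the column names `group` and `Impression` follow the example above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'Impression' column
df_toy = pd.DataFrame({
    'group': ['Control group'] * 5 + ['Test group'] * 5,
    'Impression': [10, 12, 14, 16, 18, 8, 12, 16, 20, 24],
})

# Quartiles per group: 0.25 -> first quartile, 0.5 -> median, 0.75 -> third quartile
quartiles = (
    df_toy.groupby('group')['Impression']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ['q1', 'median', 'q3']

# Interquartile range: the length of the thick part of the line
quartiles['iqr'] = quartiles['q3'] - quartiles['q1']

print(quartiles)
```

In this toy data the test group has the wider interquartile range, which on a violin plot would show up as a longer thick bar.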

About swarm plots

A swarm plot draws every observation as a separate point, so it goes well with the violin plot. Let's look at their combination:


# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as belonging to the control or the test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concat the dataframes
df_combined = pd.concat([df_control, df_test])

# Plotting violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Plotting swarm plots
sns.swarmplot(data=df_combined, x='group', y='Impression', color="r", alpha=0.8)

# Label the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
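
The scatter seen in the plots can also be put into numbers as the sample variance of each group. Here is a minimal sketch, again on a made-up `df_toy` dataframe rather than the course data; pandas' `var()` and `std()` use `ddof=1` by default, which gives the unbiased sample estimates.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'Impression' column
df_toy = pd.DataFrame({
    'group': ['Control group'] * 5 + ['Test group'] * 5,
    'Impression': [10, 12, 14, 16, 18, 8, 12, 16, 20, 24],
})

grouped = df_toy.groupby('group')['Impression']

# Sample variance and standard deviation per group (ddof=1 by default)
variances = grouped.var()
std_devs = grouped.std()

print(variances)
print(std_devs)
```

Two different numbers on their own do not say whether the difference is more than sampling noise.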

Now we have a visual representation of the data scatter. But are these variances equal? Alas, we cannot draw such a conclusion by looking only at the graphs. As you might have guessed, statistics has a tool to check this. But first, practice time!
