Violin and Swarm Plots
About violin plots
Let's talk about sample variances. The spread of the data is well visualized by a violin plot.
It is used in much the same way as a box plot. Consider a real-life example. Let's compare annual incomes in the USA and Canada in 2020:
The graph tells us that the two distributions are quite close.
The white dot in the center of the graph indicates the median of the distribution.
The thicker part of the line marks the first quartile (bottom) and the third quartile (top). Anything beyond the ends of the thin line is an outlier. Now let's compare annual incomes in the USA and Brazil in 2020:
In this graph, the distributions are clearly different. The violin plot for income in Brazil sits lower.
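The elements described above (median, quartiles, outliers) can also be computed directly. The following is a minimal sketch rather than the plotting library's own implementation: it assumes the common convention that the thin line reaches at most 1.5 × IQR beyond the quartiles, and the function name `violin_summary` and the sample values are hypothetical, chosen only for illustration.

```python
import statistics


def violin_summary(values):
    """Return the median, quartiles and outliers of a sample.

    Assumption: outliers are points further than 1.5 * IQR from the quartiles.
    """
    data = sorted(float(v) for v in values)
    # 'inclusive' interpolates linearly between the sorted data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical sample with one obviously extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = violin_summary(sample)
print(summary)
```

Here the value 50 lies far above the third quartile, so it is reported as an outlier, while the median and quartiles are barely affected by it.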
Let's build a violin plot of the 'Impression' columns for the test and control groups:
```python
# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as belonging to the control or the test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concat the dataframes
df_combined = pd.concat([df_control, df_test])

# Plotting violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Sign the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
```
About swarm plots
A swarm plot draws every observation as a separate point, so it goes well with the violin plot. Let's look at their combination:
```python
# Import libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Read .csv files
df_control = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_control.csv', delimiter=';')
df_test = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/c3b98ad3-420d-403f-908d-6ab8facc3e28/ab_test.csv', delimiter=';')

# Define colors for graphs
colors_list = ['#ff8a00', '#33435c']

# Add a label column marking each row as belonging to the control or the test group
df_control['group'] = 'Control group'
df_test['group'] = 'Test group'

# Concat the dataframes
df_combined = pd.concat([df_control, df_test])

# Plotting violin plots
sns.violinplot(data=df_combined, x='group', y='Impression', palette=colors_list)

# Plotting swarm plots on top of the violins
sns.swarmplot(data=df_combined, x='group', y='Impression', color="r", alpha=0.8)

# Sign the axes
plt.xlabel('')
plt.ylabel('Impression')
plt.title('Comparison of Impressions')

# Show the results
plt.show()
```
Now we have a visual representation of the data's spread. But are these variances equal? Alas, we cannot draw such a conclusion from the graphs alone. As you might have guessed, statistics has a tool to check this. But first, practice time!
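As a warm-up, here is a minimal sketch of what comparing sample variances looks like numerically. The two lists are hypothetical values invented for illustration, not the lesson's data; both have the same mean but a different spread.

```python
import statistics

# Hypothetical measurements for two groups (not the lesson's dataset)
control_values = [10.0, 12.0, 14.0, 16.0, 18.0]
test_values = [4.0, 9.0, 14.0, 19.0, 24.0]

# statistics.variance computes the sample variance (divides by n - 1)
control_variance = statistics.variance(control_values)
test_variance = statistics.variance(test_values)

print("Control variance:", control_variance)
print("Test variance:", test_variance)
```

With the lesson's combined dataframe, `df_combined.groupby('group')['Impression'].var()` would give the same kind of per-group sample variance. Two numbers that differ do not by themselves show that the underlying variances differ, which is why a statistical test is still needed.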