Statistical Visualization with Seaborn

Performing Hierarchical Clustering

A clustermap is a matrix plot that combines a heatmap with hierarchical clustering.

While a standard heatmap displays data in a fixed grid, a clustermap reorders the rows and columns to place similar values next to each other. The tree-like diagrams on the axes are called dendrograms, and they show how the data points are grouped.
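The reordering idea can be sketched with SciPy directly, which is the library seaborn relies on for hierarchical clustering. This is a minimal illustration on a made-up toy matrix, not a reproduction of what `clustermap` does internally: it builds the tree over the rows and reads off the leaf order.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Toy matrix (made up for illustration): rows 0 and 2 are similar,
# and rows 1 and 3 are similar
data = np.array([
    [1.0, 1.1, 0.9],
    [8.0, 8.2, 7.9],
    [1.2, 1.0, 1.1],
    [7.8, 8.1, 8.0],
])

# Build the hierarchical tree (what a dendrogram draws) over the rows
row_linkage = linkage(data, method="average", metric="euclidean")

# Leaf order of the tree: similar rows end up next to each other
row_order = leaves_list(row_linkage)

# Reorder the matrix the way a clustered heatmap would display it
reordered = data[row_order]
print(row_order)
print(reordered)
```

In the printed order, the two low-valued rows sit side by side, and so do the two high-valued rows.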

Key Parameters

To control how the clustering works, you can use these parameters:

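  • standard_scale: rescales the data (0 for rows, 1 for columns) by subtracting the minimum and dividing by the maximum, so that each row or column spans the range from 0 to 1. This is crucial when variables have different units. To standardize to a mean of 0 and a variance of 1 instead, use the z_score parameter;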
  • metric: the distance measure to use (e.g., 'euclidean', 'correlation'). It determines what "similar" means;
  • method: the linkage algorithm to use (e.g., 'single', 'complete', 'average'). It determines how to group the clusters.
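To make these parameters more concrete, here is a small sketch that reproduces the scaling by hand with pandas and compares two distance metrics with SciPy. The DataFrame, its column names, and the `profiles` array are invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical measurements in different units
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 65.0, 60.0, 80.0],
})

# Column-wise min-max scaling, the kind of rescaling standard_scale=1 applies:
# every column ends up in the range 0 to 1
scaled = (df - df.min()) / (df.max() - df.min())

# Column-wise z-score, the kind of rescaling z_score=1 applies:
# every column ends up with mean 0 and standard deviation 1
zscored = (df - df.mean()) / df.std()

# Three rows: 0 and 1 share the same shape at different magnitudes,
# 0 and 2 have similar magnitudes but opposite trends
profiles = np.array([
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [3.0, 2.0, 1.0],
])

# Pairwise distance matrices under two different metrics
euclid = squareform(pdist(profiles, metric="euclidean"))
corr = squareform(pdist(profiles, metric="correlation"))

print(scaled)
print(zscored)
print(euclid.round(2))
print(corr.round(2))
```

Under the Euclidean metric, row 0 is closer to row 2 (similar magnitudes), while under the correlation metric it is closer to row 1 (same trend). The choice of metric therefore changes which rows a clustermap places together.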

Example

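Here is a clustermap of the Iris dataset. Each row is one flower. Because flowers of the same species have similar measurements, they tend to end up next to each other after clustering. The species column is removed first, since the clustering works on numeric values only.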

import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = sns.load_dataset('iris')

# Prepare matrix (drop non-numeric column for calculation)
species = df.pop("species")

# Create a clustermap
sns.clustermap(
    data=df,
    standard_scale=1,    # Normalize columns
    metric='euclidean',  # Measure distance
    method='average',    # Clustering method
    cmap='viridis',
    figsize=(6, 6)
)

plt.show()

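One way to check whether labelled rows really end up together is to color-code them with the `row_colors` parameter of `clustermap` and to inspect the row order chosen by the clustering. The sketch below uses a small synthetic dataset (two made-up groups named "a" and "b") instead of Iris so that it runs without downloading anything. With Iris, the popped species column could play the role of `labels`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Synthetic stand-in for a labelled dataset: two well-separated groups
group_a = rng.normal(loc=0.0, scale=0.3, size=(5, 4))
group_b = rng.normal(loc=5.0, scale=0.3, size=(5, 4))
values = pd.DataFrame(
    np.vstack([group_a, group_b]),
    columns=["f1", "f2", "f3", "f4"],
)
labels = pd.Series(["a"] * 5 + ["b"] * 5, name="group")

# Map each label to a color; the colors are drawn as a strip beside the rows
palette = {"a": "tab:blue", "b": "tab:orange"}
row_colors = labels.map(palette)

g = sns.clustermap(
    data=values,
    standard_scale=1,
    metric="euclidean",
    method="average",
    row_colors=row_colors,
    cmap="viridis",
    figsize=(6, 6),
)

# Row order after clustering, and the labels in that order
order = g.dendrogram_row.reordered_ind
ordered_labels = labels.iloc[order].tolist()
print(ordered_labels)
```

If the clustering separates the groups, the color strip shows two solid blocks, and `ordered_labels` lists all rows of one group before the rows of the other.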
Task

Analyze the flight passengers data to find similarities between years.

  1. Set the style to 'ticks'. Change the background color to 'seagreen' ('figure.facecolor').
  2. Create a clustermap using the reshaped upd_df DataFrame:
    • Pass upd_df as the data.
    • Normalize the columns by setting standard_scale to 1.
    • Use the 'single' clustering method.
    • Use 'correlation' as the distance metric.
    • Display values in cells (annot=True).
    • Set the value limits: vmin=0 and vmax=10.
    • Use the 'vlag' color map.
  3. Display the plot.
