Statistical Visualization with Seaborn

Plotting Cumulative Distributions

An ecdfplot (empirical cumulative distribution function plot) represents the proportion or count of observations falling at or below each unique value in a dataset.

Compared to a histogram or density plot, it has a significant advantage: each observation is visualized directly. This means there are no bins to adjust and no smoothing parameters that might distort the data. It is often considered the most "honest" way to visualize a distribution.
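To make the "no bins, no smoothing" point concrete, here is a minimal sketch of the quantity an ECDF plots, computed by hand in plain Python. The helper name ecdf_points and the sample values are made up for illustration; this is not seaborn's implementation.

```python
from collections import Counter


def ecdf_points(values):
    """Return (value, proportion of observations <= value) for each unique value."""
    n = len(values)
    counts = Counter(values)
    points = []
    running = 0
    for v in sorted(counts):
        running += counts[v]  # observations at or below v
        points.append((v, running / n))
    return points


# Hypothetical sample of five measurements (illustrative values only)
sample = [3, 1, 2, 2, 5]
for value, proportion in ecdf_points(sample):
    print(value, proportion)
```

Every unique value contributes its own step, so the result depends only on the data and not on a bin width or bandwidth choice.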

Key Parameters

By default, the plot shows the proportion (0 to 1) of observations at or below each value of x. You can change this behavior:

  • stat='count': instead of a proportion, the Y-axis shows the number of observations;
  • complementary=True: reverses the logic. Instead of showing the observations at or below each value, it shows the observations above it. This is essentially a "survival curve" (combined with stat='count', it answers questions such as "How many penguins have a bill longer than 50 mm?").
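As a rough illustration of how the two parameters relate, the sketch below counts observations on each side of a threshold in plain Python. The helper names and the bill lengths are hypothetical, chosen only to show that the "above" count is the total minus the "at or below" count.

```python
def count_at_or_below(values, threshold):
    """Height of a stat='count' curve at the threshold."""
    return sum(1 for v in values if v <= threshold)


def count_above(values, threshold):
    """Height of the complementary stat='count' curve at the threshold."""
    return sum(1 for v in values if v > threshold)


# Hypothetical bill lengths in mm (illustrative values only)
bill_lengths_mm = [39.1, 46.5, 50.0, 51.3, 55.8]
threshold = 50.0

below = count_at_or_below(bill_lengths_mm, threshold)
above = count_above(bill_lengths_mm, threshold)

# The two counts always add up to the number of observations
print(below, above, below + above == len(bill_lengths_mm))
```

This is why the complementary curve starts at the total number of observations and descends to 0 as the threshold increases.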

Example

Here is how complementary changes the visualization. The curve goes down instead of up.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = sns.load_dataset('penguins')

# Create a Complementary ECDF
# This answers: "How many penguins have a flipper length GREATER than X?"
sns.ecdfplot(
    data=df,
    x='flipper_length_mm',
    hue='species',
    stat='count',  # Show exact number of penguins
    complementary=True  # Curve descends from Total to 0
)

plt.show()
```
Task

Analyze the bill lengths of penguins to see how many of them exceed a certain length.

  1. Import pandas, seaborn, and matplotlib.pyplot.
  2. Read the penguins dataset.
  3. Create an ECDF plot:
    • Set x to 'bill_length_mm'.
    • Group by 'island' using hue.
    • Enable the "survival" mode by setting complementary=True.
    • Show absolute numbers by setting stat='count'.
    • Use the 'mako' palette.
    • Use the df variable as data.
  4. Display the plot.

Section 1. Chapter 7