Plotting Cumulative Distributions
An ECDF (empirical cumulative distribution function) plot, drawn in seaborn with ecdfplot, shows the proportion or count of observations at or below each unique value in a dataset.
Compared to a histogram or density plot, it has a significant advantage: each observation is visualized directly. There are no bins to choose and no smoothing bandwidth that might distort the data, which is why it is often considered one of the most "honest" ways to visualize a distribution.
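To make the definition concrete, here is a minimal sketch that computes an ECDF by hand with NumPy: sort the values, then for each unique value record the fraction of observations less than or equal to it. The helper name ecdf and the sample values are hypothetical, chosen only for illustration; seaborn performs an equivalent computation internally, but this is not its actual code.

```python
import numpy as np

def ecdf(values):
    """Hypothetical helper: unique sorted values and the proportion of observations <= each one."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]     # drop missing observations
    xs = np.unique(values)                 # sorted unique values
    # Index just past the last occurrence of each value = number of observations <= it
    counts = np.searchsorted(np.sort(values), xs, side="right")
    return xs, counts / len(values)

# Hypothetical sample with a tie at 2
xs, props = ecdf([3, 1, 2, 2, 5])
print(xs, props)  # proportions 0.2, 0.6, 0.8, 1.0 at x = 1, 2, 3, 5
```

Because every observation contributes its own step, nothing has to be tuned: the tie at 2 simply produces a taller step (from 0.2 to 0.6).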
Key Parameters
By default, the plot shows the proportion (0 to 1) of observations less than or equal to x. Two parameters change this behavior:
- stat='count': the y-axis shows the number of observations instead of a proportion.
- complementary=True: reverses the logic. Instead of the observations at or below each value, the curve shows those above it (1 minus the ECDF). This is essentially a "survival curve" that answers questions such as "How many penguins have a bill longer than 50 mm?"
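The arithmetic behind these two options can be sketched at a single threshold. The helper ecdf_stat and the bill lengths below are hypothetical; the sketch only assumes the relationships described above: count = n × proportion, and the complementary value counts observations strictly above x.

```python
import numpy as np

def ecdf_stat(values, x, stat="proportion", complementary=False):
    """Hypothetical helper: evaluate an ECDF-style statistic at threshold x.

    stat="proportion" -> fraction of observations; stat="count" -> number of them.
    complementary=False -> observations <= x; complementary=True -> observations > x.
    """
    if stat not in ("proportion", "count"):
        raise ValueError("stat must be 'proportion' or 'count'")
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]            # ignore missing observations
    n = len(values)
    below = np.count_nonzero(values <= x)         # observations at or below x
    hits = n - below if complementary else below  # flip to "above x" if requested
    return hits if stat == "count" else hits / n

# Hypothetical bill lengths (mm)
bills = [38.5, 46.2, 50.0, 51.3, 49.9, 52.7, 44.1, 55.8]
print(ecdf_stat(bills, 50))                                     # 0.625: share of bills <= 50 mm
print(ecdf_stat(bills, 50, stat="count", complementary=True))  # 3: bills longer than 50 mm
```

Note that the bill of exactly 50.0 mm counts toward the default curve but not the complementary one, so the two proportions at any x always add up to 1.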
Example
Here is how complementary changes the visualization: the curve goes down instead of up.
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = sns.load_dataset('penguins')

# Create a Complementary ECDF
# This answers: "How many penguins have a flipper length GREATER than X?"
sns.ecdfplot(
    data=df,
    x='flipper_length_mm',
    hue='species',
    stat='count',        # Show exact number of penguins
    complementary=True   # Curve descends from Total to 0
)

plt.show()
```
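To check a point on these curves numerically, you can count directly with pandas: the height of a species' complementary count curve at some x is the number of that species' penguins with a flipper length strictly greater than x. This sketch uses a small hypothetical DataFrame and a hypothetical threshold of 190 mm in place of the real dataset so that it runs offline; with the real data, the same expression applies to df.

```python
import pandas as pd

# Hypothetical stand-in for the penguins data (the real dataset has many more rows)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Chinstrap"],
    "flipper_length_mm": [181.0, 190.0, 195.0, 217.0, 230.0, 192.0, None],
})

threshold = 190  # hypothetical x value at which to read the curves

# Per species, count penguins whose flipper length is strictly greater than the threshold.
# Missing lengths are dropped first, mirroring how they are left out of the plot.
above = (
    df.dropna(subset=["flipper_length_mm"])
      .assign(above=lambda d: d["flipper_length_mm"] > threshold)
      .groupby("species")["above"]
      .sum()
)
print(above)
```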
Analyze the bill lengths of penguins to see how many of them exceed a certain length.
- Import pandas, seaborn, and matplotlib.pyplot.
- Read the penguins dataset.
- Create an ECDF plot:
  - Set x to 'bill_length_mm'.
  - Group by 'island' using hue.
  - Enable the "survival" mode by setting complementary=True.
  - Show absolute numbers by setting stat='count'.
  - Use the 'mako' palette.
  - Use the df variable as data.
- Display the plot.