Learn Summarizing User Data | Product Metrics and Data Exploration

Python for Product Managers


Summarizing user data well is essential for making informed product decisions. When you analyze user behavior, you often face large datasets, such as lists of session durations, that need to be distilled into actionable insights. Three fundamental summary statistics are the mean, median, and mode. The mean is the arithmetic average: the sum of all values divided by their count. The median is the middle value when the data is sorted (or the average of the two middle values when the count is even), so it stays stable even when a few values are extreme. The mode is the most frequently occurring value, which can point to common usage patterns. Together, these statistics help you quickly grasp how users interact with your product and guide decisions such as which features to prioritize or where to investigate further.


import statistics

# List of user session durations in minutes
session_durations = [5, 7, 5, 8, 10, 7, 5, 12, 20, 7]

# Calculate mean (average)
mean_duration = statistics.mean(session_durations)

# Calculate median
median_duration = statistics.median(session_durations)

# Calculate mode (the most frequent value; on a tie, Python 3.8+ returns the first one encountered)
mode_duration = statistics.mode(session_durations)

print("Mean session duration:", mean_duration)
print("Median session duration:", median_duration)
print("Mode session duration:", mode_duration)
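
In the sample list, 5 and 7 each appear three times, so a single mode hides part of the picture. As a minimal sketch, assuming Python 3.8 or later, `statistics.multimode` can list every value tied for most frequent. The variable name `common_durations` is only illustrative.

```python
import statistics

# Same sample data as above: session durations in minutes
session_durations = [5, 7, 5, 8, 10, 7, 5, 12, 20, 7]

# multimode returns every value tied for the highest count,
# in the order each value first appears in the data
common_durations = statistics.multimode(session_durations)

print("Most common session durations:", common_durations)
```

If this returns more than one value, it may be worth treating each as a distinct usage pattern rather than picking one arbitrarily.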

When you review these summary statistics, you can better understand your users' typical experiences and behaviors. For example, if the mean session duration is much higher than the median, it may indicate that a small group of users is spending significantly more time than most, pulling the average up. This could highlight opportunities to investigate what keeps those users engaged or to address why the majority spend less time. The mode can reveal the most common session length, which is useful for setting product defaults or designing new features. Overall, these metrics enable you to prioritize features that address the needs of the majority, identify areas for improvement, and set realistic benchmarks for user engagement.
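
One way to make the mean-versus-median comparison concrete is to express the gap as a share of the median. The sketch below is illustrative only: `mean_median_gap` is a hypothetical helper, not part of the `statistics` module, and how large a gap counts as "much higher" is a judgment call for your product.

```python
import statistics

# Same sample data as above: session durations in minutes
session_durations = [5, 7, 5, 8, 10, 7, 5, 12, 20, 7]

def mean_median_gap(values):
    """Hypothetical helper: how far the mean sits above the median,
    as a fraction of the median. Assumes the median is not zero."""
    mean_value = statistics.mean(values)
    median_value = statistics.median(values)
    return (mean_value - median_value) / median_value

gap = mean_median_gap(session_durations)

# A positive gap means the mean is above the median,
# which suggests a tail of unusually long sessions
print(f"Mean is {gap:.0%} above the median")
```

For the sample list, the mean (8.6) sits roughly 23% above the median (7), which is consistent with the single 20-minute session pulling the average up.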


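# Identifying outliers in session durations
# (reuses the statistics import and session_durations list from the first example)
# An outlier is any session below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR,
# where IQR is the interquartile range (Q3 - Q1)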

sorted_sessions = sorted(session_durations)
q1 = statistics.median(sorted_sessions[:len(sorted_sessions)//2])
q3 = statistics.median(sorted_sessions[(len(sorted_sessions)+1)//2:])
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = [x for x in session_durations if x < lower_bound or x > upper_bound]

print("Outlier session durations:", outliers)
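
To reuse the same rule on other metrics, the steps above could be wrapped in a function. This is a sketch under stated assumptions: `find_outliers` and its `k` parameter are hypothetical names, the quartiles are computed with the same median-of-halves approach as above, and the input needs at least two values.

```python
import statistics

def find_outliers(values, k=1.5):
    """Hypothetical helper: return values outside [Q1 - k*IQR, Q3 + k*IQR].
    Quartiles are the medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    q1 = statistics.median(ordered[:len(ordered) // 2])
    q3 = statistics.median(ordered[(len(ordered) + 1) // 2:])
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    return [x for x in values if x < lower_bound or x > upper_bound]

session_durations = [5, 7, 5, 8, 10, 7, 5, 12, 20, 7]

outliers = find_outliers(session_durations)
# Keep only the sessions that were not flagged as outliers
typical_sessions = [x for x in session_durations if x not in outliers]

print("Outliers:", outliers)
print("Mean with outliers:", statistics.mean(session_durations))
print("Mean without outliers:", round(statistics.mean(typical_sessions), 2))
```

Comparing the mean with and without the flagged sessions shows how much a few extreme values influence the average. Whether to exclude them depends on the question: they may be noise, or they may be your most engaged users.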


Section 1. Chapter 2


Summarizing User Data

1. What is the difference between mean and median in the context of user session times?

2. Why might outliers in user data be important for a Product Manager?

3. Which Python function can be used to find the average of a list?