Learn Challenge: Mapping the Lifecycle of a SpaceStream Explorer | Advanced Cohort Segmentation and Retention Metrics
Cohort Analysis with Python
Section 2. Chapter 3

Challenge: Mapping the Lifecycle of a SpaceStream Explorer

You are now tasked with calculating advanced retention metrics for SpaceStream, an intergalactic holovision service. As the Lead Data Analyst, you will analyze a cohort of 5 users over three months, tracking who stays loyal and who drifts away. Your goal is to compute three critical metrics for each month: Retention Rate, Churn Rate, and Survival Rate.

Begin by examining the provided dataset, where each user is marked as active (1) or inactive (0) for each month. The columns month_0, month_1, and month_2 represent activity across three consecutive months. Your solution will require you to use pandas to process this dataset and extract the necessary metrics for each month.

import pandas as pd

data = {
    "user_id": [1, 2, 3, 4, 5],
    "month_0": [1, 1, 1, 1, 1],  # Everyone starts active
    "month_1": [1, 0, 1, 0, 1],  # 3 users active
    "month_2": [1, 0, 0, 0, 0],  # 1 user active
}
df = pd.DataFrame(data)

# Calculating retention rate: fraction of original cohort active in each month
cohort_size = len(df)
retention_rate = [df[f"month_{i}"].sum() / cohort_size for i in range(3)]

# Calculating churn rate: 1 - retention rate
churn_rate = [1 - r for r in retention_rate]

# Calculating survival rate: fraction of users still active in ALL months up to i
survival_rate = []
for i in range(3):
    still_active = df[[f"month_{j}" for j in range(i + 1)]].all(axis=1).sum()
    survival_rate.append(still_active / cohort_size)

print("retention_rate:", retention_rate)
print("churn_rate:", churn_rate)
print("survival_rate:", survival_rate)

This code calculates the three metrics for each month. The retention rate is the fraction of the original cohort that is active in a given month. The churn rate is one minus the retention rate, that is, the proportion of the original cohort that is not active in that month. The survival rate counts only users who have remained continuously active from the start through the current month, which requires a 1 in every month so far. In this dataset no user returns after going inactive, so the retention and survival rates are identical: 1.0, 0.6, and 0.2.
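Retention and survival only diverge when a user comes back after an inactive month. The sketch below illustrates this with a hypothetical four-user cohort (the `returning` DataFrame is invented for illustration and is not part of the lesson's dataset). It uses `cummin` along each row instead of the loop shown above; this is one possible way to express "active in every month so far", assuming the month columns hold only 0 and 1 and appear in chronological order.

```python
import pandas as pd

# Hypothetical data: user 2 skips month_1 and returns in month_2.
returning = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "month_0": [1, 1, 1, 1],
    "month_1": [1, 0, 1, 0],
    "month_2": [1, 1, 0, 0],
})

month_cols = ["month_0", "month_1", "month_2"]
cohort_size = len(returning)

# Retention: active in that month, regardless of earlier gaps.
retention = (returning[month_cols].sum() / cohort_size).tolist()

# Survival: the running minimum across a row drops to 0 at the first
# inactive month and stays there, so a returning user is not counted again.
survival = (returning[month_cols].cummin(axis=1).sum() / cohort_size).tolist()

print("retention:", retention)  # [1.0, 0.5, 0.5]
print("survival:", survival)    # [1.0, 0.5, 0.25]
```

User 2 counts toward retention in month 2 but not toward survival, because the streak was broken in month 1.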

Task

Write a Python function called calculate_cohort_metrics(df) that takes in a DataFrame with the same structure as above and returns three lists: retention_rate, churn_rate, and survival_rate for each month. Your function should:

  • Accept a DataFrame where each row is a user and each column after user_id is a month (e.g., month_0, month_1, ...).
  • Calculate retention rate for each month as the fraction of cohort users active in that month.
  • Calculate churn rate for each month as one minus the retention rate.
  • Calculate survival rate for each month as the fraction of users who were active in all months up to and including that month.
  • Return the three lists in the order: retention_rate, churn_rate, survival_rate.

Solution
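One possible sketch of `calculate_cohort_metrics` follows. It is not the course's reference solution. It assumes that every column other than `user_id` is a 0/1 activity flag, that the columns are already in chronological order, and that the cohort is non-empty.

```python
import pandas as pd


def calculate_cohort_metrics(df):
    """Return (retention_rate, churn_rate, survival_rate), one value per month."""
    # Assumption: all non-user_id columns are month flags in chronological order.
    month_cols = [c for c in df.columns if c != "user_id"]
    cohort_size = len(df)

    # Retention: share of the original cohort active in each month.
    retention_rate = [float(df[c].sum() / cohort_size) for c in month_cols]

    # Churn: complement of retention.
    churn_rate = [1 - r for r in retention_rate]

    # Survival: share of users active in every month up to and including month i.
    survival_rate = []
    for i in range(len(month_cols)):
        still_active = (df[month_cols[: i + 1]] == 1).all(axis=1).sum()
        survival_rate.append(float(still_active / cohort_size))

    return retention_rate, churn_rate, survival_rate


# Usage with the dataset from this chapter.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "month_0": [1, 1, 1, 1, 1],
    "month_1": [1, 0, 1, 0, 1],
    "month_2": [1, 0, 0, 0, 0],
})
retention_rate, churn_rate, survival_rate = calculate_cohort_metrics(df)
print("retention_rate:", retention_rate)
print("churn_rate:", churn_rate)
print("survival_rate:", survival_rate)
```

Deriving the month columns from `df.columns` rather than hard-coding `range(3)` lets the function handle any number of months. For the chapter's dataset, retention and survival both come out to 1.0, 0.6, and 0.2, and churn to roughly 0.0, 0.4, and 0.8 (up to floating-point rounding).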
