Feature Engineering for Cohort Analysis
Feature engineering is the process of creating new variables from raw data to improve analysis, modeling, or segmentation. In cohort analysis, effective feature engineering helps you extract deeper insights about user behavior over time. Typical features include user lifetime (how long a user has been active), activity counts (how many times a user has performed a specific action), and recency (how recently a user was active). These features allow you to group users more meaningfully, revealing patterns in retention, engagement, and churn. By engineering such features, you can go beyond basic cohort assignment and build richer, more actionable cohorts.
    import pandas as pd

    # Sample user activity data
    data = {
        "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
        "activity_date": [
            "2024-01-01", "2024-01-10", "2024-02-01",
            "2024-01-05", "2024-02-20",
            "2024-01-03", "2024-01-10", "2024-01-20", "2024-03-01"
        ]
    }
    df = pd.DataFrame(data)
    df["activity_date"] = pd.to_datetime(df["activity_date"])

    # Calculate user lifetime (days between first and last activity)
    user_lifetime = df.groupby("user_id")["activity_date"].agg(["min", "max"])
    user_lifetime["user_lifetime_days"] = (user_lifetime["max"] - user_lifetime["min"]).dt.days

    # Calculate activity count per user
    activity_counts = df.groupby("user_id").size().rename("activity_count")

    # Calculate recency (days since last activity, assuming analysis date is 2024-03-15)
    analysis_date = pd.to_datetime("2024-03-15")
    last_activity = df.groupby("user_id")["activity_date"].max()
    recency = (analysis_date - last_activity).dt.days.rename("recency_days")

    # Combine features into a single DataFrame
    features = pd.concat([user_lifetime["user_lifetime_days"], activity_counts, recency], axis=1)
    print(features)
The features created in the code sample (user lifetime, activity count, and recency) give you additional dimensions for cohort segmentation and analysis. By measuring how long a user remains active, how frequently they engage, and how recently they interacted, you can identify meaningful differences between cohorts. For instance, users with long lifetimes and frequent activity may belong to highly engaged cohorts, while those with high recency values (many days since their last activity) could be at risk of churn. These engineered features let you move beyond simple time-based grouping toward multi-dimensional segmentation that uncovers deeper behavioral patterns and supports more targeted business strategies.
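As a rough illustration of that kind of multi-dimensional segmentation, the sketch below rebuilds the three features from the same sample data and assigns each user a segment label. The thresholds (`AT_RISK_RECENCY_DAYS`, `FREQUENT_ACTIVITY_COUNT`) and the segment names are hypothetical and chosen only for this example; in practice they would need to be tuned to your product's usage patterns.

```python
import pandas as pd

# Same sample activity data as in the lesson's example
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "activity_date": pd.to_datetime([
        "2024-01-01", "2024-01-10", "2024-02-01",
        "2024-01-05", "2024-02-20",
        "2024-01-03", "2024-01-10", "2024-01-20", "2024-03-01",
    ]),
})
analysis_date = pd.Timestamp("2024-03-15")

# Build all per-user features in one pass with named aggregation
features = df.groupby("user_id").agg(
    first_activity=("activity_date", "min"),
    last_activity=("activity_date", "max"),
    activity_count=("activity_date", "size"),
)
features["user_lifetime_days"] = (
    features["last_activity"] - features["first_activity"]
).dt.days
features["recency_days"] = (analysis_date - features["last_activity"]).dt.days

# Hypothetical thresholds, assumed for illustration only
AT_RISK_RECENCY_DAYS = 30
FREQUENT_ACTIVITY_COUNT = 3

# Start with a default label, then overwrite; the at-risk rule is applied
# last so that it takes priority over the engagement rule
features["segment"] = "casual"
features.loc[features["activity_count"] >= FREQUENT_ACTIVITY_COUNT, "segment"] = "engaged"
features.loc[features["recency_days"] > AT_RISK_RECENCY_DAYS, "segment"] = "at_risk"

print(features[["user_lifetime_days", "activity_count", "recency_days", "segment"]])
```

With these assumed thresholds, user 1 is labeled `at_risk` (43 days since last activity, despite three recorded activities), user 2 is `casual`, and user 3 is `engaged`. Letting the recency rule override the frequency rule is a design choice for this sketch: it treats a formerly active user who has gone quiet as a churn risk rather than as engaged.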