Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Tracking Cohort Retention | Cohort Analysis with DATE Functions
SQL for Marketing Analytics

bookTracking Cohort Retention

Retention analysis is a powerful technique for understanding how many users from each cohort continue to engage with your product or service over time. In marketing analytics, a cohort is a group of users who share a common characteristic, such as signing up in the same week. By tracking how many users from each signup cohort return in subsequent weeks, you can measure retention rates and gain insights into user engagement, product stickiness, and the long-term impact of marketing campaigns. This approach helps you answer questions like: Are users acquired in March more likely to return than those acquired in January? Which campaigns are bringing in users who stick around?

12345678910111213141516171819202122232425262728
-- Calculate weekly retention rates for each signup cohort WITH cohort_activity AS ( SELECT u.user_id, DATE_TRUNC('week', u.signup_date) AS cohort_week, DATE_TRUNC('week', e.event_timestamp) AS activity_week, EXTRACT(WEEK FROM e.event_timestamp) - EXTRACT(WEEK FROM u.signup_date) AS week_number FROM users u JOIN events e ON u.user_id = e.user_id WHERE e.event_timestamp >= u.signup_date ), cohort_sizes AS ( SELECT DATE_TRUNC('week', signup_date) AS cohort_week, COUNT(DISTINCT user_id) AS cohort_size FROM users GROUP BY cohort_week ) SELECT ca.cohort_week, ca.week_number, COUNT(DISTINCT ca.user_id) AS users_active, cs.cohort_size, ROUND(100.0 * COUNT(DISTINCT ca.user_id) / cs.cohort_size, 1) AS retention_rate_percent FROM cohort_activity ca JOIN cohort_sizes cs ON ca.cohort_week = cs.cohort_week GROUP BY ca.cohort_week, ca.week_number, cs.cohort_size ORDER BY ca.cohort_week, ca.week_number;
copy
Note
Note

Retention rates help marketers understand long-term engagement and the impact of campaigns. High retention suggests users find value and keep coming back, while low retention highlights opportunities to improve onboarding, product features, or campaign targeting.

When you interpret retention curves, you are looking at how the percentage of returning users changes over time for each cohort. Retention curves typically show a decline: the highest retention is usually in the first week after signup, with fewer users coming back in later weeks. By comparing curves for different cohorts, you can spot patternsβ€”such as whether users acquired during a particular campaign are more loyal, or if recent product updates improved engagement. A flat retention curve means users continue returning at a steady rate, while a steep drop indicates users quickly lose interest. These patterns help you identify what is working and where to focus your marketing or product efforts.

1234567891011121314151617181920212223242526272829303132333435363738
-- Pivot retention data for visualization (one row per cohort, columns for each week) WITH cohort_activity AS ( SELECT u.user_id, DATE_TRUNC('week', u.signup_date) AS cohort_week, DATE_TRUNC('week', e.event_timestamp) AS activity_week, EXTRACT(WEEK FROM e.event_timestamp) - EXTRACT(WEEK FROM u.signup_date) AS week_number FROM users u JOIN events e ON u.user_id = e.user_id WHERE e.event_timestamp >= u.signup_date ), cohort_sizes AS ( SELECT DATE_TRUNC('week', signup_date) AS cohort_week, COUNT(DISTINCT user_id) AS cohort_size FROM users GROUP BY cohort_week ), retention AS ( SELECT ca.cohort_week, ca.week_number, COUNT(DISTINCT ca.user_id) AS users_active, cs.cohort_size, ROUND(100.0 * COUNT(DISTINCT ca.user_id) / cs.cohort_size, 1) AS retention_rate_percent FROM cohort_activity ca JOIN cohort_sizes cs ON ca.cohort_week = cs.cohort_week GROUP BY ca.cohort_week, ca.week_number, cs.cohort_size ) SELECT cohort_week, MAX(CASE WHEN week_number = 0 THEN retention_rate_percent END) AS week_0, MAX(CASE WHEN week_number = 1 THEN retention_rate_percent END) AS week_1, MAX(CASE WHEN week_number = 2 THEN retention_rate_percent END) AS week_2, MAX(CASE WHEN week_number = 3 THEN retention_rate_percent END) AS week_3 FROM retention GROUP BY cohort_week ORDER BY cohort_week;
copy

This query performs cohort retention analysis using a combination of CTEs, date manipulation, aggregation, and conditional pivoting to produce a visualization-ready output.

The first CTE, cohort_activity, applies date normalization with DATE_TRUNC to align both signup dates and event timestamps to a weekly level. It joins users with their events and uses relative time calculation (EXTRACT(WEEK)) to compute week_number, which represents how many weeks after signup each activity occurred. A filter ensures that only events happening on or after signup are included, preserving correct cohort behavior.

The second CTE, cohort_sizes, uses distinct aggregation to calculate the total number of users in each signup cohort. This establishes the cohort baseline needed for retention calculations.

The retention CTE combines cohort activity with cohort sizes through a relational join, then applies grouped aggregation to count how many unique users were active in each cohort and week. It derives a percentage-based retention metric by dividing active users by the cohort size and rounding the result for reporting clarity.

Finally, the outer query transforms the long-format retention data into a pivoted layout using conditional aggregation with CASE expressions. Each cohort is represented by a single row, and each column (week_0 to week_3) corresponds to the retention rate for a specific week since signup. This structure is optimized for dashboards, cohort heatmaps, and retention trend analysis.

Task

Swipe to start coding

Write a SQL query to calculate the monthly retention rate for each signup cohort using the users and events tables. Your query should:

  • Group users into cohorts based on the month they signed up (use the start of the month with DATE_TRUNC('month', signup_date));
  • For each cohort, count the number of distinct users who were active (had any event) in each month after their signup month (month 0 is the signup month);
  • Calculate the size of each cohort (total number of users who signed up that month);
  • For each cohort and month, calculate the retention rate as the percentage of active users out of the cohort size, rounded to one decimal place;
  • Output the following columns: cohort_month, month_number, users_active, cohort_size, retention_rate_percent.

Solution

Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 2
single

single

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

close

bookTracking Cohort Retention

Swipe to show menu

Retention analysis is a powerful technique for understanding how many users from each cohort continue to engage with your product or service over time. In marketing analytics, a cohort is a group of users who share a common characteristic, such as signing up in the same week. By tracking how many users from each signup cohort return in subsequent weeks, you can measure retention rates and gain insights into user engagement, product stickiness, and the long-term impact of marketing campaigns. This approach helps you answer questions like: Are users acquired in March more likely to return than those acquired in January? Which campaigns are bringing in users who stick around?

12345678910111213141516171819202122232425262728
-- Calculate weekly retention rates for each signup cohort WITH cohort_activity AS ( SELECT u.user_id, DATE_TRUNC('week', u.signup_date) AS cohort_week, DATE_TRUNC('week', e.event_timestamp) AS activity_week, EXTRACT(WEEK FROM e.event_timestamp) - EXTRACT(WEEK FROM u.signup_date) AS week_number FROM users u JOIN events e ON u.user_id = e.user_id WHERE e.event_timestamp >= u.signup_date ), cohort_sizes AS ( SELECT DATE_TRUNC('week', signup_date) AS cohort_week, COUNT(DISTINCT user_id) AS cohort_size FROM users GROUP BY cohort_week ) SELECT ca.cohort_week, ca.week_number, COUNT(DISTINCT ca.user_id) AS users_active, cs.cohort_size, ROUND(100.0 * COUNT(DISTINCT ca.user_id) / cs.cohort_size, 1) AS retention_rate_percent FROM cohort_activity ca JOIN cohort_sizes cs ON ca.cohort_week = cs.cohort_week GROUP BY ca.cohort_week, ca.week_number, cs.cohort_size ORDER BY ca.cohort_week, ca.week_number;
copy
Note
Note

Retention rates help marketers understand long-term engagement and the impact of campaigns. High retention suggests users find value and keep coming back, while low retention highlights opportunities to improve onboarding, product features, or campaign targeting.

When you interpret retention curves, you are looking at how the percentage of returning users changes over time for each cohort. Retention curves typically show a decline: the highest retention is usually in the first week after signup, with fewer users coming back in later weeks. By comparing curves for different cohorts, you can spot patternsβ€”such as whether users acquired during a particular campaign are more loyal, or if recent product updates improved engagement. A flat retention curve means users continue returning at a steady rate, while a steep drop indicates users quickly lose interest. These patterns help you identify what is working and where to focus your marketing or product efforts.

1234567891011121314151617181920212223242526272829303132333435363738
-- Pivot retention data for visualization (one row per cohort, columns for each week) WITH cohort_activity AS ( SELECT u.user_id, DATE_TRUNC('week', u.signup_date) AS cohort_week, DATE_TRUNC('week', e.event_timestamp) AS activity_week, EXTRACT(WEEK FROM e.event_timestamp) - EXTRACT(WEEK FROM u.signup_date) AS week_number FROM users u JOIN events e ON u.user_id = e.user_id WHERE e.event_timestamp >= u.signup_date ), cohort_sizes AS ( SELECT DATE_TRUNC('week', signup_date) AS cohort_week, COUNT(DISTINCT user_id) AS cohort_size FROM users GROUP BY cohort_week ), retention AS ( SELECT ca.cohort_week, ca.week_number, COUNT(DISTINCT ca.user_id) AS users_active, cs.cohort_size, ROUND(100.0 * COUNT(DISTINCT ca.user_id) / cs.cohort_size, 1) AS retention_rate_percent FROM cohort_activity ca JOIN cohort_sizes cs ON ca.cohort_week = cs.cohort_week GROUP BY ca.cohort_week, ca.week_number, cs.cohort_size ) SELECT cohort_week, MAX(CASE WHEN week_number = 0 THEN retention_rate_percent END) AS week_0, MAX(CASE WHEN week_number = 1 THEN retention_rate_percent END) AS week_1, MAX(CASE WHEN week_number = 2 THEN retention_rate_percent END) AS week_2, MAX(CASE WHEN week_number = 3 THEN retention_rate_percent END) AS week_3 FROM retention GROUP BY cohort_week ORDER BY cohort_week;
copy

This query performs cohort retention analysis using a combination of CTEs, date manipulation, aggregation, and conditional pivoting to produce a visualization-ready output.

The first CTE, cohort_activity, applies date normalization with DATE_TRUNC to align both signup dates and event timestamps to a weekly level. It joins users with their events and uses relative time calculation (EXTRACT(WEEK)) to compute week_number, which represents how many weeks after signup each activity occurred. A filter ensures that only events happening on or after signup are included, preserving correct cohort behavior.

The second CTE, cohort_sizes, uses distinct aggregation to calculate the total number of users in each signup cohort. This establishes the cohort baseline needed for retention calculations.

The retention CTE combines cohort activity with cohort sizes through a relational join, then applies grouped aggregation to count how many unique users were active in each cohort and week. It derives a percentage-based retention metric by dividing active users by the cohort size and rounding the result for reporting clarity.

Finally, the outer query transforms the long-format retention data into a pivoted layout using conditional aggregation with CASE expressions. Each cohort is represented by a single row, and each column (week_0 to week_3) corresponds to the retention rate for a specific week since signup. This structure is optimized for dashboards, cohort heatmaps, and retention trend analysis.

Task

Swipe to start coding

Write a SQL query to calculate the monthly retention rate for each signup cohort using the users and events tables. Your query should:

  • Group users into cohorts based on the month they signed up (use the start of the month with DATE_TRUNC('month', signup_date));
  • For each cohort, count the number of distinct users who were active (had any event) in each month after their signup month (month 0 is the signup month);
  • Calculate the size of each cohort (total number of users who signed up that month);
  • For each cohort and month, calculate the retention rate as the percentage of active users out of the cohort size, rounded to one decimal place;
  • Output the following columns: cohort_month, month_number, users_active, cohort_size, retention_rate_percent.

Solution

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 3. ChapterΒ 2
single

single

some-alt