Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Multi-step Analytical Pipelines with CTEs | Analytical SQL Techniques
SQL for Analytical Reports
Section 1. Chapter 11
single

single

bookMulti-step Analytical Pipelines with CTEs

Swipe to show menu

When you need to build complex analytical reports, breaking your SQL logic into modular, readable steps is essential. Common Table Expressions (CTEs), introduced with the WITH clause, let you organize multi-step data transformations in a way that is both easy to debug and maintain. By chaining multiple CTEs, you can create analytical pipelines where each step builds on the results of the previous one. This approach is especially helpful for tasks such as data cleaning, filtering, aggregation, and advanced calculations, all within a single, readable query.

Note
Definition

A Common Table Expression (CTE) is a temporary, named result set defined within the execution scope of a single SQL query. CTEs improve query readability and maintainability by allowing you to structure complex logic into sequential building blocks.
Advantages:

  • Modularizes complex queries for clarity;
  • Makes debugging and testing individual steps easier;
  • Enables referencing intermediate results multiple times;
  • Supports recursive and multi-step analytical logic.
1234567891011121314151617181920212223242526272829
-- Multi-step analytical pipeline using CTEs WITH -- Step 1: Clean and prepare the raw sales data prepared_sales AS ( SELECT sale_id, store_id, sale_date, category, total_amount FROM sales WHERE sale_date >= '2024-01-01' ), -- Step 2: Aggregate sales by month and category monthly_category_sales AS ( SELECT DATE_TRUNC('month', sale_date) AS sales_month, category, SUM(total_amount) AS total_sales FROM prepared_sales GROUP BY sales_month, category ) -- Step 3: Final report SELECT sales_month, category, total_sales FROM monthly_category_sales ORDER BY sales_month, category;
copy

Suppose the analytics team is asked for a report showing monthly sales growth, but they want to exclude days with unusually high sales spikes (outliers) that could distort the trend. This is a classic case for a multi-step CTE pipeline: first, you filter out outlier days, then aggregate sales by month, and finally calculate the growth rate.

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
-- Stepwise monthly sales growth calculation with outlier removal using CTEs WITH -- Step 1: Calculate daily sales totals daily_sales AS ( SELECT sale_date, SUM(total_amount) AS total_sales FROM sales WHERE sale_date >= '2024-01-01' GROUP BY sale_date ), -- Step 2: Identify outlier days (sales > 95th percentile) sales_stats AS ( SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_sales) AS p95 FROM daily_sales ), filtered_sales AS ( SELECT d.sale_date, d.total_sales FROM daily_sales d CROSS JOIN sales_stats s WHERE d.total_sales <= s.p95 ), -- Step 3: Aggregate by month monthly_sales AS ( SELECT DATE_TRUNC('month', sale_date) AS sales_month, SUM(total_sales) AS total_sales FROM filtered_sales GROUP BY sales_month ), -- Step 4: Calculate month-over-month growth growth_report AS ( SELECT sales_month, total_sales, LAG(total_sales) OVER (ORDER BY sales_month) AS prev_month_sales, CASE WHEN LAG(total_sales) OVER (ORDER BY sales_month) IS NULL THEN NULL ELSE ROUND(100.0 * (total_sales - LAG(total_sales) OVER (ORDER BY sales_month)) / LAG(total_sales) OVER (ORDER BY sales_month), 2) END AS pct_growth FROM monthly_sales ) SELECT sales_month, total_sales, prev_month_sales, pct_growth FROM growth_report ORDER BY sales_month;
copy
Task

Swipe to start coding

Segment customers into "High Value", "Mid Value", and "Low Value" groups based on their total order spend, and calculate the average order value for each segment using at least two CTEs.

  • Group customers by their total order spend and assign them to a segment: "High Value" (spend >= 2000), "Mid Value" (spend >= 500), or "Low Value" (spend < 500).
  • Calculate the number of customers and the total spend for each segment.
  • Compute the average order value per segment.

Solution

Switch to desktopSwitch to desktop for real-world practiceContinue from where you are using one of the options below
Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 1. Chapter 11
single

single

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

some-alt