Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lernen Multi-step Analytical Pipelines with CTEs | Analytical SQL Techniques
SQL for Analytical Reports
Abschnitt 1. Kapitel 11
single

single

bookMulti-step Analytical Pipelines with CTEs

Swipe um das Menü anzuzeigen

When you need to build complex analytical reports, breaking your SQL logic into modular, readable steps is essential. Common Table Expressions (CTEs), introduced with the WITH clause, let you organize multi-step data transformations in a way that is both easy to debug and maintain. By chaining multiple CTEs, you can create analytical pipelines where each step builds on the results of the previous one. This approach is especially helpful for tasks such as data cleaning, filtering, aggregation, and advanced calculations, all within a single, readable query.

Note
Definition

A Common Table Expression (CTE) is a temporary, named result set defined within the execution scope of a single SQL query. CTEs improve query readability and maintainability by allowing you to structure complex logic into sequential building blocks.
Advantages:

  • Modularizes complex queries for clarity;
  • Makes debugging and testing individual steps easier;
  • Enables referencing intermediate results multiple times;
  • Supports recursive and multi-step analytical logic.
1234567891011121314151617181920212223242526272829
-- Multi-step analytical pipeline using CTEs WITH -- Step 1: Clean and prepare the raw sales data prepared_sales AS ( SELECT sale_id, store_id, sale_date, category, total_amount FROM sales WHERE sale_date >= '2024-01-01' ), -- Step 2: Aggregate sales by month and category monthly_category_sales AS ( SELECT DATE_TRUNC('month', sale_date) AS sales_month, category, SUM(total_amount) AS total_sales FROM prepared_sales GROUP BY sales_month, category ) -- Step 3: Final report SELECT sales_month, category, total_sales FROM monthly_category_sales ORDER BY sales_month, category;
copy

Suppose the analytics team is asked for a report showing monthly sales growth, but they want to exclude days with unusually high sales spikes (outliers) that could distort the trend. This is a classic case for a multi-step CTE pipeline: first, you filter out outlier days, then aggregate sales by month, and finally calculate the growth rate.

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152
-- Stepwise monthly sales growth calculation with outlier removal using CTEs WITH -- Step 1: Calculate daily sales totals daily_sales AS ( SELECT sale_date, SUM(total_amount) AS total_sales FROM sales WHERE sale_date >= '2024-01-01' GROUP BY sale_date ), -- Step 2: Identify outlier days (sales > 95th percentile) sales_stats AS ( SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total_sales) AS p95 FROM daily_sales ), filtered_sales AS ( SELECT d.sale_date, d.total_sales FROM daily_sales d CROSS JOIN sales_stats s WHERE d.total_sales <= s.p95 ), -- Step 3: Aggregate by month monthly_sales AS ( SELECT DATE_TRUNC('month', sale_date) AS sales_month, SUM(total_sales) AS total_sales FROM filtered_sales GROUP BY sales_month ), -- Step 4: Calculate month-over-month growth growth_report AS ( SELECT sales_month, total_sales, LAG(total_sales) OVER (ORDER BY sales_month) AS prev_month_sales, CASE WHEN LAG(total_sales) OVER (ORDER BY sales_month) IS NULL THEN NULL ELSE ROUND(100.0 * (total_sales - LAG(total_sales) OVER (ORDER BY sales_month)) / LAG(total_sales) OVER (ORDER BY sales_month), 2) END AS pct_growth FROM monthly_sales ) SELECT sales_month, total_sales, prev_month_sales, pct_growth FROM growth_report ORDER BY sales_month;
copy
Aufgabe

Wischen, um mit dem Codieren zu beginnen

Segment customers into "High Value", "Mid Value", and "Low Value" groups based on their total order spend, and calculate the average order value for each segment using at least two CTEs.

  • Group customers by their total order spend and assign them to a segment: "High Value" (spend >= 2000), "Mid Value" (spend >= 500), or "Low Value" (spend < 500).
  • Calculate the number of customers and the total spend for each segment.
  • Compute the average order value per segment.

Lösung

Switch to desktopWechseln Sie zum Desktop, um in der realen Welt zu übenFahren Sie dort fort, wo Sie sind, indem Sie eine der folgenden Optionen verwenden
War alles klar?

Wie können wir es verbessern?

Danke für Ihr Feedback!

Abschnitt 1. Kapitel 11
single

single

Fragen Sie AI

expand

Fragen Sie AI

ChatGPT

Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen

some-alt