Summary
This chapter explains how to group data and compute aggregated metrics (such as mean, median, and count) using dplyr's `group_by()` and `summarise()` in a pipeline.

General domain of usage
Sales analytics

**Aggregated metrics** condense many rows of data into a few summary numbers. When you analyze a dataset, you often need averages, counts, or other statistics calculated per group. These metrics let you see patterns and compare groups in ways that are hard to do with raw, row-level data.

```r
library(dplyr)
library(knitr)

# Sample sales data
sales_data <- data.frame(
  category = c("A", "A", "B", "B", "B", "C"),
  sales = c(100, 150, 200, 220, 180, 300)
)

# Calculate mean and median sales per product category
# (named sales_summary to avoid confusion with base R's summary() function)
sales_summary <- sales_data %>%
  group_by(category) %>%
  summarise(
    mean_sales = mean(sales),
    median_sales = median(sales)
  )

# Render the result as a formatted table
kable(sales_summary)
```

In this example, you group the sales data by `category` and then use the `summarise()` function to calculate both the mean and median sales for each product category. The **mean** gives you the average sales value, while the **median** shows the middle value when sales are ordered. Both metrics help you understand typical sales amounts, but the median is less affected by unusually high or low sales, making it useful when your data has outliers.
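
In the sample data, the mean and median happen to be identical within every category (125, 200, and 300), so the effect of outliers is not visible. The following is a minimal sketch of how the two metrics diverge, assuming one hypothetical extra sale of 1000 in category B. The data frame `sales_with_outlier` and the added value are illustrative and not part of the original data.

```r
library(dplyr)

# Same sample data as in the lesson, plus one hypothetical outlier (1000) in category B
sales_with_outlier <- data.frame(
  category = c("A", "A", "B", "B", "B", "B", "C"),
  sales = c(100, 150, 200, 220, 180, 1000, 300)
)

# Mean and median per category, computed exactly as before
outlier_summary <- sales_with_outlier %>%
  group_by(category) %>%
  summarise(
    mean_sales = mean(sales),
    median_sales = median(sales)
  )

print(outlier_summary)
```

For category B, the single large sale pulls the mean up to 400, while the median only moves to 210, which stays close to the typical sale.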

```r
# Count the number of sales per category
count_summary <- sales_data %>%
  group_by(category) %>%
  summarise(
    sales_count = n()
  )

kable(count_summary)
```

The `n()` function, used inside `summarise()`, counts the number of rows in each group. In the code above, `sales_count` tells you how many sales entries exist for each product category. Knowing the group size matters because it affects how much you can trust the other metrics: a mean computed from one or two rows is far less reliable than one computed from many.
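
Since group size affects how far an average can be trusted, it can help to report the count next to the mean in a single `summarise()` call. The sketch below shows one way to do this, and also shows dplyr's `count()` as a shorthand for grouping and counting. The names `sales_overview` and `count_shortcut` are illustrative.

```r
library(dplyr)

# Sample sales data from the lesson
sales_data <- data.frame(
  category = c("A", "A", "B", "B", "B", "C"),
  sales = c(100, 150, 200, 220, 180, 300)
)

# Report the group size next to the mean so small groups are easy to spot
sales_overview <- sales_data %>%
  group_by(category) %>%
  summarise(
    sales_count = n(),
    mean_sales = mean(sales)
  )

# count() is shorthand for group_by() followed by summarise(n = n());
# the name argument sets the name of the count column
count_shortcut <- sales_data %>%
  count(category, name = "sales_count")

print(sales_overview)
print(count_shortcut)
```

Here category C contains a single row, so its mean is simply that one sale and says little about typical sales in that category.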

Definition
An **aggregated metric** is a summary statistic calculated from grouped data. Common aggregated metrics in analytics include **mean**, **median**, **sum**, **count**, **minimum**, and **maximum**. These metrics help you summarize and compare different groups within your data.
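
The remaining metrics named in the definition (sum, minimum, and maximum) follow the same pattern as mean and median. As a minimal sketch using the lesson's sample data and base R's `sum()`, `min()`, and `max()` inside `summarise()` (the result name `sales_range` and the column names are illustrative):

```r
library(dplyr)

# Sample sales data from the lesson
sales_data <- data.frame(
  category = c("A", "A", "B", "B", "B", "C"),
  sales = c(100, 150, 200, 220, 180, 300)
)

# Total, smallest, and largest sale per category
sales_range <- sales_data %>%
  group_by(category) %>%
  summarise(
    total_sales = sum(sales),
    min_sales = min(sales),
    max_sales = max(sales)
  )

print(sales_range)
```

Any function that reduces a vector to a single value can be used inside `summarise()` in the same way.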

Calculating Aggregated Metrics

1. What is an aggregated metric?

2. How do you calculate the number of items in each group using dplyr?

3. Why might you want to calculate both mean and median for a group?