Combining Multiple dplyr Verbs | Pipes and Chaining Operations


When you work with real-world data, you rarely need just one operation to get the results you want. Instead, you often perform a series of steps: selecting relevant columns, filtering rows, creating new variables, sorting, and summarizing. Combining these steps into a single, readable workflow is essential for efficient analysis. The dplyr package in R is designed for exactly this purpose, letting you chain together multiple verbs to build powerful data manipulation pipelines.


library(dplyr)

# Example product data frame
products <- data.frame(
  product_id = 1:5,
  name = c("Widget", "Gadget", "Doodad", "Thingamajig", "Contraption"),
  price = c(25, 40, 10, 60, 35),
  stock = c(100, 0, 50, 5, 20)
)
print(products)

# Chaining select(), filter(), and mutate()
cleaned_products <- products %>%
  select(product_id, name, price, stock) %>%
  filter(stock > 10) %>%
  mutate(in_stock_value = price * stock)
print(cleaned_products)

In this workflow, you start by selecting the columns you need: product_id, name, price, and stock. In this small example those are all of the columns, so select() changes nothing; on a wider data frame this step would drop everything else. Next, you filter the data to keep only products with stock greater than 10, which removes items that are out of stock or nearly depleted. Finally, you create a new column, in_stock_value, which multiplies price by stock to give the total value of each product's current inventory. This order (select, filter, then mutate) mirrors how you often approach data cleaning and enrichment tasks in practice.
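The order of verbs matters whenever a later step depends on a column that an earlier step creates. As a minimal sketch that reuses the products data from the example (the 1000 threshold and the name high_value are illustrative assumptions, not part of the original example), filtering on in_stock_value only works after mutate() has created it:

```r
library(dplyr)

# Same example data as in the lesson
products <- data.frame(
  product_id = 1:5,
  name = c("Widget", "Gadget", "Doodad", "Thingamajig", "Contraption"),
  price = c(25, 40, 10, 60, 35),
  stock = c(100, 0, 50, 5, 20)
)

# mutate() must come before filter() here, because filter() refers to
# the in_stock_value column that mutate() creates.
high_value <- products %>%
  mutate(in_stock_value = price * stock) %>%
  filter(in_stock_value > 1000)  # illustrative threshold (assumption)
print(high_value)
```

Swapping the two steps would raise an error, because in_stock_value does not exist yet when filter() runs.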


# Chaining arrange() and summarise() for a summary report
summary_report <- products %>%
  arrange(desc(price)) %>%
  summarise(
    total_products = n(),
    average_price = mean(price),
    total_stock = sum(stock)
  )
print(summary_report)

By chaining arrange() and summarise(), you can quickly produce concise summary reports. In this example, you first sort the products by price in descending order, then generate a one-row summary with the total number of products, the average price, and the total stock. Note that n(), mean(), and sum() do not depend on row order, so the arrange() step does not change these numbers; sorting matters only when a later step is order-sensitive, such as first() or last(). Chaining these verbs with pipes makes your code easier to read and avoids cluttering your workspace with unnecessary intermediate variables. This approach is especially valuable when creating reports or dashboards that require clean, step-by-step data transformations.
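To make the point about intermediate variables concrete, here is a hedged side-by-side sketch. The names step1, step2, step_by_step, and piped are hypothetical and exist only for this comparison; both versions apply the same three verbs to the same data.

```r
library(dplyr)

# Same example data as in the lesson
products <- data.frame(
  product_id = 1:5,
  name = c("Widget", "Gadget", "Doodad", "Thingamajig", "Contraption"),
  price = c(25, 40, 10, 60, 35),
  stock = c(100, 0, 50, 5, 20)
)

# Step-by-step version: each stage is stored in its own variable.
step1 <- select(products, product_id, name, price, stock)
step2 <- filter(step1, stock > 10)
step_by_step <- mutate(step2, in_stock_value = price * stock)

# Piped version: the same three verbs, with no intermediate objects.
piped <- products %>%
  select(product_id, name, price, stock) %>%
  filter(stock > 10) %>%
  mutate(in_stock_value = price * stock)

# Compare the two results.
print(all.equal(step_by_step, piped))
```

The step-by-step version leaves step1 and step2 in the workspace even though only the final result is needed, which is the clutter the pipeline avoids.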

Definition

In data analysis, a workflow refers to the sequence of steps you use to transform raw data into meaningful results. The dplyr package supports workflows by allowing you to chain multiple operations together, making your analysis both efficient and easy to follow.
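As a rough illustration of such a workflow, the sketch below runs the verbs from this chapter in one sequence on the same products data: clean, enrich, sort, then summarise. The names inventory and inventory_summary and the stock > 0 condition are assumptions chosen for the example.

```r
library(dplyr)

# Same example data as in the lesson
products <- data.frame(
  product_id = 1:5,
  name = c("Widget", "Gadget", "Doodad", "Thingamajig", "Contraption"),
  price = c(25, 40, 10, 60, 35),
  stock = c(100, 0, 50, 5, 20)
)

# Clean and enrich: keep the needed columns, drop out-of-stock items,
# compute inventory value, and sort the most valuable items first.
inventory <- products %>%
  select(name, price, stock) %>%
  filter(stock > 0) %>%             # illustrative condition (assumption)
  mutate(in_stock_value = price * stock) %>%
  arrange(desc(in_stock_value))
print(inventory)

# Summarise the cleaned data into a one-row report.
inventory_summary <- inventory %>%
  summarise(
    total_products = n(),
    total_value = sum(in_stock_value)
  )
print(inventory_summary)
```

Here arrange() comes last in the first pipeline because its job is to order the rows of the final table; the summary is computed in a separate step so that both the detailed table and the report remain available.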

1. Why is it beneficial to combine multiple dplyr verbs in a single pipeline?

2. What is a typical sequence of dplyr verbs for cleaning and summarizing data?

3. How does chaining operations help avoid intermediate variables?


Section 2. Chapter 2
