Summary  
This chapter covers how to manipulate tabular data with dplyr's five core verbs (select, filter, arrange, mutate, and summarize) and how to chain them with the pipe operator into clear, readable pipelines.

General domain of usage  
data wrangling

When you work with data frames in R, the **dplyr** package gives you a small, consistent set of functions, called verbs, for exploring and manipulating your data. The five core **dplyr** verbs are `select`, `filter`, `arrange`, `mutate`, and `summarize`. Each verb performs one specific type of operation:

- `select`: choose specific columns from your data;
- `filter`: keep only rows that meet certain conditions;
- `arrange`: reorder rows based on column values;
- `mutate`: add new columns or transform existing ones;
- `summarize`: reduce your data to summary statistics.

Each verb takes a data frame as its first argument and returns a new data frame, leaving the original unchanged. This lets you inspect and explore your data step by step and focus on the columns and rows that matter most.
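The sketch below illustrates the three verbs from the list above that are not otherwise demonstrated in this chapter: `arrange`, `mutate`, and `summarize`. It reuses the small tibble from this chapter's example; the result names (`by_score`, `with_pct`, `stats`) and the derived columns (`score_pct`, `mean_score`, `oldest`) are illustrative assumptions, not part of the original material.

```r
library(dplyr)

# Same sample data as this chapter's example
df <- tibble(
  name  = c("Alice", "Bob", "Carol", "David"),
  age   = c(25, 30, 22, 35),
  score = c(88, 92, 95, 85)
)

# arrange: sort rows by score, highest first
by_score <- arrange(df, desc(score))

# mutate: add a column computed from an existing one
with_pct <- mutate(df, score_pct = score / 100)

# summarize: collapse all rows into a single row of statistics
stats <- summarize(df, mean_score = mean(score), oldest = max(age))

print(by_score)
print(with_pct)
print(stats)
```

Note that `df` itself is never modified: each call returns a new tibble, which is why the results are assigned to separate names.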

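# Load dplyr, which provides the verbs and the %>% pipe
library(dplyr)
# Turn off colored console output so printed tibbles are plain text
options(crayon.enabled = FALSE)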

# Creating a sample tibble
df <- tibble::tibble(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, 30, 22, 35),
  score = c(88, 92, 95, 85)
)

# Using select and filter to subset the tibble
result <- df %>%
  select(name, score) %>%
  filter(score > 90)

print(result)

A key feature of **dplyr** is the **pipe operator** `%>%`, which lets you chain together multiple operations in a clear, readable sequence. Instead of nesting functions inside each other, you pass the result of one operation directly into the next. This approach makes your code easier to read and understand, especially as your data wrangling tasks become more complex.
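To make the contrast concrete, here is a minimal sketch that writes the same three-step operation twice, once with nested calls and once with the pipe. The variable names `nested` and `piped` are assumptions chosen for illustration, and the data is the chapter's sample tibble.

```r
library(dplyr)

df <- tibble(
  name  = c("Alice", "Bob", "Carol", "David"),
  age   = c(25, 30, 22, 35),
  score = c(88, 92, 95, 85)
)

# Nested form: the first step (select) is buried innermost,
# so the code has to be read from the inside out
nested <- arrange(filter(select(df, name, score), score > 90), desc(score))

# Piped form: each step sits on its own line, in execution order
piped <- df %>%
  select(name, score) %>%
  filter(score > 90) %>%
  arrange(desc(score))

# Both forms should yield the same tibble
stopifnot(isTRUE(all.equal(nested, piped)))
print(piped)
```

The pipe passes the value on its left as the first argument of the call on its right, which works because every dplyr verb takes the data frame as its first argument.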

What is the main purpose of the pipe (`%>%`) operator in dplyr workflows?


library(testthat)

source("user_code.R")
test_that("high_scores exists and is a tibble", {
    expect_true(exists("high_scores"), info = "Variable 'high_scores' should exist.")
    expect_true(inherits(high_scores, "tbl_df"), info = "'high_scores' should be a tibble.")
})

test_that("high_scores has only name and score columns", {
    expect_equal(colnames(high_scores), c("name", "score"),
      info = "'high_scores' should contain only 'name' and 'score' columns.")
})

test_that("high_scores contains only rows where score > 80", {
    expect_true(all(high_scores$score > 80),
      info = "All rows in 'high_scores' should have 'score' greater than 80.")
})

test_that("high_scores contains correct names and scores", {
    expected_names <- c("Anna", "Chris", "Dana")
    expected_scores <- c(90, 85, 93)
    expect_equal(high_scores$name, expected_names,
      info = "'high_scores' should contain the correct names.")
    expect_equal(high_scores$score, expected_scores,
      info = "'high_scores' should contain the correct scores.")
})

test_main.R
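For orientation, the following is a sketch of what a `user_code.R` satisfying these tests might look like. The exercise's actual input data is not shown in this chapter, so the `students` tibble below (its name, its rows, and the `age` column) is entirely hypothetical, chosen only so that the expected names and scores in the tests come out of the pipeline.

```r
library(dplyr)

# Hypothetical input data: the real exercise dataset is not given here
students <- tibble(
  name  = c("Anna", "Ben", "Chris", "Dana"),
  age   = c(21, 23, 20, 22),
  score = c(90, 78, 85, 93)
)

# Keep only the name and score columns, then only rows with score > 80
high_scores <- students %>%
  select(name, score) %>%
  filter(score > 80)

print(high_scores)
```

The order of `select` and `filter` could be swapped here without changing the result, since `score` is kept by `select`; filtering on a column that `select` has already dropped would fail.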


Exploring Data with dplyr
