Summary  
This chapter covers how to extract specific columns and apply conditional filters to rows in tabular data, enabling focused manipulation of only the variables and observations you need.

General domain of usage  
Data analysis

When working with real-world datasets, you are often faced with more information than you actually need. Datasets can include dozens or even hundreds of columns, and many rows may not be relevant to your analysis goals. Focusing on the variables and observations that matter most allows you to streamline your workflow, improve performance, and make your results clearer and more reliable. Selecting only the necessary columns and filtering for specific rows are essential steps in preparing your data for meaningful analysis.

library(dplyr)

# Sample data frame
data <- tibble::tibble(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 40),
  city = c("New York", "Los Angeles", "Chicago", "Houston"),
  score = c(88, 92, 95, 85)
)

# Select only the name and score columns, and filter for rows where score > 90
filtered_data <- data %>%
  select(name, score) %>%
  filter(score > 90)

print(as.data.frame(filtered_data))

In the code above, `select()` keeps only the columns you name, in this case `name` and `score`, which removes clutter and keeps the data focused on the variables of interest. `filter()` then keeps only the rows where `score` is greater than 90. The order of these steps matters because each function sees only the columns that survive the previous step. `filter()` can use `score` here because `select()` kept it, but a condition on `age` or `city` would fail after this `select()`. When a condition depends on a column you do not want in the final result, filter first and select afterward. Logical conditions in `filter()` use comparison operators such as `>`, `>=`, `<`, `<=`, `==`, and `!=`. Separating conditions with commas (or `&`) keeps rows where all of them hold, while `|` keeps rows where at least one holds. Column names must be written exactly as they appear in your data.
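The sketch below reuses the sample `data` tibble (redefined so it runs on its own) to show why order matters. Filtering first lets the conditions use `age` and `city`, which `select()` then drops from the result. The second pipeline reverses the order and is wrapped in `tryCatch()` so the script keeps running. It assumes no object named `age` exists in your workspace, because `filter()` would otherwise fall back to that object. The names `older_outside_houston` and `result` are illustrative only.

```r
library(dplyr)

# Same sample data as in the example above
data <- tibble::tibble(
  name  = c("Alice", "Bob", "Charlie", "David"),
  age   = c(25, 30, 35, 40),
  city  = c("New York", "Los Angeles", "Chicago", "Houston"),
  score = c(88, 92, 95, 85)
)

# Filter first so the conditions can use `age` and `city`;
# the comma means both conditions must be TRUE.
# Then keep only the columns needed in the result.
older_outside_houston <- data %>%
  filter(age >= 30, city != "Houston") %>%
  select(name, score)

print(as.data.frame(older_outside_houston))

# Reversing the order fails: `age` no longer exists after select()
result <- tryCatch(
  data %>% select(name, score) %>% filter(age >= 30),
  error = function(e) "error: `age` was dropped by select()"
)
print(result)
```

Here the comma combines the two conditions; writing `age >= 30 & city != "Houston"` would keep the same rows.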

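Be careful not to confuse `=` with `==` when writing logical conditions inside `filter()`. Use `==` to test for equality (for example, `filter(city == "Chicago")`). A single `=` creates a named argument rather than a comparison, and dplyr stops with an error suggesting that you may have meant `==`. Also double-check your column names for typos, because `select()` and `filter()` require exact matches. A misspelled name in `select()` raises an error saying the column does not exist. In `filter()`, it usually raises an "object not found" error, but if a variable with that name happens to exist in your environment, `filter()` may silently use it and return unexpected rows.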
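A minimal sketch of these pitfalls, again using the sample `data` tibble. Each mistake is wrapped in `tryCatch()` so that its error message is captured and printed instead of stopping the script. The exact wording of the messages depends on your dplyr version, so only the fact that an error occurs should be relied on. The variable names `chicago`, `eq_mistake`, and `typo` are illustrative.

```r
library(dplyr)

data <- tibble::tibble(
  name  = c("Alice", "Bob", "Charlie", "David"),
  age   = c(25, 30, 35, 40),
  city  = c("New York", "Los Angeles", "Chicago", "Houston"),
  score = c(88, 92, 95, 85)
)

# Correct: `==` tests equality row by row
chicago <- data %>% filter(city == "Chicago")

# Mistake: `=` passes a named argument, which filter() rejects with an error
eq_mistake <- tryCatch(
  data %>% filter(city = "Chicago"),
  error = function(e) conditionMessage(e)
)

# Typo: select() errors because there is no column called `scor`
typo <- tryCatch(
  data %>% select(name, scor),
  error = function(e) conditionMessage(e)
)

print(as.data.frame(chicago))
cat(eq_mistake, "\n")
cat(typo, "\n")
```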

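Note

The main difference between `select()` and `filter()` in dplyr is what they act on: `select()` chooses columns (variables), while `filter()` chooses rows (observations) that meet a logical condition.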
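One way to see this difference is to compare the dimensions of the results. In the sketch below, `select()` leaves the number of rows unchanged and reduces the columns, while `filter()` leaves the columns unchanged and reduces the rows. The names `selected` and `filtered` are illustrative.

```r
library(dplyr)

data <- tibble::tibble(
  name  = c("Alice", "Bob", "Charlie", "David"),
  age   = c(25, 30, 35, 40),
  city  = c("New York", "Los Angeles", "Chicago", "Houston"),
  score = c(88, 92, 95, 85)
)

# select() picks columns: same rows, fewer columns
selected <- select(data, name, score)

# filter() picks rows: same columns, fewer rows
filtered <- filter(data, score > 90)

print(dim(data))      # 4 rows, 4 columns
print(dim(selected))  # 4 rows, 2 columns
print(dim(filtered))  # 2 rows, 4 columns
```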

