Slicing and Subsetting Data

Definition

Slicing refers to selecting specific rows and/or columns from a data frame based on their positions or names. Subsetting is a broader concept: it includes slicing, but also covers selecting data based on logical conditions. Both techniques let you focus your analysis on the relevant parts of your data.

To select rows and columns by slicing, use bracket notation: data_frame[rows, columns]. Leaving a position empty selects everything along that dimension. For example, df[1:5, ] selects the first five rows (and all columns), while df[, c("col1", "col3")] selects only the named columns (and all rows). Logical conditions allow more targeted subsetting: df[df$age > 30, ] returns all rows where the value in the age column is greater than 30. You can also combine a row condition with column selection, as in df[df$score >= 80, c("name", "score")], which extracts the names and scores of high-performing entries.
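A minimal sketch of each bracket form described above, run on a small hypothetical data frame. The names and values are made up for illustration, and a sixth row is included so that 1:5 visibly leaves one row out.

```r
# Hypothetical sample data: six rows so that df[1:5, ] visibly drops one
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "David", "Eve", "Frank"),
  age   = c(25, 34, 28, 42, 23, 37),
  score = c(88, 92, 76, 85, 90, 71),
  stringsAsFactors = FALSE
)

# Positional slicing: rows 1 to 5; the empty column slot keeps all columns
first_five <- df[1:5, ]

# Selection by name: the empty row slot keeps all rows
name_score <- df[, c("name", "score")]

# Logical subsetting: keep rows where age is greater than 30
older <- df[df$age > 30, ]

# Combined: rows with score >= 80, restricted to two columns
high_scores <- df[df$score >= 80, c("name", "score")]

print(older)
print(high_scores)
```

The row names printed with each result (for example 2, 4 and 6 for older) are carried over from df; they label the original rows rather than renumbering the subset.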


```r
# Sample data frame
df <- data.frame(
  name = c("Alice", "Bob", "Carol", "David", "Eve"),
  age = c(25, 34, 28, 42, 23),
  score = c(88, 92, 76, 85, 90)
)

# Subset: select rows where age is over 30, keeping only the name and score columns
subset_df <- df[df$age > 30, c("name", "score")]
print(subset_df)
```
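One behaviour worth knowing when a row condition is combined with a single column name: base R's bracket operator for data frames simplifies the result to a plain vector by default. A minimal sketch on the same sample data, showing that behaviour and how the drop = FALSE argument keeps a one-column data frame instead (stringsAsFactors = FALSE is set explicitly so the name column is character on any R version):

```r
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "David", "Eve"),
  age   = c(25, 34, 28, 42, 23),
  score = c(88, 92, 76, 85, 90),
  stringsAsFactors = FALSE
)

# Selecting a single column simplifies the result to a character vector
names_vec <- df[df$age > 30, "name"]
print(class(names_vec))  # [1] "character"

# drop = FALSE keeps the result as a one-column data frame
names_df <- df[df$age > 30, "name", drop = FALSE]
print(class(names_df))   # [1] "data.frame"
```

Selecting two or more columns, as in the example above, always returns a data frame, so drop = FALSE only matters when a single column is requested.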

When subsetting, write clear, readable code. Prefer logical conditions over hard-coded row numbers, especially with real-world data whose rows may change. Avoid chaining too many subsetting operations in one expression, as this makes your code harder to debug. Assign the result of your subsetting to a new variable so the original data stays intact. For large data frames, test your conditions on a small sample first to confirm they select the rows you expect.
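The sketch below illustrates some of these habits on the sample data. It checks a condition on head() before applying it, then simulates a data change (a hypothetical new record, "Zoe", added at the top) to show how hard-coded row numbers silently select the wrong rows while a logical condition keeps working. Every result goes into a new variable, so df itself is never modified.

```r
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "David", "Eve"),
  age   = c(25, 34, 28, 42, 23),
  score = c(88, 92, 76, 85, 90),
  stringsAsFactors = FALSE
)

# Try the condition on a small sample first and inspect the result
sample_rows <- head(df, 3)
print(sample_rows[sample_rows$age > 30, ])

# Hypothetical update: a new record is added at the top of the data
updated <- rbind(
  data.frame(name = "Zoe", age = 51, score = 70, stringsAsFactors = FALSE),
  df
)

# Fragile: rows 2 and 4 were Bob and David in df, but are now Alice and Carol
fragile <- updated[c(2, 4), c("name", "score")]

# Robust: a named logical condition still selects everyone over 30
is_over_30 <- updated$age > 30
over_30 <- updated[is_over_30, c("name", "score")]
print(over_30)

# df is unchanged because every result was assigned to a new variable
print(nrow(df))  # [1] 5
```

Storing the condition in a named variable such as is_over_30 (a hypothetical name chosen for this example) also makes it easy to inspect on its own when a subset looks wrong.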
