Data Selection - Advanced Techniques
You already know how to select single rows and columns using basic indexing. Now it's time to go a step further and explore how to select multiple rows and columns using both base R and the dplyr package. These techniques are essential when you want to focus on specific parts of a dataset or prepare your data for further analysis.
Selecting Multiple Columns
Base R
You can select multiple columns by combining their positions or names with the c() function. The result is a smaller data frame containing only the specified columns.
Using column positions:
selected_data_base <- df[, c(1, 2, 3)]
Using column names:
selected_data_base <- df[, c("name", "selling_price", "transmission")]
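To see both forms side by side, here is a minimal sketch. The data frame below is hypothetical toy data whose column names mirror the ones used in this lesson; the real dataset may have more columns or a different column order.

```r
# Hypothetical toy data; column names follow this lesson, values are made up.
df <- data.frame(
  name = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 300000, 525000),
  km_driven = c(70000, 50000, 30000),
  fuel = c("Petrol", "Diesel", "Petrol"),
  transmission = c("Manual", "Manual", "Automatic")
)

# Select the same three columns by position and by name
by_position <- df[, c(1, 2, 5)]
by_name <- df[, c("name", "selling_price", "transmission")]

# Check that both approaches picked the same columns
identical(names(by_position), names(by_name))

# Selecting a single column this way returns a plain vector;
# drop = FALSE keeps the result as a one-column data frame.
one_col_vector <- df[, "fuel"]
one_col_df <- df[, "fuel", drop = FALSE]
```

Selecting by name is usually the safer choice, because it keeps working if the column order of the dataset changes.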
dplyr
You can use the select() function and pass the column names directly.
library(dplyr) # provides select() and the %>% pipe
selected_data_dplyr <- df %>%
  select(km_driven, fuel, transmission)
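The sketch below shows a few common forms of select(), assuming the dplyr package is installed. The data frame is hypothetical toy data that reuses this lesson's column names.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical toy data; column names follow this lesson, values are made up.
df <- data.frame(
  name = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 300000, 525000),
  km_driven = c(70000, 50000, 30000),
  fuel = c("Petrol", "Diesel", "Petrol"),
  transmission = c("Manual", "Manual", "Automatic")
)

# Bare column names: no quotes and no c() needed
selected <- df %>%
  select(km_driven, fuel, transmission)

# A range of adjacent columns, from name through km_driven
range_cols <- df %>%
  select(name:km_driven)

# Everything except one column
without_fuel <- df %>%
  select(-fuel)

names(selected)
```

The columns come back in the order you list them, so select() can also be used to reorder columns.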
Indexing Single Values
To access a specific value, provide both the row and column numbers. This is useful when checking or debugging individual data points.
df[1, 2] # accesses the value in row 1, column 2
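As a small illustration, the same cell can be reached in several equivalent ways. The two-column data frame here is hypothetical and only serves to make the example runnable.

```r
# Hypothetical toy data with made-up values.
df <- data.frame(
  name = c("Car A", "Car B"),
  selling_price = c(450000, 300000)
)

by_position <- df[1, 2]               # row 1, column 2
by_name <- df[1, "selling_price"]     # same cell, column given by name
by_dollar <- df$selling_price[1]      # same cell, via the $ operator
```

Referring to the column by name tends to be easier to read when you come back to the code later.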
Slicing Rows
Sometimes you only want to work with the first few rows, or specific rows by position.
Base R
You can select multiple rows by specifying the first and the last index with a : in between.
first_5_rows_base <- df[1:5, ]
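A brief sketch of row slicing in base R, using a hypothetical eight-row data frame. It also shows that c() works for rows that are not adjacent, and that head() is a common shortcut for the first rows.

```r
# Hypothetical toy data: eight rows with made-up mileage values.
df <- data.frame(
  id = 1:8,
  km_driven = seq(10000, 80000, by = 10000)
)

# Consecutive rows 1 through 5; the empty slot after the comma means "all columns"
first_5 <- df[1:5, ]

# Specific, non-adjacent rows by position
picked <- df[c(1, 3, 5), ]

# head() returns the first n rows
first_5_head <- head(df, 5)
```

Be careful with ranges that run past the last row: base R does not stop with an error but fills the extra rows with NA.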
dplyr
You can use the slice() function and pass it the range of rows you want to take.
first_5_rows_dplyr <- df %>%
  slice(1:5)
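The following sketch shows slice() with a range and with specific positions, assuming dplyr is installed. The data frame is hypothetical, and slice_head() is included on the assumption that dplyr 1.0.0 or later is available.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical toy data: eight rows with made-up mileage values.
df <- data.frame(
  id = 1:8,
  km_driven = seq(10000, 80000, by = 10000)
)

# Consecutive rows 1 through 5
first_5 <- df %>%
  slice(1:5)

# Specific, non-adjacent rows by position
picked <- df %>%
  slice(c(1, 3, 5))

# slice_head() takes the first n rows (available in dplyr 1.0.0 and later)
first_5_head <- df %>%
  slice_head(n = 5)
```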