
Real-world datasets are often messy or inconsistent and must be cleaned and transformed before analysis. Common data cleaning tasks include renaming columns to more meaningful names, handling missing values so that calculations are accurate, and recoding variables to standardize categories or create new ones. These steps make your data tidy and analysis-ready.


# Load required libraries
library(dplyr)

# Example data frame
df <- data.frame(
  id = 1:4,
  score = c(90, NA, 75, 88),
  group = c("A", "B", "A", "B")
)

# Use mutate to create a new variable and replace NA values in 'score'
df_clean <- df %>%
  mutate(
    score_clean = ifelse(is.na(score), 0, score), # Replace NA with 0
    passed = score_clean >= 80                    # Create new logical variable
  )

print(df_clean)
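
The example above covers missing values and a derived variable. Renaming and recoding, the other two tasks mentioned earlier, might look like the following minimal sketch. The new names (exam_score, group_label, grade) and the labels "control" and "treatment" are hypothetical, chosen only for illustration.

```r
# Load required libraries
library(dplyr)

# Same example data frame as above
df <- data.frame(
  id = 1:4,
  score = c(90, NA, 75, 88),
  group = c("A", "B", "A", "B")
)

df_recoded <- df %>%
  rename(exam_score = score) %>%   # rename(new_name = old_name)
  mutate(
    # Recode group codes into descriptive labels (labels are assumptions)
    group_label = case_when(
      group == "A" ~ "control",
      group == "B" ~ "treatment"
    ),
    # Recode the numeric score into categories, keeping missing values explicit
    grade = case_when(
      is.na(exam_score) ~ "missing",
      exam_score >= 80  ~ "pass",
      TRUE              ~ "fail"
    )
  )

print(df_recoded)
```

Unlike replacing NA with 0, this sketch keeps missing scores in their own category, so a missing value is not silently counted as a failing one. Which approach fits depends on what a missing score means in your data.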

To reshape your data for different analysis needs, the tidyr package provides two complementary functions. pivot_longer() transforms data from a wide format, where values of the same kind are spread across several columns, to a long format, where each row holds a single observation-variable pair. Conversely, pivot_wider() converts long-format data back to wide format, spreading name-value pairs across multiple columns. Together they let you move between the layout that is convenient for data entry and the tidy layout that most analysis functions expect.
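
As a minimal sketch of this round trip, the example below reshapes a small, hypothetical data frame (the columns test1 and test2 and their values are made up for illustration) from wide to long and back again.

```r
# Load required libraries
library(tidyr)

# Hypothetical wide data: one row per student, one column per test
wide <- data.frame(
  id = 1:3,
  test1 = c(90, 75, 88),
  test2 = c(85, NA, 92)
)

# Wide to long: column names go into 'test', cell values into 'score'
long <- pivot_longer(
  wide,
  cols = c(test1, test2),
  names_to = "test",
  values_to = "score"
)

# Long to wide: values of 'test' become column names again
wide_again <- pivot_wider(
  long,
  names_from = test,
  values_from = score
)

print(long)
print(wide_again)
```

Here long has one row per student-test combination, and wide_again has the same columns as the original wide data, returned as a tibble.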


Section 1. Chapter 4


Data Cleaning and Transformation

