Practical Data Preparation in R with Tidyverse

Handling Missing Data

Handling missing data is a common challenge in data wrangling with the tidyverse. In R, missing values are represented by the special value NA. These NA values can arise from incomplete data collection, data entry errors, or merging datasets with non-overlapping entries. If not addressed, NA values can disrupt calculations and lead to misleading analysis results. For instance, operations like calculating the mean or sum of a vector containing NA will themselves return NA unless you explicitly handle the missing values. Recognizing and managing these missing values is essential for ensuring the accuracy and reliability of your data analysis.
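As a quick illustration of how NA propagates through a calculation, here is a minimal base R sketch. The vector `scores` is made-up data used only for this example, and `na.rm = TRUE` is the standard argument that `mean()` and `sum()` accept for skipping missing values.

```r
# Illustrative vector with one missing value (made-up data)
scores <- c(95, NA, 88)

# By default, the NA propagates through the calculation
mean_default <- mean(scores)
sum_default <- sum(scores)

# na.rm = TRUE drops the missing values before computing
mean_clean <- mean(scores, na.rm = TRUE)
sum_clean <- sum(scores, na.rm = TRUE)

print(mean_default)  # NA
print(mean_clean)    # 91.5
print(sum_clean)     # 183
```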

    options(crayon.enabled = FALSE)
    library(tidyverse)

    # Creating a tibble with missing values
    data <- tibble(
      name = c("Alice", "Bob", "Charlie", "Dana"),
      score = c(95, NA, 88, NA)
    )

    # Detecting missing values using is.na
    missing_scores <- is.na(data$score)

    # Replacing missing values with a specific value (e.g., 0) using replace_na
    data_filled <- data %>%
      mutate(score = replace_na(score, 0))

    print(missing_scores)
    print(data_filled)

When handling missing data in your workflow, you should consider both the source of the missingness and the impact of your chosen strategy. Common approaches include:

  • Removing rows with missing values;
  • Replacing them with a default or imputed value;
  • Leaving them as NA and using functions that can handle missing values appropriately.
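The three approaches listed above can be sketched side by side. This is a minimal example rather than a recommended recipe: it reuses the same illustrative tibble as the earlier example, and the replacement value 0 is an arbitrary choice made only for demonstration.

```r
library(tidyverse)

# Illustrative data, matching the example tibble used earlier
data <- tibble(
  name = c("Alice", "Bob", "Charlie", "Dana"),
  score = c(95, NA, 88, NA)
)

# 1. Remove rows where score is missing
data_dropped <- data %>%
  drop_na(score)

# 2. Replace missing scores with a default value (0 is arbitrary here)
data_default <- data %>%
  mutate(score = replace_na(score, 0))

# 3. Keep the NA values and skip them only when summarising
score_summary <- data %>%
  summarise(mean_score = mean(score, na.rm = TRUE))

print(data_dropped)
print(data_default)
print(score_summary)
```

Note that the third approach leaves `data` unchanged, so the missingness stays visible for any later step that needs to account for it.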

The best practice is to investigate why data is missing and to document the approach you use to address it. In some cases, removing missing values may bias your results, especially if the missingness is not random. Replacing missing values with a constant, such as zero or the mean, may also introduce bias or distort the distribution of your data. Always choose a method that aligns with your analysis goals and the nature of your dataset.
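To make the distortion concrete, the following sketch compares summary statistics before and after constant replacement. The `scores` vector is made-up data, and the comparison is only meant to show the direction of the effect, not its typical size.

```r
library(tidyverse)

# Illustrative scores with two missing values (made-up data)
scores <- c(95, NA, 88, NA)

# Statistics computed on the observed values only
mean_observed <- mean(scores, na.rm = TRUE)
sd_observed <- sd(scores, na.rm = TRUE)

# Filling with zero pulls the mean down sharply
zero_filled <- replace_na(scores, 0)
mean_zero_filled <- mean(zero_filled)

# Filling with the mean keeps the mean but shrinks the spread
mean_filled <- replace_na(scores, mean_observed)
sd_mean_filled <- sd(mean_filled)

print(c(observed = mean_observed, zero_filled = mean_zero_filled))
print(c(observed = sd_observed, mean_filled = sd_mean_filled))
```

With this data, zero filling halves the mean (91.5 becomes 45.75), while mean filling leaves the mean at 91.5 but reduces the standard deviation.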

Which of the following statements best describes the implications of different missing data handling strategies in R?

Select the correct answer.

Task

Using the provided weather tibble:

  • Detect missing values in the temperature column and store the result in a variable named missing_temps.
  • Calculate the average temperature, excluding missing values (NA).
  • Replace the missing values in the temperature column with this average using replace_na and store the result in a new tibble named weather_filled.
  • Do not modify the original weather tibble.
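One possible way to approach these steps is sketched below. The `weather` tibble here is a hypothetical stand-in, since the provided one is not shown on this page; its `day` column and all of its values are assumptions made for illustration. The helper variable `avg_temp` is likewise not required by the task.

```r
library(tidyverse)

# Hypothetical stand-in for the provided weather tibble (values are made up)
weather <- tibble(
  day = c("Mon", "Tue", "Wed", "Thu"),
  temperature = c(21.5, NA, 19.0, 23.5)
)

# Detect missing values in the temperature column
missing_temps <- is.na(weather$temperature)

# Average temperature, excluding missing values
avg_temp <- mean(weather$temperature, na.rm = TRUE)

# Fill missing temperatures with the average in a new tibble;
# the original weather tibble is left untouched
weather_filled <- weather %>%
  mutate(temperature = replace_na(temperature, avg_temp))

print(missing_temps)
print(weather_filled)
```

Because `mutate()` returns a new tibble and the result is assigned to `weather_filled`, the original `weather` still contains its NA value afterwards.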
