Summarizing Data | Data Manipulation and Cleaning
Data Analysis with R

Summarizing Data

Summarizing data is essential for getting a quick understanding of its structure and patterns.

Quick Summary of the Dataset

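Before performing a detailed analysis, generate a quick overview of the dataset with the summary() function. For each numeric column it reports the minimum, quartiles, median, mean, and maximum, along with a count of missing values if there are any. For factor columns it reports how many rows fall into each level. This lets you check ranges, distributions, and categorical values at a glance.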

summary(df)
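As a rough illustration of what summary() returns, the sketch below uses a small made-up data frame. The name toy and all of its values are hypothetical; only the column names selling_price and fuel are borrowed from this lesson.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  selling_price = c(450000, 370000, NA, 225000),
  fuel = factor(c("Diesel", "Petrol", "Petrol", "Diesel"))
)

# Numeric column: min, quartiles, median, mean, max, and an NA count
# Factor column: number of rows per level
toy_summary <- summary(toy)
print(toy_summary)

# summary() on a single numeric vector returns a named vector,
# so individual statistics can be pulled out by name
price_summary <- summary(toy$selling_price)
price_summary[["Median"]]
```

Calling summary() on one column this way is handy when you want to reuse a single statistic rather than just read the printed table.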

Summary Statistics for a Single Column

You can calculate basic descriptive statistics such as the mean, median, and standard deviation for individual columns. For example, here's how to summarize the selling_price column.

Base R

There are dedicated functions like mean(), median(), and sd() at your disposal. The argument na.rm = TRUE ensures that missing values are ignored during calculation.

mean(df$selling_price, na.rm = TRUE)
median(df$selling_price, na.rm = TRUE)
sd(df$selling_price, na.rm = TRUE)
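To see why na.rm = TRUE matters, here is a minimal sketch on a hypothetical vector (prices and its values are made up). Without the argument, a single missing value makes the whole result NA.

```r
# Hypothetical prices with one missing value
prices <- c(100, 200, NA, 400)

mean(prices)                    # NA, because one value is missing
mean(prices, na.rm = TRUE)      # mean of the three observed values
median(prices, na.rm = TRUE)    # 200
sd(prices, na.rm = TRUE)        # sample standard deviation of observed values

# Count how many values were skipped, so the statistic can be read in context
n_missing <- sum(is.na(prices))
n_missing
```

Checking the number of missing values alongside the statistic is a reasonable habit, since a mean based on very few observed values can be misleading.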

dplyr

You can compute all three statistics in a single step with the summarise() function.

library(dplyr)  # provides %>% and summarise()

df %>%
  summarise(
    mean_price = mean(selling_price, na.rm = TRUE),
    median_price = median(selling_price, na.rm = TRUE),
    sd_price = sd(selling_price, na.rm = TRUE)
  )

Summarizing Multiple Columns by Group

Often, you'll want to compare summary statistics across different groups in your dataset. For example, you might calculate the average selling price and average mileage for each type of fuel.

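Before summarizing, make sure that the mileage column is numeric. The code below removes everything from " km" onward (the unit suffix) and converts what remains to a number; str() then confirms that the column is now of type num: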

df$mileage <- as.numeric(gsub(" km.*", "", df$mileage))
str(df$mileage)
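The conversion can be tried on a few sample strings first. In this sketch the raw values and their " kmpl" and " km/kg" suffixes are assumptions about what the column might contain, not values taken from the actual dataset.

```r
# Hypothetical raw values; the unit suffixes are assumed for illustration
raw_mileage <- c("23.4 kmpl", "17.3 km/kg", NA, "21.14 kmpl")

# Drop everything from " km" to the end of the string, then convert
clean_mileage <- as.numeric(gsub(" km.*", "", raw_mileage))
clean_mileage
is.numeric(clean_mileage)

# If the conversion creates more NAs than the raw data had,
# some strings did not match the expected pattern
new_nas <- sum(is.na(clean_mileage)) - sum(is.na(raw_mileage))
new_nas
```

Comparing NA counts before and after is one simple way to catch values in an unexpected format, which as.numeric() would otherwise turn into NA with only a warning.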

Base R

The aggregate() function can be used to compute grouped statistics. The cbind() function allows summarizing multiple numeric columns at once.

aggregate(cbind(selling_price, mileage) ~ fuel, data = df, FUN = mean, na.rm = TRUE)
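One detail worth knowing: the formula interface of aggregate() defaults to na.action = na.omit, which drops any row with a missing value in any of the listed columns before FUN is applied. The sketch below, on a hypothetical data frame toy with made-up values, contrasts that default with na.action = na.pass, which keeps all rows and lets na.rm = TRUE handle each column separately.

```r
# Hypothetical toy data with missing values in different rows
toy <- data.frame(
  fuel = c("Diesel", "Diesel", "Petrol", "Petrol"),
  selling_price = c(400, 600, 300, NA),
  mileage = c(20, NA, 18, 22)
)

# Default (na.action = na.omit): rows 2 and 4 are dropped entirely,
# so each mean is based on complete rows only
complete_rows <- aggregate(cbind(selling_price, mileage) ~ fuel,
                           data = toy, FUN = mean, na.rm = TRUE)

# na.pass keeps every row; na.rm = TRUE then skips NAs column by column
per_column <- aggregate(cbind(selling_price, mileage) ~ fuel,
                        data = toy, FUN = mean, na.rm = TRUE,
                        na.action = na.pass)

complete_rows
per_column
```

The two results differ here because the Diesel price of 600 and the Petrol mileage of 22 sit in rows that the default discards. Which behavior is appropriate depends on the analysis; the per-column version matches what mean(x, na.rm = TRUE) would give for each column on its own.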

dplyr

Grouping and summarizing can also be done using group_by() and summarise(). This approach is generally more readable and easier to extend.

df %>%
  group_by(fuel) %>%
  summarise(
    mean_price = mean(selling_price, na.rm = TRUE),
    mean_mileage = mean(mileage, na.rm = TRUE)
  )
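As one example of how the dplyr version can be extended, the sketch below applies the same function to several columns with across() and adds a row count with n(). It is not part of the lesson's code: across() assumes dplyr 1.0.0 or later, and the data frame toy and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  fuel = c("Diesel", "Diesel", "Petrol", "Petrol"),
  selling_price = c(400, 600, 300, NA),
  mileage = c(20, NA, 18, 22)
)

grouped_means <- toy %>%
  group_by(fuel) %>%
  summarise(
    # Apply mean() with na.rm = TRUE to both columns;
    # .names controls the output names (mean_selling_price, mean_mileage)
    across(c(selling_price, mileage),
           ~ mean(.x, na.rm = TRUE),
           .names = "mean_{.col}"),
    n_cars = n()  # number of rows in each group, including rows with NAs
  )

grouped_means
```

Adding another column to the summary only requires extending the vector passed to across(), which is part of why this style tends to scale well as the number of columns grows.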