Descriptive Statistics
Understanding your data begins with descriptive statistics, which summarize the central tendency, spread, and distribution of your variables.
Basic Descriptive Statistics
The most common statistical measures are:
- Mean: the arithmetic average of the values;
- Standard deviation: how far values typically deviate from the mean;
- Median: the middle value when the values are sorted;
- Min / max: the smallest and largest values.
These give a quick overview of how your variables are distributed.
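To make these definitions concrete, here is a small sketch that computes each measure by hand on a made-up vector and compares the results with R's built-in mean(), sd(), median(), min(), and max(). Note that sd() uses the sample formula, dividing by n - 1 rather than n.

```r
# Small made-up sample, used only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Mean: sum of the values divided by how many there are
m <- sum(x) / n

# Standard deviation with the n - 1 denominator, matching sd()
s <- sqrt(sum((x - m)^2) / (n - 1))

# Median: middle value of the sorted data
# (average of the two middle values when n is even)
sorted <- sort(x)
med <- if (n %% 2 == 1) sorted[(n + 1) / 2] else mean(sorted[c(n / 2, n / 2 + 1)])

# Min / max: smallest and largest values
lo <- min(x)
hi <- max(x)

# Hand-computed values next to the built-in functions
c(mean = m, sd = s, median = med, min = lo, max = hi)
c(mean = mean(x), sd = sd(x), median = median(x), min = min(x), max = max(x))
```

Both lines give a mean of 5, a median of 4.5, a minimum of 2, and a maximum of 9, with a standard deviation of about 2.14. Dividing by n instead of n - 1 would give the population standard deviation, which is exactly 2 for this vector.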
Base R
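Base R provides simple functions for calculating descriptive statistics. Passing na.rm = TRUE tells each function to drop missing values first; without it, a single NA makes the result NA. The summary() function gives a quick overview of every column in a data frame: for numeric columns it reports the minimum, first quartile, median, mean, third quartile, and maximum, plus the number of NA values when there are any.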
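# Average maximum power, ignoring missing values
mean(df$max_power, na.rm = TRUE)
# Median selling price
median(df$selling_price, na.rm = TRUE)
# Smallest and largest mileage
min(df$mileage, na.rm = TRUE)
max(df$mileage, na.rm = TRUE)
# Quick overview of every column
summary(df)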
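The calls above all pass na.rm = TRUE. A quick check on a short made-up vector with one missing value shows why, and also shows range(), a base R function that returns the minimum and maximum together.

```r
# A short vector with one missing value (made-up numbers for illustration)
v <- c(12, NA, 18, 15)

# Without na.rm, a single NA makes the result NA
mean(v)

# With na.rm = TRUE, the NA is dropped before computing
mean(v, na.rm = TRUE)     # 15
median(v, na.rm = TRUE)   # 15
range(v, na.rm = TRUE)    # 12 18 (min and max in one call)

# summary() on a numeric vector ignores the NA and reports its count separately
summary(v)
```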
dplyr
With dplyr, you can use summarise() to compute several statistics in one call. On ungrouped data it returns a single row with one named column per statistic.
library(dplyr)

df %>%
  summarise(
    avg_power = mean(max_power, na.rm = TRUE),     # mean
    sd_power = sd(max_power, na.rm = TRUE),        # standard deviation
    median_power = median(max_power, na.rm = TRUE) # median
  )
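The df used above is not defined in this section, so here is a runnable sketch of the same summarise() pattern on R's built-in mtcars data, with its hp column (gross horsepower) standing in for max_power. It assumes dplyr is installed, and it adds min, max, and a row count from n() to cover the remaining measures from the list at the start.

```r
library(dplyr)

# mtcars ships with R; hp stands in for the max_power column used above
res <- mtcars %>%
  summarise(
    avg_hp    = mean(hp, na.rm = TRUE),
    sd_hp     = sd(hp, na.rm = TRUE),
    median_hp = median(hp, na.rm = TRUE),
    min_hp    = min(hp, na.rm = TRUE),
    max_hp    = max(hp, na.rm = TRUE),
    n         = n()  # number of rows summarised
  )

res  # one row, one column per statistic
```

Because every statistic is a named argument, adding or removing a measure only means editing one line inside summarise().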