Creating Histograms
Why Use Histograms?
Histograms are used to visualize the distribution of continuous (numerical) data. They show how data is spread across ranges (bins) and help to:
- Detect skewness, outliers, or gaps;
- Understand frequency distribution;
- Quickly assess whether the data looks approximately normally distributed.
They are best used for variables like price, mileage, or age.
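To make the idea of bins concrete, here is a minimal sketch in base R of the counting that a histogram performs. The age values and the bin boundaries are made up for illustration and do not come from the lesson's dataset.

```r
# Hypothetical car ages in years (illustrative values only)
age <- c(1, 2, 2, 3, 3, 3, 4, 5, 7, 12)

# Split the range into three equal-width bins: (0,5], (5,10], (10,15]
bins <- cut(age, breaks = c(0, 5, 10, 15))

# Count how many values fall into each bin; these counts become the bar heights
counts <- table(bins)
print(counts)
```

Most values land in the first bin, which is the kind of concentration a histogram makes visible at a glance.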
Histogram Syntax in ggplot2
You can create a histogram using geom_histogram(), where the x variable must be numeric.
ggplot(data = df, aes(x = variable)) +
  geom_histogram()
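Since df and variable are placeholders, a runnable version of the same pattern may help. This sketch assumes the ggplot2 package is installed and swaps in R's built-in mtcars dataset, with mpg as the numeric variable. Neither is part of the lesson's data.

```r
library(ggplot2)

# mtcars ships with R; mpg (miles per gallon) is a numeric column
p <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)  # set bins explicitly instead of relying on the default

# Printing the plot object draws the histogram
print(p)
```

Setting bins explicitly also avoids the message ggplot2 prints when it falls back to its default of 30 bins.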
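The appearance of the histogram can be customized using arguments to geom_histogram() such as bins (number of bins), fill (bar color), and color (border color). Overall styling is not an argument of geom_histogram(); it is applied by adding a theme layer such as theme_minimal() to the plot.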
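The effect of these arguments is easiest to see side by side. The sketch below again assumes ggplot2 is installed and uses the built-in mtcars dataset as a stand-in for df. It also uses binwidth, an alternative to bins that sets the width of each bin in data units rather than the number of bins.

```r
library(ggplot2)

# Coarse view: 5 wide bins show only the rough shape
p_coarse <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 5, fill = "steelblue", color = "black")

# Fine view: 20 narrow bins show more detail, but also more noise
p_fine <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 20, fill = "steelblue", color = "black")

# binwidth fixes the bin size in the units of x (here, 2.5 mpg per bin);
# the theme is added as its own layer
p_width <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5, fill = "steelblue", color = "black") +
  theme_minimal()

print(p_coarse)
print(p_fine)
print(p_width)
```

There is no single correct number of bins. Trying a few values is a reasonable way to check that the shape you see is not just a product of one particular choice.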
Example: Distribution of Selling Prices
A histogram can be used to examine how car prices are distributed across the dataset. In this example, the bars are filled with steel blue and outlined in black, while labels and a minimal theme are added for clarity.
ggplot(data = df, aes(x = selling_price)) +
  geom_histogram(fill = "steelblue", color = "black") +
  labs(title = "Distribution of Selling Prices",
       x = "Selling Price (in PKR)",
       y = "Count") +
  theme_minimal()
This plot reveals the overall shape of the selling price distribution, making it easy to see whether most cars fall within a particular price range or if there are outliers at the high or low end.
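A quick numeric check can back up what the plot suggests. The sketch below uses invented prices, not values from the lesson's dataset, and compares the mean with the median. A mean well above the median is a common rough sign of a long right tail caused by a few expensive cars, though it is a heuristic rather than a formal test.

```r
# Hypothetical selling prices in PKR (illustrative values only)
selling_price <- c(250000, 300000, 320000, 350000, 400000,
                   450000, 500000, 650000, 900000, 4500000)

mean_price <- mean(selling_price)      # pulled upward by the one very high price
median_price <- median(selling_price)  # middle value, robust to outliers

# TRUE here hints at right skew
right_skew_hint <- mean_price > median_price
print(c(mean = mean_price, median = median_price))
```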