Learn Creating Histograms | Data Visualization
Data Analysis with R

Creating Histograms

Why Use Histograms?

Histograms visualize the distribution of continuous numerical data. They group values into ranges (bins), show how many observations fall into each one, and help to:

  • Detect skewness, outliers, or gaps;
  • Understand frequency distribution;
  • Quickly assess whether the data looks roughly normally distributed.

They are best used for variables like price, mileage, or age.
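To make the idea of bins concrete, here is a minimal sketch in base R that counts how many values fall into each range without drawing anything. The ages vector and the 10-year bin edges are made up for illustration and do not come from the course dataset.

```r
# Hypothetical ages, made up for illustration only
ages <- c(21, 23, 25, 31, 34, 35, 38, 42, 47, 58)

# Bin edges every 10 years from 20 to 60
breaks <- seq(20, 60, by = 10)

# hist() with plot = FALSE computes the bin counts without drawing a plot
h <- hist(ages, breaks = breaks, plot = FALSE)

# One row per bin: the range it covers and how many values fall into it
bin_table <- data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
)
bin_table
```

The heights of the bars in a histogram are exactly these per-bin counts.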

Histogram Syntax in ggplot2

You can create a histogram using geom_histogram(), where the x variable must be numeric.

library(ggplot2)
# df is a data frame and variable is one of its numeric columns
ggplot(data = df, aes(x = variable)) +
  geom_histogram()

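The appearance of the histogram can be customized with geom_histogram() arguments such as bins (number of bins), fill (bar color), and color (border color). Overall styling is handled separately by adding a theme layer such as theme_minimal(). If bins is not set, ggplot2 falls back to 30 bins and prints a message suggesting that you pick a value yourself.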
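As a rough sketch of how these arguments behave, the example below uses the built-in mtcars dataset as a stand-in, since the course data frame is not shown here. It sets the number of bars with bins, and alternatively the width of each bar with binwidth; the values 10 and 2.5 are arbitrary choices for illustration.

```r
library(ggplot2)

# mtcars ships with R; mpg stands in for any numeric variable
p_bins <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10, fill = "steelblue", color = "black")

# binwidth sets the width of each bar in data units instead of a bar count
p_width <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2.5, fill = "steelblue", color = "black")

# layer_data() returns the bins that ggplot2 computed for a plot
bins_data <- layer_data(p_bins)

# Every row of mtcars lands in exactly one bin
total_count <- sum(bins_data$count)
total_count
```

Fewer, wider bins smooth out the shape, while more, narrower bins show finer detail at the cost of noise, so it is usually worth trying a few values.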

Example: Distribution of Selling Prices

A histogram can be used to examine how car prices are distributed across the dataset. In this example, the bars are filled with steel blue and outlined in black, while labels and a minimal theme are added for clarity.

ggplot(data = df, aes(x = selling_price)) +
  geom_histogram(fill = "steelblue", color = "black") +
  labs(title = "Distribution of Selling Prices",
       x = "Selling Price (in PKR)",
       y = "Count") +
  theme_minimal()

This plot reveals the overall shape of the selling price distribution, making it easy to see whether most cars fall within a particular price range or if there are outliers at the high or low end.
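The example above assumes that df already exists with a numeric selling_price column. To try it without the course data, one option is to simulate a stand-in. In this sketch the 500 prices are randomly generated from a log-normal distribution (an assumption chosen only because it produces a right-skewed shape) and say nothing about real car prices.

```r
library(ggplot2)

set.seed(42)  # make the simulated values reproducible

# Hypothetical stand-in for df: 500 right-skewed prices, not the course data
df <- data.frame(
  selling_price = round(rlnorm(500, meanlog = 13, sdlog = 0.6))
)

# Same plot as above, with bins set explicitly to avoid the default-bins message
p <- ggplot(data = df, aes(x = selling_price)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Selling Prices",
       x = "Selling Price (in PKR)",
       y = "Count") +
  theme_minimal()

# Run print(p) to draw the plot in an interactive session

# A mean well above the median is a numeric hint of right skew
price_summary <- c(mean   = mean(df$selling_price),
                   median = median(df$selling_price))
price_summary
```

With data like this, the histogram shows a tall cluster of bars at lower prices and a long tail toward the high end, which matches the mean sitting above the median.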


What does the bins argument in geom_histogram() control?
