Grouping Numeric Data | Factors
R Introduction: Part I

Grouping Numeric Data

To categorize numeric data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

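Here's how you can use it (the signature of the default method in base R):

    cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)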

Among the parameters listed, these are crucial for categorizing data:

  • x is the numeric vector to be categorized;
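  • breaks is either a single number (2 or more) giving the number of intervals, or a vector of cut points;
  • labels provides names for the categories; if omitted, the levels are named after the intervals, such as (a,b];
  • right indicates whether the intervals are closed on the right, as in (a,b]; it is TRUE by default, and FALSE gives [a,b) instead;
  • ordered_result determines whether the result is an ordered factor; it is FALSE by default.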
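
The effect of right and ordered_result is easiest to see on values that sit exactly on the cut points. The following is a minimal sketch; the vector x and the cut points in cuts are made-up values chosen only for illustration.

```r
# Hypothetical values that sit exactly on the cut points
x <- c(10, 20, 30)
cuts <- c(0, 10, 20, 30)

# Default (right = TRUE): intervals are (0,10], (10,20], (20,30]
right_closed <- cut(x, breaks = cuts)
as.character(right_closed)

# right = FALSE: intervals are [0,10), [10,20), [20,30)
# 30 is not inside any interval, so it becomes NA
left_closed <- cut(x, breaks = cuts, right = FALSE)
as.character(left_closed)

# ordered_result = TRUE returns an ordered factor
is.ordered(cut(x, breaks = cuts, ordered_result = TRUE))
```

Note that with right = FALSE the largest cut point is excluded, so a value equal to it is returned as NA unless include.lowest = TRUE is also set.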

To create three categories, set breaks to 3 or provide a vector with four cut points to form three intervals, for instance (a,b], (b,c], (c,d].
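
As a rough illustration of the first option, the sketch below passes breaks = 3 and lets R pick the cut points. It reuses the heights from this chapter's example; the name by_count is hypothetical. With a single number, cut() splits the range of the data into equal-width intervals and, since no labels are given, names the levels after those intervals.

```r
# Heights from the chapter's example
heights <- c(170, 165, 195, 172, 189, 156, 178, 198,
             157, 182, 171, 184, 163, 176, 169, 153)

# Let R choose three equal-width intervals over the range of the data
by_count <- cut(heights, breaks = 3)

nlevels(by_count)  # number of categories created
levels(by_count)   # interval labels generated by R
table(by_count)    # how many heights fall into each interval
```

The cut points here depend on the minimum and maximum of the data, so the same height can land in a different category when the vector changes. Supplying explicit cut points avoids that.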

    # Vector of heights
    heights <- c(170, 165, 195, 172, 189, 156, 178, 198,
                 157, 182, 171, 184, 163, 176, 169, 153)

    # Convert into a factor by cutting into intervals
    heights_f <- cut(heights,
                     breaks = c(0, 160, 190, 250),
                     labels = c('short', 'medium', 'tall'),
                     ordered_result = TRUE)
    heights_f # Output the factor variable

For our example of categorizing height, we choose c(0, 160, 190, 250) for breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE to define a logical order among categories (e.g., short < medium < tall).
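
To show what the ordering makes possible, here is a small sketch that repeats the call with the labels short, medium, and tall and then compares categories. It assumes nothing beyond base R; the comparisons with > work only because the factor is ordered.

```r
heights <- c(170, 165, 195, 172, 189, 156, 178, 198,
             157, 182, 171, 184, 163, 176, 169, 153)

heights_f <- cut(heights,
                 breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'),
                 ordered_result = TRUE)

# Count how many heights fall into each category
table(heights_f)

# Ordered factors support comparisons between levels
heights_f > 'short'

# Select the original values that belong to one category
heights[heights_f == 'tall']

# The highest category present in the data
max(heights_f)
```

On an unordered factor, the comparison heights_f > 'short' would return NA values with a warning, which is one practical reason to set ordered_result = TRUE for categories that have a natural ranking.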

Task

  1. Given a vector of numerical grades, categorize them into the following factor levels:
    • [0, 60) - F;
    • [60, 75) - D;
    • [75, 85) - C;
    • [85, 95) - B;
    • [95, 100) - A.
  2. Create a variable grades_f that stores the result of applying cut() to the grades with the following arguments:
    • breaks - c(0, 60, 75, 85, 95, 100);
    • labels - c('F', 'D', 'C', 'B', 'A');
    • ordered_result - TRUE (to order the factor values);
    • right - FALSE (to include the left boundary of an interval, not the right).
  3. Output the contents of grades_f.

Section 3. Chapter 5