Course Content

R Introduction: Part I

Intervals

To categorize numerical data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

The cut() function converts a numeric vector into a factor whose levels are intervals. Of its parameters, these matter most for categorizing data:

  • x is the numerical vector to be categorized.
  • breaks can be an integer specifying the number of intervals, or a vector of cut points.
  • labels provides names for the categories, one per interval.
  • right indicates whether the intervals are closed on the right and open on the left (the default, TRUE) or the other way around (FALSE).
  • ordered_result determines whether the resulting factor is ordered.
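These parameters appear in the signature of cut()'s default method, reproduced below as a comment from R's documentation. The sketch that follows uses hypothetical values placed exactly on the cut points to show how right changes the interval a boundary value falls into, and how ordered_result makes the levels comparable.

```r
# Signature of the default method, per R's documentation (?cut):
# cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
#     dig.lab = 3L, ordered_result = FALSE, ...)

x    <- c(10, 20, 30)      # hypothetical values sitting on the cut points
cuts <- c(0, 10, 20, 30)

# Default right = TRUE: intervals are (0,10], (10,20], (20,30]
right_closed <- cut(x, breaks = cuts)

# right = FALSE: intervals are [0,10), [10,20), [20,30); 30 matches none, so it is NA
left_closed <- cut(x, breaks = cuts, right = FALSE)

# ordered_result = TRUE returns an ordered factor, so levels can be compared
ordered_f <- cut(x, breaks = cuts, ordered_result = TRUE)

print(right_closed)
print(left_closed)
print(ordered_f[1] < ordered_f[3])
```

When labels is not supplied, the levels are named after the intervals themselves.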

To create three categories, either set breaks to 3, which splits the range of x into three equal-width intervals, or provide a vector of four cut points a < b < c < d, which form the intervals (a,b], (b,c], (c,d].
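The two ways of specifying breaks can be compared side by side. This is a minimal sketch on a hypothetical vector x; the cut points c(0, 4, 8, 12) are chosen only for illustration.

```r
x <- c(1, 4, 6, 9, 12)   # hypothetical sample data

# A single number: R derives three equal-width intervals from the range of x
by_count <- cut(x, breaks = 3)

# Four explicit cut points: the intervals are (0,4], (4,8], (8,12]
by_points <- cut(x, breaks = c(0, 4, 8, 12))

print(levels(by_count))
print(table(by_points))
```

Explicit cut points are usually the better choice when the boundaries carry meaning, because the intervals derived from a count depend on the data in x.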

For the height example, we pass c(0, 160, 190, 250) as breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE so that the categories have a logical order (short < medium < tall).
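A minimal sketch of the height example follows. The heights vector is hypothetical, and the values are assumed to be in centimetres; the breaks and category names come from the text above.

```r
heights <- c(155, 160, 172, 190, 198)   # hypothetical heights, assumed to be in cm

height_f <- cut(
  heights,
  breaks = c(0, 160, 190, 250),            # (0,160], (160,190], (190,250]
  labels = c("short", "medium", "tall"),   # one label per interval
  ordered_result = TRUE                    # short < medium < tall
)

print(height_f)
```

Because right keeps its default of TRUE, a height of exactly 160 falls into 'short' and a height of exactly 190 falls into 'medium'.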

Task

  • Given a vector of numerical grades, categorize them into the following factor levels:
    • [0;60) - F
    • [60;75) - D
    • [75;85) - C
    • [85;95) - B
    • [95;100) - A
  • Create a variable grades_f that stores the factor levels with the specified breaks and labels, considering the ordering, and use right = FALSE to include the left boundary of the intervals.
    • breaks - c(0, 60, 75, 85, 95, 100)
    • labels - c('F', 'D', 'C', 'B', 'A')
    • ordered_result - TRUE (to order the factor values)
    • right - FALSE (to close each interval on the left rather than on the right)
  • Output the contents of grades_f.
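One possible way to approach the task is sketched below. The grades vector is hypothetical and stands in for whatever data the exercise supplies; the breaks, labels, and flags are the ones listed above.

```r
grades <- c(45, 60, 74.5, 85, 95, 99)   # hypothetical grades for illustration

grades_f <- cut(
  grades,
  breaks = c(0, 60, 75, 85, 95, 100),
  labels = c('F', 'D', 'C', 'B', 'A'),
  ordered_result = TRUE,   # F < D < C < B < A
  right = FALSE            # intervals are [0,60), [60,75), ..., [95,100)
)

print(grades_f)
```

With these settings a grade of exactly 100 lies outside [95;100) and is returned as NA. Passing include.lowest = TRUE would close the last interval at 100, although the task does not ask for it.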
