Do you remember the key difference between numerical and factor variables? It is the limitation on number of the possible values. Therefore the height of a person is a numerical variable. But in one of the previous chapters, we mentioned such a classification as tall, medium, small. These are categorical values. So, how can we 'split' numerical values into factor values?

You can do this by using the cut() function. This function has several important parameters.

These are not the complete list of parameters, but they are essential for us.

  • x - is a vector of numerical values we want to transform.
  • breaks - either integer number (greater than or equal to 2) of intervals or a numeric vector with cut points.
  • labels - vector of names for categories.
  • right - logical, should the interval be closed on the right.
  • ordered_result - logical, should the result be ordered.

If you want to cut the data into three intervals, then the breaks parameter should be either 3 or a vector with 4 values (let them be a, b, c, d; you need to build 3 intervals, (a,b], (b,c], (c,d]) For example, we can call a person tall if their height is greater than 190 cm, small if less than 160 cm, and medium otherwise. Assume we have data on people's heights. Let's represent how the function works.

Note, that we set c(0, 160, 190, 250) as breaks parameter since we want to cut all the values into three groups: (0, 160], (160, 190], and (190, 250]. Also, we set the ordered_result parameter to T to establish a relationship between factor values (i.e., small is less than medium and less than tall).


Given a vector of grades named grades on a scale 0-100. You need to convert them into factors by using the rule:

  • [0;60) - F
  • [60;75) - D
  • [75;85) - C
  • [85;95) - B
  • [95;100) - A

Follow the next steps:

  1. Create a new variable, grades_f, and assign the result of factoring the numerical grades. Use the following parameters:
  • breaks - c(0, 60, 75, 85, 95, 100)
  • labels - c('F', 'D', 'C', 'B', 'A')
  • ordered_result - T (to order the factor values)
  • right - F (to include the left boundary of an interval, not right)

Output the grades_f variable.

Everything was clear?

Section 3. Chapter 5
toggle bottom row