R Introduction: Part I

Grouping Numeric Data

To categorize numeric data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

Here's how you can use it:

cut(x, breaks, labels = NULL, right = TRUE, ordered_result = FALSE, ...)

Among the parameters listed, these are crucial for categorizing data:

  • x is the numeric vector to be categorized;

  • breaks is either a single integer giving the number of intervals or a numeric vector of cut points;

  • labels provides names for the resulting categories;

  • right indicates whether the intervals are closed on the right and open on the left, as in (a,b]; with right = FALSE they are closed on the left instead, as in [a,b);

  • ordered_result determines whether the result is an ordered factor.
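As a minimal sketch of how right and ordered_result behave, the snippet below cuts two values that sit exactly on cut points. The vector x and the name cuts are hypothetical and chosen only for illustration; the labels shown are the default interval labels that cut() generates when labels is not supplied.

```r
# Hypothetical values chosen to sit exactly on two of the cut points
x <- c(160, 190)
cuts <- c(0, 160, 190, 250)

# Default (right = TRUE): intervals are (a,b], so 160 falls into (0,160]
right_closed <- cut(x, breaks = cuts)

# right = FALSE: intervals are [a,b), so 160 falls into [160,190)
left_closed <- cut(x, breaks = cuts, right = FALSE)

print(as.character(right_closed))  # "(0,160]"   "(160,190]"
print(as.character(left_closed))   # "[160,190)" "[190,250)"

# ordered_result = TRUE returns an ordered factor instead of a plain one
is_ordered <- is.ordered(cut(x, breaks = cuts, ordered_result = TRUE))
print(is_ordered)                  # TRUE
```

A value equal to a cut point therefore lands in a different group depending on right, which matters whenever the data contain values exactly on a boundary.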

To create three categories, set breaks to 3 or provide a vector with four cut points to form three intervals, for instance (a,b], (b,c], (c,d].
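For the first option, a minimal sketch with a small hypothetical vector: when breaks is a single number, cut() splits the range of the data into that many equal-width intervals, so you do not choose the cut points yourself.

```r
# Hypothetical data, used only to illustrate breaks given as a single number
x <- c(1, 2, 5, 8, 10)

# Ask for three equal-width intervals spanning the range of x
x_f <- cut(x, breaks = 3)

print(nlevels(x_f))     # 3
print(as.integer(x_f))  # 1 1 2 3 3 (index of the interval each value falls into)
```

This is convenient for a quick look at the data, but the boundaries depend on the minimum and maximum of the vector, so explicit cut points are usually the better choice when the categories have a fixed meaning.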

# Vector of heights
heights <- c(170, 165, 195, 172, 189, 156, 178, 198,
             157, 182, 171, 184, 163, 176, 169, 153)
# Convert into an ordered factor by cutting into intervals
heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'), ordered_result = TRUE)
heights_f # Output the factor variable

For our example of categorizing height, we choose c(0, 160, 190, 250) for breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE to define a logical order among categories (e.g., short < medium < tall).
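To check the grouping, you might count how many values fall into each category and try a comparison that relies on the ordering. The sketch below repeats the height data so that it runs on its own; the names counts, is_below_tall, and out_of_range are hypothetical helpers, and the value 300 is an assumed out-of-range input.

```r
# Same heights and cut points as in the example above
heights <- c(170, 165, 195, 172, 189, 156, 178, 198,
             157, 182, 171, 184, 163, 176, 169, 153)
heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'), ordered_result = TRUE)

# Count the observations in each category
counts <- table(heights_f)
print(counts)              # short: 3, medium: 11, tall: 2

# Because the factor is ordered, levels can be compared with < and >
is_below_tall <- heights_f < 'tall'
print(sum(is_below_tall))  # 14

# A value outside every interval is not assigned to any category
out_of_range <- cut(300, breaks = c(0, 160, 190, 250),
                    labels = c('short', 'medium', 'tall'))
print(is.na(out_of_range)) # TRUE
```

The last call shows why the outer cut points (0 and 250 here) should be wide enough to cover every value you expect.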

Task


  1. You are given a vector of numerical grades. Categorize them into the following factor levels:

    • [0, 60) - F;
    • [60, 75) - D;
    • [75, 85) - C;
    • [85, 95) - B;
    • [95, 100) - A.
  2. Create a variable grades_f that stores the result of cut() applied to grades with the following arguments, so that the factor is ordered and each interval includes its left boundary:

    • breaks - c(0, 60, 75, 85, 95, 100);
    • labels - c('F', 'D', 'C', 'B', 'A');
    • ordered_result - TRUE (to order the factor values);
    • right - FALSE (to include the left boundary of an interval, not the right).
  3. Output the contents of grades_f.

Solution

# Vector of grades
grades <- c(96,84,67,78,61,74,93,77,74,82,83,79,61,23,76,79,79,73,
            75,81,65,85,29,79,70,76,75,80,66,72,88,87,86,93,70,88,
            67,76,69,84,85,59,82,76,67,75,80,75,79,86)
# Cut the grades into five intervals
grades_f <- cut(grades, breaks = c(0, 60, 75, 85, 95, 100),
                labels = c('F', 'D', 'C', 'B', 'A'),
                ordered_result = TRUE, right = FALSE)
# Output the grades converted into factors
grades_f
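One detail worth noting: with right = FALSE the last interval is [95, 100), so a grade of exactly 100 would not fall into any category. None of the grades above reaches 100, so the solution is unaffected. The sketch below, using a hypothetical vector edge_grades, shows the effect and one possible remedy with the include.lowest argument of cut(), which is not part of the task itself.

```r
cuts <- c(0, 60, 75, 85, 95, 100)
grade_labels <- c('F', 'D', 'C', 'B', 'A')

# Hypothetical grades sitting on the boundaries of the top interval
edge_grades <- c(95, 100)

# As in the solution: the top interval is [95, 100), so 100 becomes NA
strict <- cut(edge_grades, breaks = cuts, labels = grade_labels,
              ordered_result = TRUE, right = FALSE)
print(strict)     # A <NA>

# With right = FALSE, include.lowest = TRUE also closes the highest
# interval on the right, turning it into [95, 100]
inclusive <- cut(edge_grades, breaks = cuts, labels = grade_labels,
                 ordered_result = TRUE, right = FALSE, include.lowest = TRUE)
print(inclusive)  # A A
```

Whether a perfect score should count as an A is a grading decision, so treat include.lowest = TRUE as an option rather than a required part of the exercise.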


Section 3. Chapter 5
# Vector of grades
grades <- c(96,84,67,78,61,74,93,77,74,82,83,79,61,23,76,79,79,73,
75,81,65,85,29,79,70,76,75,80,66,72,88,87,86,93,70,88,
67,76,69,84,85,59,82,76,67,75,80,75,79,86)
# Cut the grades into five intervals
grades_f <- ___(___, breaks = ___,
labels = ___,
ordered_result = ___, right = ___)
# Output the grades converted into factors
___
