Grouping Numeric Data | Factors
R Introduction: Part I

# Grouping Numeric Data

To categorize numeric data into groups, you can use the `cut()` function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

The following `cut()` parameters are the most important for categorizing data:

• `x` is the numeric vector to be categorized;
• `breaks` can be an integer specifying the number of intervals or a vector of cut points;
• `labels` provides names for the categories;
• `right` indicates whether the intervals are closed on the right (the default) or on the left;
• `ordered_result` determines whether the result is an ordered factor.
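
As a minimal sketch of how these parameters interact, the snippet below calls `cut()` twice on the same data, changing only `right`. The vector `x` and the labels `"low"` and `"high"` are hypothetical values chosen for illustration; the signature in the comment is the one given in R's documentation for the default method.

```r
# Signature of the default method, per R's documentation:
# cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
#     dig.lab = 3L, ordered_result = FALSE, ...)

# Hypothetical sample values, chosen only for illustration
x <- c(1, 5, 10)

# right = TRUE (the default): the intervals are (0, 5] and (5, 10]
closed_right <- cut(x, breaks = c(0, 5, 10), labels = c("low", "high"))

# right = FALSE: the intervals are [0, 5) and [5, 10), so 5 moves to "high"
# and 10 falls outside every interval (it becomes NA)
closed_left <- cut(x, breaks = c(0, 5, 10), labels = c("low", "high"),
                   right = FALSE)

print(closed_right)
print(closed_left)
```

A value that lies outside every interval is returned as `NA`, which is worth checking for whenever a boundary value sits on the open side of the outermost interval.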

To create three categories, set `breaks` to `3` or provide a vector with four cut points to form three intervals, for instance (a,b], (b,c], (c,d].
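
The two ways of specifying `breaks` can be compared side by side. This is only a sketch: the vector `x` and the cut points `c(0, 4, 8, 12)` are assumed values, not part of the lesson.

```r
# Hypothetical values for illustration
x <- c(2, 4, 6, 8, 10, 12)

# An integer: cut() splits the range of x into 3 equal-width intervals
by_count <- cut(x, breaks = 3)

# A vector of four cut points: three intervals (0, 4], (4, 8], (8, 12]
by_points <- cut(x, breaks = c(0, 4, 8, 12))

print(nlevels(by_count))   # number of categories produced by breaks = 3
print(levels(by_points))   # interval labels generated from the cut points
print(table(by_points))    # how many values fall into each interval
```

With an integer, the interval boundaries depend on the data, so explicit cut points are usually the safer choice when the categories have a fixed meaning.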

For our example of categorizing height, we choose `c(0, 160, 190, 250)` for `breaks` to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set `ordered_result` to `TRUE` to define a logical order among categories (e.g., short < medium < tall).
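
The height example might look like the sketch below. The `heights` vector is made-up sample data in centimeters, and the variable name `heights_f` is an assumption for illustration.

```r
# Hypothetical heights in centimeters
heights <- c(155, 160, 172, 190, 198)

heights_f <- cut(
  heights,
  breaks = c(0, 160, 190, 250),           # (0, 160], (160, 190], (190, 250]
  labels = c("short", "medium", "tall"),
  ordered_result = TRUE                   # short < medium < tall
)

print(heights_f)

# Because the factor is ordered, comparisons between levels are meaningful
print(heights_f > "short")
```

Since the intervals are closed on the right by default, a height of exactly 160 is classified as `short` and a height of exactly 190 as `medium`.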

Task

1. Given a vector of numerical grades, categorize them into factor levels using the following intervals:
• [0, 60) - F;
• [60, 75) - D;
• [75, 85) - C;
• [85, 95) - B;
• [95, 100) - A.
2. Create a variable `grades_f` that stores the ordered factor returned by `cut()`, called with the following arguments:
• `breaks` - `c(0, 60, 75, 85, 95, 100)`;
• `labels` - `c('F', 'D', 'C', 'B', 'A')`;
• `ordered_result` - `TRUE` (to order the factor values);
• `right` - `FALSE` (to include the left boundary of an interval, not the right).
3. Output the contents of `grades_f`.
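
One possible way to carry out these steps is sketched below. The `grades` vector is hypothetical sample data, since the lesson does not list the actual values.

```r
# Hypothetical grades; the real vector is supplied by the exercise
grades <- c(45, 60, 74.5, 85, 95, 100)

grades_f <- cut(
  grades,
  breaks = c(0, 60, 75, 85, 95, 100),
  labels = c('F', 'D', 'C', 'B', 'A'),
  ordered_result = TRUE,   # F < D < C < B < A
  right = FALSE            # intervals are closed on the left: [0, 60), ...
)

print(grades_f)
```

With these settings the last interval is [95, 100), so a grade of exactly 100 lies outside every interval and becomes `NA`. Passing `include.lowest = TRUE` would close the last interval on the right, but that goes beyond what the task specifies.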

