Intervals
To categorize numerical data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals and returns the result as a factor. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.
Among the parameters of cut(), these are crucial for categorizing data:
- x is the numerical vector to be categorized;
- breaks can be an integer specifying the number of intervals, or a vector of cut points;
- labels provides names for the categories;
- right indicates if the intervals should be closed on the right (the default) rather than on the left;
- ordered_result determines if the resulting factor should be ordered.
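As a rough illustration of these parameters, the sketch below quotes the argument defaults from the base R documentation in a comment and then shows how right changes which boundary belongs to an interval. The vector values is made-up sample data, not part of the lesson.

```r
# Signature as documented in base R:
# cut(x, breaks, labels = NULL, include.lowest = FALSE,
#     right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)

values <- c(1, 5, 10)  # hypothetical sample data

# right = TRUE (the default): intervals are (0,5] and (5,10]
closed_right <- cut(values, breaks = c(0, 5, 10))

# right = FALSE: intervals are [0,5) and [5,10);
# the value 10 lies outside both intervals and becomes NA
closed_left <- cut(values, breaks = c(0, 5, 10), right = FALSE)

print(closed_right)
print(closed_left)
```

Note that 5 lands in the first interval when right = TRUE and in the second when right = FALSE.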
To create three categories, set breaks to 3 or provide a vector with four cut points to form three intervals, for instance (a,b], (b,c], (c,d]. With breaks set to 3, cut() divides the range of x into three intervals of equal width.
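The two ways of specifying breaks can be compared in a minimal sketch. The vector x and the cut points are made up for illustration.

```r
x <- c(2, 4, 6, 8, 10, 12)  # hypothetical sample data

# breaks = 3: cut() chooses three equal-width intervals over the range of x
by_count <- cut(x, breaks = 3)

# Four cut points: three intervals whose boundaries are set explicitly
by_points <- cut(x, breaks = c(0, 4, 8, 12))

print(levels(by_count))   # boundaries computed by cut()
print(levels(by_points))  # "(0,4]" "(4,8]" "(8,12]"
print(table(by_points))   # two values fall into each interval
```

Passing explicit cut points is usually the safer choice when the boundaries carry meaning, since the intervals produced by an integer depend on the data's range.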
For our example of categorizing height, we choose c(0, 160, 190, 250) for breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE to define a logical order among categories (e.g., short < medium < tall).
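The height example might look like the following sketch. The heights vector (in centimetres) and the variable name heights_f are assumptions made for illustration; the breaks and the ordering come from the text above.

```r
heights <- c(155, 172, 168, 195, 160, 201)  # hypothetical heights in cm

heights_f <- cut(
  heights,
  breaks = c(0, 160, 190, 250),           # (0,160], (160,190], (190,250]
  labels = c("short", "medium", "tall"),  # one label per interval
  ordered_result = TRUE                   # short < medium < tall
)

print(heights_f)

# Because the factor is ordered, its values can be compared
at_least_medium <- heights[heights_f >= "medium"]
print(at_least_medium)
```

Since the intervals are closed on the right by default, a height of exactly 160 falls into the first group.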
Task
- Given a vector of numerical grades, categorize them into the following factor levels:
- [0;60) - F;
- [60;75) - D;
- [75;85) - C;
- [85;95) - B;
- [95;100) - A.
- Create a variable grades_f that stores the factor with the specified breaks and labels, taking the ordering into account, and use right = FALSE to include the left boundary of the intervals:
- breaks - c(0, 60, 75, 85, 95, 100);
- labels - c('F', 'D', 'C', 'B', 'A');
- ordered_result - TRUE (to order the factor values);
- right - FALSE (to include the left boundary of an interval, not the right).
- Output the contents of grades_f.
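One possible way to approach the task is sketched below. The grades vector here is invented sample data; the lesson's own grades vector is not shown in the text.

```r
grades <- c(58, 60, 74.5, 85, 94, 99)  # hypothetical grades

grades_f <- cut(
  grades,
  breaks = c(0, 60, 75, 85, 95, 100),
  labels = c('F', 'D', 'C', 'B', 'A'),
  ordered_result = TRUE,  # F < D < C < B < A
  right = FALSE           # intervals are closed on the left: [0,60), [60,75), ...
)

print(grades_f)
```

With right = FALSE the last interval is [95,100), so a grade of exactly 100 would become NA; adding include.lowest = TRUE would close that interval at 100 if such grades need to be covered.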