R Introduction: Part I
Grouping Numeric Data
To categorize numeric data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.
The following cut() parameters are the most important for categorizing data:
- x is the numeric vector to be categorized;
- breaks is either a single integer specifying the number of intervals or a vector of cut points;
- labels provides names for the categories;
- right indicates whether the intervals are closed on the right (the default) rather than on the left;
- ordered_result determines whether the result is an ordered factor.
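To make the effect of right concrete, here is a minimal sketch on a made-up vector. The names x_demo, right_closed and left_closed are hypothetical and not part of the lesson. The first comment quotes the default method's signature from base R for reference.

```r
# Default method in base R, for reference:
# cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE,
#     dig.lab = 3, ordered_result = FALSE, ...)

# Hypothetical demo vector; 10 and 20 sit exactly on cut points
x_demo <- c(5, 10, 15, 20)

# right = TRUE (the default): intervals are (0,10] and (10,20]
right_closed <- cut(x_demo, breaks = c(0, 10, 20))
right_closed

# right = FALSE: intervals are [0,10) and [10,20); 20 falls outside and becomes NA
left_closed <- cut(x_demo, breaks = c(0, 10, 20), right = FALSE)
left_closed
```

Note that a value on a shared cut point (10 here) moves from the lower interval to the upper one when right is switched to FALSE.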
To create three categories, either set breaks to 3, which splits the range of the data into three equal-width intervals, or provide a vector of four cut points that form three intervals, for instance (a,b], (b,c], (c,d].
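The two ways of specifying breaks can be compared with a short sketch. The vector vals and the result names by_count and by_points are assumptions made for illustration only.

```r
# Hypothetical values to be grouped
vals <- c(1, 4, 7, 10)

# A single integer: cut() derives three equal-width intervals from the data range
by_count <- cut(vals, breaks = 3)
nlevels(by_count)   # number of categories created

# Four explicit cut points: three intervals (0,4], (4,8], (8,12]
by_points <- cut(vals, breaks = c(0, 4, 8, 12))
levels(by_points)
```

With an integer, the interval boundaries depend on the data, so explicit cut points are usually the safer choice when the categories have a fixed meaning.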
# Vector of heights
heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153)
# Convert into a factor by cutting into intervals
heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'), ordered_result = TRUE)
heights_f # Output the factor variable
For our height example, we pass c(0, 160, 190, 250) as breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE to define a logical order among the categories (e.g., short < medium < tall).
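As a small sketch of what the ordering buys you, the code below rebuilds the height factor with the labels named in the text (short < medium < tall) and then counts and compares categories. The name above_short is hypothetical.

```r
# Heights from the lesson example
heights <- c(170, 165, 195, 172, 189, 156, 178, 198,
             157, 182, 171, 184, 163, 176, 169, 153)

heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'),
                 ordered_result = TRUE)

is.ordered(heights_f)   # checks that the result is an ordered factor
table(heights_f)        # number of observations per category

# Comparison operators are meaningful only because the factor is ordered
above_short <- heights[heights_f > 'short']
above_short
```

Without ordered_result = TRUE, the comparison heights_f > 'short' would produce NA values with a warning, because plain factors have no defined order.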
Task
- Given a vector of numerical grades, categorize them into the following factor levels:
  - [0, 60) - F;
  - [60, 75) - D;
  - [75, 85) - C;
  - [85, 95) - B;
  - [95, 100) - A.
- Create a variable grades_f that stores the resulting ordered factor, using the following cut() arguments:
  - breaks - c(0, 60, 75, 85, 95, 100);
  - labels - c('F', 'D', 'C', 'B', 'A');
  - ordered_result - TRUE (to order the factor values);
  - right - FALSE (to include the left boundary of an interval, not the right).
- Output the contents of grades_f.
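One possible way to approach the task is sketched below. The grades vector is made up for illustration, since the lesson does not list its values, and grades_incl is a hypothetical name. The sketch also shows a side effect of left-closed intervals: a grade of exactly 100 lies outside [95, 100) and becomes NA unless include.lowest = TRUE is added, which goes beyond what the task asks for.

```r
# Hypothetical grades; the course supplies its own vector
grades <- c(58, 60, 74.5, 85, 95, 100)

# Left-closed intervals, as the task specifies
grades_f <- cut(grades, breaks = c(0, 60, 75, 85, 95, 100),
                labels = c('F', 'D', 'C', 'B', 'A'),
                ordered_result = TRUE, right = FALSE)
grades_f   # the grade of 100 is NA here

# Optional variation: with right = FALSE, include.lowest = TRUE closes the
# last interval on the right, so it becomes [95, 100]
grades_incl <- cut(grades, breaks = c(0, 60, 75, 85, 95, 100),
                   labels = c('F', 'D', 'C', 'B', 'A'),
                   ordered_result = TRUE, right = FALSE,
                   include.lowest = TRUE)
grades_incl
```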