## R Introduction: Part I

# Grouping Numeric Data

To categorize **numeric** data into groups, you can use the `cut()` function in R, which assigns each number to a category based on specified intervals. For instance, if you have a **continuous variable** like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

The general form of its default method is `cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)`.

Among these parameters, the following are crucial for categorizing data:

- `x`: the **numeric vector** to be categorized;
- `breaks`: either a single integer giving the **number of intervals**, or a vector of cut points;
- `labels`: names for the resulting categories;
- `right`: whether the intervals should be **closed on the right**, i.e. of the form (a, b] (`TRUE` by default);
- `ordered_result`: whether the resulting factor should be ordered.
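The effect of `right` is easiest to see with a value that sits exactly on a cut point. The short sketch below uses a made-up vector `x` and the illustrative names `closed_right` and `closed_left` (none of them come from the lesson). It cuts the same data both ways.

```r
# Hypothetical values; 160 sits exactly on a cut point
x <- c(150, 160, 170)

# Default right = TRUE: intervals are (a, b], so 160 falls into (0,160]
closed_right <- cut(x, breaks = c(0, 160, 190))

# right = FALSE: intervals are [a, b), so 160 moves into [160,190)
closed_left <- cut(x, breaks = c(0, 160, 190), right = FALSE)

closed_right
closed_left
```

Only the boundary value 160 changes category. The values strictly inside an interval are unaffected.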

To create three categories, either set `breaks` to `3` or provide a vector of four cut points, which forms three intervals: (a, b], (b, c], and (c, d].
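The two ways of asking for three categories can be compared directly. This is a minimal sketch using the lesson's height data. The names `heights_3` and `heights_custom` are illustrative only.

With an integer `breaks`, `cut()` chooses equal-width intervals over the range of the data. With a vector of cut points, you choose the boundaries yourself.

```r
heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153)

# Integer breaks: three equal-width intervals spanning the data's range
# (cut() widens the range slightly so the minimum and maximum are included)
heights_3 <- cut(heights, breaks = 3)
levels(heights_3)

# Four explicit cut points: also three intervals, but with chosen boundaries
heights_custom <- cut(heights, breaks = c(0, 160, 190, 250))
levels(heights_custom)
```

Explicit cut points are usually preferable when the categories carry meaning, such as 'short' below 160 cm. Equal-width intervals are mainly useful for a quick look at how the data are distributed.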

```r
# Vector of heights
heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153)

# Convert into a factor by cutting into intervals
heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'),
                 ordered_result = TRUE)

heights_f  # Output the factor variable
```
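After cutting, a common next step is to count how many observations fall into each category. The sketch below re-creates the factor so that it runs on its own, then uses base R's `table()`. The name `counts` is illustrative.

```r
heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153)
heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'),
                 ordered_result = TRUE)

# Count observations per category; the columns follow the level order
counts <- table(heights_f)
counts  # short: 3, medium: 11, tall: 2
```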

For our example of categorizing height, we choose `c(0, 160, 190, 250)` for `breaks` to divide the data into **three** groups: (0, 160], (160, 190], and (190, 250]. We also set `ordered_result` to `TRUE` to define a **logical order** among categories (e.g., short < medium < tall).
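Setting `ordered_result = TRUE` has practical consequences. Comparison operators and `max()`/`min()` then follow the level order, which they do not do for an unordered factor. Here is a small sketch that re-creates `heights_f` so it is self-contained.

```r
heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153)
heights_f <- cut(heights, breaks = c(0, 160, 190, 250),
                 labels = c('short', 'medium', 'tall'),
                 ordered_result = TRUE)

# 156 cm ('short') vs 170 cm ('medium'): TRUE because short < medium
heights_f[6] < heights_f[1]

# A level name can be used in a comparison; this keeps the 'tall' heights
heights[heights_f > "medium"]

# max() is defined for ordered factors and returns the highest level present
max(heights_f)
```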

Task

Given a vector of numerical grades, categorize them into the following factor levels:

- [0, 60) - F;
- [60, 75) - D;
- [75, 85) - C;
- [85, 95) - B;
- [95, 100) - A.

Steps:

- Create a variable `grades_f` that stores the ordered factor with the specified breaks and labels. Use `right = FALSE` so that each interval includes its left boundary. Pass the following arguments to `cut()`:
  - `breaks` - `c(0, 60, 75, 85, 95, 100)`;
  - `labels` - `c('F', 'D', 'C', 'B', 'A')`;
  - `ordered_result` - `TRUE` (to **order the factor values**);
  - `right` - `FALSE` (to **include the left boundary** of an interval, not the right).
- Output the contents of `grades_f`.
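One possible way to approach the task is sketched below. The exercise's actual grades vector is not shown here, so `grades` is a hypothetical example. The names `perfect_default` and `perfect_inclusive` are also illustrative.

```r
# Hypothetical example grades (not the exercise's own data)
grades <- c(88, 56, 92, 74, 60, 97, 81, 45, 75, 95)

grades_f <- cut(grades,
                breaks = c(0, 60, 75, 85, 95, 100),
                labels = c('F', 'D', 'C', 'B', 'A'),
                ordered_result = TRUE,
                right = FALSE)  # intervals are [a, b): left boundary included

grades_f

# Edge case: 100 lies outside [95, 100), so cut() returns NA for it
perfect_default <- cut(100, breaks = c(0, 60, 75, 85, 95, 100),
                       labels = c('F', 'D', 'C', 'B', 'A'), right = FALSE)

# With right = FALSE, include.lowest = TRUE closes the highest interval: [95, 100]
perfect_inclusive <- cut(100, breaks = c(0, 60, 75, 85, 95, 100),
                         labels = c('F', 'D', 'C', 'B', 'A'), right = FALSE,
                         include.lowest = TRUE)
```

Because `right = FALSE`, boundary grades such as 60, 75, and 95 move up into the higher category. The `include.lowest` variant only matters if a grade can be exactly 100. The task as stated uses [95, 100).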
