Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Lære Grouping Numeric Data | Factors
R Introduction: Part I
course content

Kursusindhold

R Introduction: Part I

R Introduction: Part I

1. Basic Syntax and Operations
2. Basic Data Types and Vectors
3. Factors

book
Grouping Numeric Data

To categorize numeric data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

Here's how you can use it:

r

Among the parameters listed, these are crucial for categorizing data:

  • x is the numeric vector to be categorized;
  • breaks can be an integer specifying the number of intervals or a vector of cut points;
  • labels provide names for the categories;
  • right indicates if the intervals should be closed on the right;
  • ordered_result determines if the resulting factors should have an order.

To create three categories, set breaks to 3 or provide a vector with four cut points to form three intervals, for instance (a,b], (b,c], (c,d].

1234567
# Vector of heights heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153) # Convert into factor by cutting into intervals heights_f <- cut(heights, breaks = c(0, 160, 190, 250), labels = c('small', 'medium', 'tall'), ordered_result = T) heights_f # Output the factor variable
copy

For our example of categorizing height, we choose c(0, 160, 190, 250) for breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE to define a logical order among categories (e.g., short < medium < tall).

Opgave

Swipe to start coding

  1. Given a vector of numerical grades, here's how to categorize them as factor levels:

    • [0, 60) - F;
    • [60, 75) - D;
    • [75, 85) - C;
    • [85, 95) - B;
    • [95, 100) - A.
  2. Create a variable grades_f that stores the factor levels with the specified breaks and labels, considering the ordering, and use right = FALSE to include the left boundary of the intervals;

    • breaks - c(0, 60, 75, 85, 95, 100);
    • labels - c('F', 'D', 'C', 'B', 'A');
    • ordered_result - TRUE (to order the factor values);
    • right - FALSE (to include the left boundary of an interval, not the right).
  3. Output the contents of grades_f.

Løsning

Switch to desktopSkift til skrivebord for at øve i den virkelige verdenFortsæt der, hvor du er, med en af nedenstående muligheder
Var alt klart?

Hvordan kan vi forbedre det?

Tak for dine kommentarer!

Sektion 3. Kapitel 5
toggle bottom row

book
Grouping Numeric Data

To categorize numeric data into groups, you can use the cut() function in R, which assigns each number to a category based on specified intervals. For instance, if you have a continuous variable like height, you can categorize individuals as 'tall', 'medium', or 'short' based on height ranges.

Here's how you can use it:

r

Among the parameters listed, these are crucial for categorizing data:

  • x is the numeric vector to be categorized;
  • breaks can be an integer specifying the number of intervals or a vector of cut points;
  • labels provide names for the categories;
  • right indicates if the intervals should be closed on the right;
  • ordered_result determines if the resulting factors should have an order.

To create three categories, set breaks to 3 or provide a vector with four cut points to form three intervals, for instance (a,b], (b,c], (c,d].

1234567
# Vector of heights heights <- c(170, 165, 195, 172, 189, 156, 178, 198, 157, 182, 171, 184, 163, 176, 169, 153) # Convert into factor by cutting into intervals heights_f <- cut(heights, breaks = c(0, 160, 190, 250), labels = c('small', 'medium', 'tall'), ordered_result = T) heights_f # Output the factor variable
copy

For our example of categorizing height, we choose c(0, 160, 190, 250) for breaks to divide the data into three groups: (0, 160], (160, 190], and (190, 250]. We also set ordered_result to TRUE to define a logical order among categories (e.g., short < medium < tall).

Opgave

Swipe to start coding

  1. Given a vector of numerical grades, here's how to categorize them as factor levels:

    • [0, 60) - F;
    • [60, 75) - D;
    • [75, 85) - C;
    • [85, 95) - B;
    • [95, 100) - A.
  2. Create a variable grades_f that stores the factor levels with the specified breaks and labels, considering the ordering, and use right = FALSE to include the left boundary of the intervals;

    • breaks - c(0, 60, 75, 85, 95, 100);
    • labels - c('F', 'D', 'C', 'B', 'A');
    • ordered_result - TRUE (to order the factor values);
    • right - FALSE (to include the left boundary of an interval, not the right).
  3. Output the contents of grades_f.

Løsning

Switch to desktopSkift til skrivebord for at øve i den virkelige verdenFortsæt der, hvor du er, med en af nedenstående muligheder
Var alt klart?

Hvordan kan vi forbedre det?

Tak for dine kommentarer!

Sektion 3. Kapitel 5
Switch to desktopSkift til skrivebord for at øve i den virkelige verdenFortsæt der, hvor du er, med en af nedenstående muligheder
Vi beklager, at noget gik galt. Hvad skete der?
some-alt