Evaluating Model Performance | Model Evaluation and Machine Learning Workflows
R for Data Scientists

Evaluating Model Performance

When you build predictive models, you need a reliable way to measure how well a model predicts. Quantifying a model's performance lets you compare competing models, see where a model is failing, and decide whether it is ready for deployment; without clear metrics, you have no objective basis for improving a model or choosing between models. Root mean squared error (RMSE) for regression and accuracy for classification are two common metrics, each of which summarizes performance in a single, interpretable number.
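For reference, the two metrics can be written as follows. The notation is ours rather than the lesson's: $y_i$ is the true value for observation $i$, $\hat{y}_i$ is the model's prediction, $n$ is the number of observations, and $\mathbb{1}[\cdot]$ equals 1 when its condition holds and 0 otherwise.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\qquad
\mathrm{Accuracy} = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\left[y_i = \hat{y}_i\right]
$$

RMSE is expressed in the same units as the target, and lower values are better; accuracy is the fraction of correct predictions, between 0 and 1, and higher values are better.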

```r
library(dplyr)

# Example regression predictions
reg_results <- tibble(
  truth = c(2.5, 0.0, 2.1, 1.6),
  prediction = c(3.0, -0.1, 2.0, 1.5)
)

# Compute RMSE (base R)
reg_rmse <- sqrt(mean((reg_results$truth - reg_results$prediction)^2))

# Example classification predictions
class_results <- tibble(
  truth = factor(c("cat", "dog", "cat", "dog")),
  prediction = factor(c("cat", "cat", "cat", "dog"))
)

# Compute accuracy (base R)
class_acc <- mean(class_results$truth == class_results$prediction)

# Print results
print(reg_rmse)
print(class_acc)
```
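As a check on the example above, here is the same calculation worked by hand for its four regression and four classification predictions.

$$
\begin{aligned}
y - \hat{y} &= (2.5 - 3.0,\; 0.0 - (-0.1),\; 2.1 - 2.0,\; 1.6 - 1.5) = (-0.5,\; 0.1,\; 0.1,\; 0.1) \\
\mathrm{RMSE} &= \sqrt{\tfrac{1}{4}\left(0.25 + 0.01 + 0.01 + 0.01\right)} = \sqrt{0.07} \approx 0.265 \\
\mathrm{Accuracy} &= \tfrac{3}{4} = 0.75 \quad \text{(only the second prediction, cat for a true dog, is wrong)}
\end{aligned}
$$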

The rmse() and accuracy() functions from the yardstick package take a data frame or tibble containing at least two columns: one holding the true values and one holding the predicted values (truth and prediction in the example above). For rmse(), both columns must be numeric; for accuracy(), both must be factors with the same levels. You tell each function which columns to use through its truth and estimate arguments. Each function returns a one-row tibble with three columns: .metric (the metric name), .estimator (the type of estimator, such as "standard" or "binary"), and .estimate (the calculated value).
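A minimal sketch of the same two calculations done with yardstick, assuming the yardstick and tibble packages are installed. It rebuilds the example data from above and names the true and predicted columns through the truth and estimate arguments; the results should agree with the base R calculation.

```r
library(tibble)
library(yardstick)

# Same example data as in the base R version
reg_results <- tibble(
  truth = c(2.5, 0.0, 2.1, 1.6),
  prediction = c(3.0, -0.1, 2.0, 1.5)
)
class_results <- tibble(
  truth = factor(c("cat", "dog", "cat", "dog")),
  prediction = factor(c("cat", "cat", "cat", "dog"))
)

# truth = and estimate = name the columns holding true and predicted values
reg_metric <- rmse(reg_results, truth = truth, estimate = prediction)
class_metric <- accuracy(class_results, truth = truth, estimate = prediction)

# Each result is a one-row tibble with .metric, .estimator and .estimate columns
print(reg_metric)
print(class_metric)
```

Here reg_metric$.estimate should be about 0.265 and class_metric$.estimate should be 0.75. Because the result is a tibble rather than a bare number, metrics for several models can be stacked into one table and compared side by side.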

Note

Make sure the column names you pass to truth and estimate exactly match the columns in your data frame. For classification, the truth and prediction factors must share the same levels; otherwise you may get misleading results or errors. Choose metrics that fit your problem type: do not use accuracy for regression or RMSE for classification.
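The sketch below illustrates the factor-level pitfall in base R, using hypothetical predictions from a model that only ever predicts "cat". Because that prediction factor ends up with a single level, comparing it with the two-level truth factor raises an error; rebuilding it with levels = levels(truth) removes the mismatch.

```r
# Hypothetical labels: the model never predicts "dog"
truth <- factor(c("cat", "dog", "cat", "dog"))
raw_pred <- factor(c("cat", "cat", "cat", "cat"))

levels(truth)     # "cat" "dog"
levels(raw_pred)  # "cat" only

# Base R refuses to compare factors whose level sets differ
comparison_failed <- tryCatch(
  {
    truth == raw_pred
    FALSE
  },
  error = function(e) TRUE
)

# Rebuild the predictions with the same levels as the truth column
aligned_pred <- factor(as.character(raw_pred), levels = levels(truth))

# Now the comparison works: 2 of 4 predictions are correct
aligned_acc <- mean(truth == aligned_pred)
print(aligned_acc)
```

Aligning the levels this way also gives accuracy() the matching factor levels it expects for the truth and estimate columns.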


Which statement best describes how to use RMSE and accuracy when evaluating model performance?


Section 4, Chapter 2
