Predictive Modeling with Tidymodels in R

Evaluating Regression Models


Evaluating regression models is a crucial step in predictive modeling, as it helps you understand how well your models predict continuous outcomes. The most common regression evaluation metrics are Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²). RMSE measures the average magnitude of prediction errors, penalizing larger errors more heavily. MAE calculates the average absolute difference between predicted and actual values, making it less sensitive to outliers than RMSE. R-squared represents the proportion of variance in the dependent variable explained by the model, with values closer to 1 indicating better model fit.
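To make these definitions concrete, the three metrics can be computed by hand. The sketch below uses made-up truth and prediction vectors (illustrative values, not course data), and the R² shown here is the "1 minus residual sum of squares over total sum of squares" form; note that yardstick's `rsq()` instead uses the squared correlation between truth and estimate, which agrees for well-behaved linear fits.

```r
# Hand-computed regression metrics on a small, made-up example (base R only)
truth    <- c(21.0, 22.8, 18.7, 14.3, 24.4)  # actual values
estimate <- c(20.1, 23.5, 17.9, 16.0, 23.8)  # model predictions

errors <- truth - estimate

rmse <- sqrt(mean(errors^2))   # squaring penalizes large errors more heavily
mae  <- mean(abs(errors))      # every error contributes proportionally
rsq  <- 1 - sum(errors^2) / sum((truth - mean(truth))^2)

rmse  # average error magnitude, in the units of the outcome
mae   # 0.94 for these numbers
rsq   # share of outcome variance explained
```

Because RMSE squares each error before averaging, a single large miss inflates it far more than it inflates MAE, which is why the two are worth reporting side by side.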

```r
library(tidymodels)

# Fit a linear regression model (for demonstration)
lm_spec <- linear_reg() %>%
  set_engine("lm")

lm_fit <- lm_spec %>%
  fit(mpg ~ ., data = mtcars)

# Generate predictions (here, on the training data for simplicity;
# in practice, evaluate on a held-out test set)
predictions <- predict(lm_fit, new_data = mtcars) %>%
  bind_cols(mtcars %>% select(truth = mpg))

# Calculate regression metrics
metrics <- metric_set(rmse, mae, rsq)
results <- metrics(predictions, truth = truth, estimate = .pred)
print(results)

# Visualize predictions vs. actuals
library(ggplot2)
ggplot(predictions, aes(x = truth, y = .pred)) +
  geom_point(color = "steelblue") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Predicted vs. Actual MPG",
       x = "Actual MPG", y = "Predicted MPG")
```

Once you have calculated these metrics, you need to interpret the results to assess model quality. Lower RMSE and MAE values indicate more accurate predictions, while a higher R-squared value suggests that your model explains more of the outcome variability. Comparing these metrics across different models or preprocessing strategies helps you select the best approach for your data. If you notice high error values or a low R-squared, it could signal issues such as underfitting, data quality problems, or the need for additional feature engineering. Visualizing predicted versus actual values can also reveal patterns like systematic under- or over-prediction, heteroscedasticity, or outliers, all of which provide valuable diagnostic insights for further model refinement.
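A residual plot is one simple way to surface those diagnostic patterns. The sketch below refits the same model with base R's `lm()` so it stands alone, then plots residuals against predicted values; a patternless horizontal band suggests a reasonable fit, a funnel shape points to heteroscedasticity, and a curve suggests a missing nonlinear term.

```r
library(ggplot2)

# Refit the same linear model with base R's lm() so this snippet
# stands alone, then inspect its residuals
fit <- lm(mpg ~ ., data = mtcars)

residual_data <- data.frame(
  .pred    = fitted(fit),     # predicted values
  residual = residuals(fit)   # actual minus predicted
)

# Residuals vs. predicted: look for structure, spread changes, or outliers
ggplot(residual_data, aes(x = .pred, y = residual)) +
  geom_point(color = "steelblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Residuals vs. Predicted MPG",
       x = "Predicted MPG", y = "Residual")
```

Points far from the zero line in this plot are the same observations that inflate RMSE relative to MAE, so the two views reinforce each other.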


Which metric is most appropriate if you want to minimize the impact of large outliers when evaluating a regression model?


Section 1. Chapter 5
