Learn Regularization & Optimization | Modern Gradient Boosting Foundations
Advanced Tree-Based Models

Regularization & Optimization

Building robust and efficient gradient boosted decision trees (GBDTs) depends on understanding several key regularization and optimization parameters:

  • Learning rate (shrinkage): This controls how much each new tree corrects the errors of previous trees. A lower learning rate means each tree makes smaller changes, which helps prevent overfitting. However, you usually need more trees to achieve strong performance when the learning rate is low.

  • Subsampling: This technique trains each tree on a random subset of the data. By introducing randomness, subsampling reduces overfitting and improves generalization. The subsample parameter sets the fraction of data used for each tree.

  • Tree depth: This limits how complex each tree can become. Shallow trees (with small maximum depth) are less likely to overfit. Deeper trees can capture more detailed patterns, but they are more likely to fit noise in the data.

  • Loss function: This defines the objective the model tries to optimize. For regression problems, common loss functions include mean squared error and mean absolute error. For classification, the standard choice is log loss (cross-entropy); hinge loss is also supported by some libraries. The loss function directly shapes the optimization process and the model's final performance. (Subsampling, tree depth, and the loss function are illustrated together in the sketch after this list.)
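As a complement to the learning-rate demo below, here is a minimal sketch of how the other parameters from the list appear as XGBRegressor arguments. The specific values (subsample=0.8, max_depth=4, 200 estimators) are illustrative assumptions rather than tuned settings, and the objective shown is the XGBoost default; a comment notes the MAE alternative available in newer releases.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

# Same dataset and split as the learning-rate example below
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Illustrative (untuned) settings:
# - subsample=0.8: each tree is trained on a random 80% of the rows
# - max_depth=4: keeps individual trees shallow to limit overfitting
# - objective="reg:squarederror": mean squared error loss (the XGBoost default);
#   XGBoost 1.7+ also offers "reg:absoluteerror" for mean absolute error
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    subsample=0.8,
    max_depth=4,
    objective="reg:squarederror",
    random_state=42,
    verbosity=0,
)
model.fit(X_train, y_train)
print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))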

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Load regression dataset (Boston removed from sklearn)
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train two models with different learning rates
model_high_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.3,
    max_depth=3,
    random_state=42,
    verbosity=0
)
model_low_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.03,
    max_depth=3,
    random_state=42,
    verbosity=0
)
model_high_lr.fit(X_train, y_train)
model_low_lr.fit(X_train, y_train)

train_pred_high = model_high_lr.predict(X_train)
test_pred_high = model_high_lr.predict(X_test)
train_pred_low = model_low_lr.predict(X_train)
test_pred_low = model_low_lr.predict(X_test)

print("High learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_high))
print("Test MSE:", mean_squared_error(y_test, test_pred_high))
print("\nLow learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_low))
print("Test MSE:", mean_squared_error(y_test, test_pred_low))
Note

A lower learning rate generally requires more estimators (trees) to achieve the same training error, but it can improve generalization and reduce overfitting. Using a high learning rate with too many trees can cause the model to fit noise in the data, while a very low learning rate with too few trees may lead to underfitting.
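To make the note's point concrete, the sketch below pairs the same low learning rate (0.03) with many more trees. The choice of 500 estimators is an arbitrary illustration, not a recommendation; in practice the number of trees would be tuned, for example with a validation set.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Low learning rate compensated by a larger ensemble (illustrative values)
model_low_lr_more_trees = XGBRegressor(
    n_estimators=500,      # ten times more trees than the example above
    learning_rate=0.03,    # same low shrinkage as before
    max_depth=3,
    random_state=42,
    verbosity=0,
)
model_low_lr_more_trees.fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, model_low_lr_more_trees.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, model_low_lr_more_trees.predict(X_test)))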

