Regularization & Optimization
Building robust and efficient gradient boosted decision trees (GBDTs) depends on understanding several key regularization and optimization parameters:
- Learning rate (shrinkage): This controls how much each new tree corrects the errors of previous trees. A lower learning rate means each tree makes smaller changes, which helps prevent overfitting. However, you usually need more trees to achieve strong performance when the learning rate is low.
- Subsampling: This technique trains each tree on a random subset of the data. By introducing randomness, subsampling reduces overfitting and improves generalization. The `subsample` parameter sets the fraction of data used for each tree (illustrated in a sketch further below).
- Tree depth: This limits how complex each tree can become. Shallow trees (with a small maximum depth) are less likely to overfit. Deeper trees can capture more detailed patterns, but they are also more likely to fit noise in the data (see the depth comparison below).
- Loss function: This defines the objective the model tries to optimize. For regression problems, common loss functions include mean squared error and mean absolute error. For classification, popular choices are log loss and hinge loss. The loss function directly shapes the optimization process and the model's final performance (see the loss comparison at the end of this section).
The following example compares a high and a low learning rate on the California housing dataset:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Load regression dataset (Boston removed from sklearn)
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train two models with different learning rates
model_high_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.3,
    max_depth=3,
    random_state=42,
    verbosity=0
)
model_low_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.03,
    max_depth=3,
    random_state=42,
    verbosity=0
)

model_high_lr.fit(X_train, y_train)
model_low_lr.fit(X_train, y_train)

train_pred_high = model_high_lr.predict(X_train)
test_pred_high = model_high_lr.predict(X_test)
train_pred_low = model_low_lr.predict(X_train)
test_pred_low = model_low_lr.predict(X_test)

print("High learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_high))
print("Test MSE:", mean_squared_error(y_test, test_pred_high))

print("\nLow learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_low))
print("Test MSE:", mean_squared_error(y_test, test_pred_low))
```
A lower learning rate generally requires more estimators (trees) to achieve the same training error, but it can improve generalization and reduce overfitting. Using a high learning rate with too many trees can cause the model to fit noise in the data, while a very low learning rate with too few trees may lead to underfitting.
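To see subsampling in action, here is a minimal sketch on the same California housing split, comparing `subsample=1.0` (every tree sees all rows) with `subsample=0.6` (each tree sees a random 60% of the rows). The specific values are illustrative rather than tuned recommendations.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Same dataset and split as the learning-rate example above
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# Identical models except for the fraction of rows sampled per tree
for frac in [1.0, 0.6]:
    model = XGBRegressor(
        n_estimators=200,      # illustrative settings, not tuned
        learning_rate=0.1,
        max_depth=5,
        subsample=frac,
        random_state=42,
        verbosity=0,
    )
    model.fit(X_train, y_train)
    print(f"subsample={frac}:",
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 4),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 4))
```

Because each tree only sees part of the data, the subsampled model tends to have slightly higher training error but often generalizes as well or better; the exact numbers depend on the split and the other hyperparameters.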
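Tree depth can be explored the same way. The sketch below, again on the California housing split, compares a shallow (`max_depth=2`) and a deep (`max_depth=8`) configuration at the same learning rate; the depth values are arbitrary illustrations.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# Shallow vs. deep trees with everything else held fixed
for depth in [2, 8]:
    model = XGBRegressor(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=depth,
        random_state=42,
        verbosity=0,
    )
    model.fit(X_train, y_train)
    print(f"max_depth={depth}:",
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 4),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 4))
```

A widening gap between train and test MSE as depth grows is the usual sign that the deeper trees are starting to fit noise rather than signal.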
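Finally, to make the loss function concrete for regression, the sketch below swaps in scikit-learn's `GradientBoostingRegressor`, whose `loss` parameter directly exposes the choice between squared error (MSE) and absolute error (MAE). It assumes scikit-learn 1.0 or newer, where these loss names are used.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# "squared_error" optimizes MSE; "absolute_error" optimizes MAE and is
# less sensitive to outliers in the targets
for loss in ["squared_error", "absolute_error"]:
    model = GradientBoostingRegressor(
        loss=loss,
        n_estimators=100,
        learning_rate=0.1,
        max_depth=3,
        random_state=42,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"loss={loss}:",
          "test MSE:", round(mean_squared_error(y_test, pred), 4),
          "test MAE:", round(mean_absolute_error(y_test, pred), 4))
```

Each model tends to do best on the metric that matches its training loss, which is exactly the point: the loss function defines what "good" means during optimization.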