Regularization & Optimization
Building robust and efficient gradient boosted decision trees (GBDTs) depends on understanding several key regularization and optimization parameters:
- Learning rate (shrinkage): This controls how much each new tree corrects the errors of previous trees. A lower learning rate means each tree makes smaller changes, which helps prevent overfitting. However, you usually need more trees to achieve strong performance when the learning rate is low.
- Subsampling: This technique trains each tree on a random subset of the data. By introducing randomness, subsampling reduces overfitting and improves generalization. The `subsample` parameter sets the fraction of data used for each tree (illustrated in a sketch further below).
- Tree depth: This limits how complex each tree can become. Shallow trees (with a small maximum depth) are less likely to overfit. Deeper trees can capture more detailed patterns, but they are also more likely to fit noise in the data (see the depth comparison below).
- Loss function: This defines the objective the model tries to optimize. For regression problems, common loss functions include mean squared error and mean absolute error. For classification, popular choices are log loss and hinge loss. The loss function directly shapes the optimization process and the model's final performance (see the loss comparison at the end of this section).
The following example compares a high and a low learning rate on the California housing dataset:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Load regression dataset (Boston removed from sklearn)
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train two models with different learning rates
model_high_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.3,
    max_depth=3,
    random_state=42,
    verbosity=0
)
model_low_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.03,
    max_depth=3,
    random_state=42,
    verbosity=0
)

model_high_lr.fit(X_train, y_train)
model_low_lr.fit(X_train, y_train)

train_pred_high = model_high_lr.predict(X_train)
test_pred_high = model_high_lr.predict(X_test)
train_pred_low = model_low_lr.predict(X_train)
test_pred_low = model_low_lr.predict(X_test)

print("High learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_high))
print("Test MSE:", mean_squared_error(y_test, test_pred_high))

print("\nLow learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_low))
print("Test MSE:", mean_squared_error(y_test, test_pred_low))
```
A lower learning rate generally requires more estimators (trees) to achieve the same training error, but it can improve generalization and reduce overfitting. Using a high learning rate with too many trees can cause the model to fit noise in the data, while a very low learning rate with too few trees may lead to underfitting.
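To see subsampling in action, here is a minimal sketch on the same California housing split, comparing `subsample=1.0` (every tree sees all rows) with `subsample=0.6` (each tree sees a random 60% of the rows). The specific values are illustrative rather than tuned recommendations.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Same dataset and split as the learning-rate example above
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# Identical models except for the fraction of rows sampled per tree
for frac in [1.0, 0.6]:
    model = XGBRegressor(
        n_estimators=200,      # illustrative settings, not tuned
        learning_rate=0.1,
        max_depth=5,
        subsample=frac,
        random_state=42,
        verbosity=0,
    )
    model.fit(X_train, y_train)
    print(f"subsample={frac}:",
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 4),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 4))
```

Because each tree only sees part of the data, the subsampled model tends to have slightly higher training error but often generalizes as well or better; the exact numbers depend on the split and the other hyperparameters.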
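Tree depth can be explored the same way. The sketch below, again on the California housing split, compares a shallow (`max_depth=2`) and a deep (`max_depth=8`) configuration at the same learning rate; the depth values are arbitrary illustrations.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# Shallow vs. deep trees with everything else held fixed
for depth in [2, 8]:
    model = XGBRegressor(
        n_estimators=100,
        learning_rate=0.1,
        max_depth=depth,
        random_state=42,
        verbosity=0,
    )
    model.fit(X_train, y_train)
    print(f"max_depth={depth}:",
          "train MSE:", round(mean_squared_error(y_train, model.predict(X_train)), 4),
          "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 4))
```

A widening gap between train and test MSE as depth grows is the usual sign that the deeper trees are starting to fit noise rather than signal.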
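Finally, to make the loss function concrete for regression, the sketch below swaps in scikit-learn's `GradientBoostingRegressor`, whose `loss` parameter directly exposes the choice between squared error (MSE) and absolute error (MAE). It assumes scikit-learn 1.0 or newer, where these loss names are used.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=42
)

# "squared_error" optimizes MSE; "absolute_error" optimizes MAE and is
# less sensitive to outliers in the targets
for loss in ["squared_error", "absolute_error"]:
    model = GradientBoostingRegressor(
        loss=loss,
        n_estimators=100,
        learning_rate=0.1,
        max_depth=3,
        random_state=42,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"loss={loss}:",
          "test MSE:", round(mean_squared_error(y_test, pred), 4),
          "test MAE:", round(mean_absolute_error(y_test, pred), 4))
```

Each model tends to do best on the metric that matches its training loss, which is exactly the point: the loss function defines what "good" means during optimization.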