Learn Regularization & Optimization | Modern Gradient Boosting Foundations
Advanced Tree-Based Models

Regularization & Optimization

Building robust and efficient gradient boosted decision trees (GBDTs) depends on understanding several key regularization and optimization parameters:

  • Learning rate (shrinkage): This controls how much each new tree corrects the errors of previous trees. A lower learning rate means each tree makes smaller changes, which helps prevent overfitting. However, you usually need more trees to achieve strong performance when the learning rate is low.

  • Subsampling: This technique trains each tree on a random subset of the data. By introducing randomness, subsampling reduces overfitting and improves generalization. The subsample parameter sets the fraction of data used for each tree.

  • Tree depth: This limits how complex each tree can become. Shallow trees (with small maximum depth) are less likely to overfit. Deeper trees can capture more detailed patterns, but they are more likely to fit noise in the data.

  • Loss function: This defines the objective the model tries to optimize. For regression problems, common loss functions include mean squared error and mean absolute error. For classification, the standard choice is log loss (cross-entropy); hinge loss is also supported by some libraries. The loss function directly shapes the optimization process and the model's final performance. (Subsampling, tree depth, and the loss function are illustrated together in the sketch after this list.)
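As a complement to the learning-rate demo below, here is a minimal sketch of how the other parameters from the list appear as XGBRegressor arguments. The specific values (subsample=0.8, max_depth=4, 200 estimators) are illustrative assumptions rather than tuned settings, and the objective shown is the XGBoost default; a comment notes the MAE alternative available in newer releases.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

# Same dataset and split as the learning-rate example below
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Illustrative (untuned) settings:
# - subsample=0.8: each tree is trained on a random 80% of the rows
# - max_depth=4: keeps individual trees shallow to limit overfitting
# - objective="reg:squarederror": mean squared error loss (the XGBoost default);
#   XGBoost 1.7+ also offers "reg:absoluteerror" for mean absolute error
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    subsample=0.8,
    max_depth=4,
    objective="reg:squarederror",
    random_state=42,
    verbosity=0,
)
model.fit(X_train, y_train)
print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))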

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Load regression dataset (Boston removed from sklearn)
data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train two models with different learning rates
model_high_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.3,
    max_depth=3,
    random_state=42,
    verbosity=0
)
model_low_lr = XGBRegressor(
    n_estimators=50,
    learning_rate=0.03,
    max_depth=3,
    random_state=42,
    verbosity=0
)
model_high_lr.fit(X_train, y_train)
model_low_lr.fit(X_train, y_train)

train_pred_high = model_high_lr.predict(X_train)
test_pred_high = model_high_lr.predict(X_test)
train_pred_low = model_low_lr.predict(X_train)
test_pred_low = model_low_lr.predict(X_test)

print("High learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_high))
print("Test MSE:", mean_squared_error(y_test, test_pred_high))
print("\nLow learning rate:")
print("Train MSE:", mean_squared_error(y_train, train_pred_low))
print("Test MSE:", mean_squared_error(y_test, test_pred_low))
Note

A lower learning rate generally requires more estimators (trees) to achieve the same training error, but it can improve generalization and reduce overfitting. Using a high learning rate with too many trees can cause the model to fit noise in the data, while a very low learning rate with too few trees may lead to underfitting.
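To make the note's point concrete, the sketch below pairs the same low learning rate (0.03) with many more trees. The choice of 500 estimators is an arbitrary illustration, not a recommendation; in practice the number of trees would be tuned, for example with a validation set.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Low learning rate compensated by a larger ensemble (illustrative values)
model_low_lr_more_trees = XGBRegressor(
    n_estimators=500,      # ten times more trees than the example above
    learning_rate=0.03,    # same low shrinkage as before
    max_depth=3,
    random_state=42,
    verbosity=0,
)
model_low_lr_more_trees.fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, model_low_lr_more_trees.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, model_low_lr_more_trees.predict(X_test)))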

