Learn XGBoost | Framework Deep Dive
Advanced Tree-Based Models

XGBoost

XGBoost is a leading implementation of gradient boosted decision trees, known for its efficiency and scalability. It minimizes a loss function using both the gradient (first derivative) and the Hessian (second derivative) of the loss, enabling more informed tree splits and better optimization.
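
To make this concrete, the sketch below computes the optimal leaf weight and split gain directly from per-sample gradients and Hessians, using the standard second-order formulas w* = -G / (H + lambda) and the corresponding split gain. The squared-error gradients and the sample values are illustrative assumptions, not part of the library API.

import numpy as np

# Illustrative squared-error loss: grad = pred - y, hess = 1 for every sample.
y = np.array([1.0, 0.0, 1.5, 2.0])
pred = np.zeros_like(y)          # current model predictions (base score of 0)
grad = pred - y                  # first derivatives of the loss
hess = np.ones_like(y)           # second derivatives of the loss
lam = 1.0                        # L2 regularization strength (lambda)

def leaf_weight(g, h, lam):
    # Optimal leaf weight: w* = -G / (H + lambda)
    return -g.sum() / (h.sum() + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma=0.0):
    # Gain from splitting one leaf into two, per the XGBoost objective
    def score(g, h):
        return g.sum() ** 2 / (h.sum() + lam)
    parent_score = score(np.concatenate([g_left, g_right]),
                         np.concatenate([h_left, h_right]))
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent_score) - gamma

print("leaf weight:", leaf_weight(grad, hess, lam))
print("gain of splitting samples {0,1} vs {2,3}:",
      split_gain(grad[:2], hess[:2], grad[2:], hess[2:], lam))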

XGBoost features strong regularization: lambda (L2 regularization) and alpha (L1 regularization) control model complexity and help prevent overfitting by penalizing large leaf weights.
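
In the scikit-learn wrapper these penalties are exposed as reg_lambda and reg_alpha. A minimal sketch follows; the specific values are arbitrary examples, not tuned recommendations.

from xgboost import XGBClassifier

# reg_lambda (L2) and reg_alpha (L1) shrink leaf weights toward zero;
# larger values mean stronger regularization.
model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    reg_lambda=1.0,   # L2 penalty (lambda), example value
    reg_alpha=0.5,    # L1 penalty (alpha), example value
    random_state=42,
    verbosity=0
)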

Its sparsity-aware split finding handles missing values and explicit zeros by learning a default direction for them at each split, making XGBoost robust and efficient with incomplete or sparse datasets.
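
A short sketch of this behavior: the synthetic dataset and the 10% NaN rate below are assumptions for illustration, but the key point is that the NaNs are passed to the model without any imputation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Synthetic data with roughly 10% of entries replaced by NaN
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.1] = np.nan

# XGBoost learns a default direction for missing values at each split,
# so NaNs can be fed to fit() and predict() directly.
model = XGBClassifier(n_estimators=50, max_depth=3, random_state=42, verbosity=0)
model.fit(X, y)
print("training accuracy with NaNs present:", accuracy_score(y, model.predict(X)))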

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 1) Generate a small synthetic dataset
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=5,
    random_state=42
)

# 2) Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3) Create a simple XGBoost model
model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42,
    verbosity=0
)

# 4) Fit the model
model.fit(X_train, y_train)

# 5) Predict and evaluate
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print("Test accuracy:", acc)

In this example, we train an XGBoost classifier using the scikit-learn interface, which provides an intuitive .fit() and .predict() workflow. The key parameters used are: n_estimators=100, which sets how many boosting rounds (trees) the model will build; learning_rate=0.1, which controls how much each new tree contributes to correcting previous errors (smaller values make learning more stable but require more trees); and max_depth=3, which defines how deep each decision tree can grow, influencing model complexity and overfitting. The training process is performed with model.fit(X_train, y_train), where XGBoost iteratively builds trees that minimize predictive error, and predictions are obtained via model.predict(X_test). Finally, we compute accuracy with accuracy_score, which measures how often the model correctly predicts class labels. This small example demonstrates how XGBoost’s core boosting mechanism, combined with just a few essential hyperparameters, can produce a strong baseline model with minimal setup.

Task


You are given a regression dataset. Your task is to:

  1. Load the dataset and split it into train/test sets.
  2. Initialize an XGBRegressor with the following parameters:
    • n_estimators=200.
    • learning_rate=0.05.
    • max_depth=4.
    • subsample=0.8.
    • random_state=42.
  3. Train the model.
  4. Predict on the test set.
  5. Compute Mean Squared Error (MSE) and store it in mse_value.
  6. Print dataset shapes, model parameters, and the final MSE.

Solution
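
The reference solution is not included on this page, so below is a minimal sketch of one way to complete the task. Because the lesson's dataset is not shown here, it assumes a synthetic regression dataset from make_regression; substitute the provided data where applicable.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# 1) Load a dataset (synthetic stand-in for the lesson's data) and split it
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2) Initialize the regressor with the required parameters
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42
)

# 3) Train the model
model.fit(X_train, y_train)

# 4) Predict on the test set
preds = model.predict(X_test)

# 5) Compute MSE and store it in mse_value
mse_value = mean_squared_error(y_test, preds)

# 6) Print dataset shapes, model parameters, and the final MSE
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Model parameters:", model.get_params())
print("MSE:", mse_value)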

