Learn XGBoost | Framework Deep Dive
Advanced Tree-Based Models

XGBoost

XGBoost is a leading implementation of gradient boosted decision trees, known for its efficiency and scalability. It minimizes a loss function by using both the gradient (first derivative) and hessian (second derivative), enabling more informed tree splits and better optimization.
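To make the role of these derivatives concrete, here is a minimal sketch using XGBoost's native training API and a synthetic dataset. The custom objective below reproduces the built-in squared-error loss only because its derivatives are simple; the point is that XGBoost asks the objective for the gradient and hessian at the current predictions and uses them to score candidate splits.

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

# Custom squared-error objective: return the first and second derivatives
# of 0.5 * (pred - label)^2 with respect to the predictions.
def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels          # gradient (first derivative)
    hess = np.ones_like(preds)     # hessian (second derivative, constant here)
    return grad, hess

booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=50, obj=squared_error_obj)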

XGBoost features strong regularization: lambda (L2 regularization) and alpha (L1 regularization) control model complexity and help prevent overfitting by penalizing large leaf weights.
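In the scikit-learn wrapper these penalties are exposed as the reg_lambda and reg_alpha parameters. The values below are purely illustrative, not tuned:

from xgboost import XGBClassifier

# Larger reg_lambda (L2) and reg_alpha (L1) shrink leaf weights and can
# zero some of them out, trading training fit for better generalization.
regularized_model = XGBClassifier(
    n_estimators=100,
    max_depth=3,
    reg_lambda=5.0,   # L2 penalty (lambda)
    reg_alpha=1.0,    # L1 penalty (alpha)
    random_state=42,
)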

Its sparsity-aware split finding handles missing values and explicit zeros by learning the optimal path for missing data, making XGBoost robust and efficient with incomplete or sparse datasets.
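In practice this means you can pass a feature matrix containing np.nan values directly, with no imputation step. The sketch below masks roughly 10% of the entries at random; the masking ratio and parameters are arbitrary and only meant to illustrate the behavior.

import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Mark ~10% of the entries as missing; XGBoost treats np.nan as "missing"
# and learns a default direction for it at every split.
rng = np.random.default_rng(42)
mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[mask] = np.nan

model = XGBClassifier(n_estimators=50, max_depth=3, random_state=42, verbosity=0)
model.fit(X_missing, y)   # no imputation required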

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 1) Generate a small synthetic dataset
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# 2) Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3) Create a simple XGBoost model
model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42,
    verbosity=0
)

# 4) Fit the model
model.fit(X_train, y_train)

# 5) Predict and evaluate
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print("Test accuracy:", acc)

In this example, we train an XGBoost classifier using the scikit-learn interface, which provides an intuitive .fit() and .predict() workflow. The key parameters are:

  • n_estimators=100 sets how many boosting rounds (trees) the model will build.
  • learning_rate=0.1 controls how much each new tree contributes to correcting previous errors (smaller values make learning more stable but require more trees).
  • max_depth=3 defines how deep each decision tree can grow, influencing model complexity and overfitting.

Training is performed with model.fit(X_train, y_train), where XGBoost iteratively builds trees that minimize predictive error, and predictions are obtained via model.predict(X_test). Finally, we compute accuracy with accuracy_score, which measures how often the model correctly predicts class labels. This small example demonstrates how XGBoost's core boosting mechanism, combined with just a few essential hyperparameters, can produce a strong baseline model with minimal setup.

Task


You are given a regression dataset. Your task is to:

  1. Load the dataset and split it into train/test sets.
  2. Initialize an XGBRegressor with the following parameters:
    • n_estimators=200.
    • learning_rate=0.05.
    • max_depth=4.
    • subsample=0.8.
    • random_state=42.
  3. Train the model.
  4. Predict on the test set.
  5. Compute Mean Squared Error (MSE) and store it in mse_value.
  6. Print dataset shapes, model parameters, and the final MSE.

Solution
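The course environment hides the reference solution, but a minimal sketch following the task's steps might look like the code below. Since the provided dataset is not reproduced on this page, make_regression stands in for it; the rest matches the required parameters.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# 1) Load a regression dataset (make_regression stands in for the
#    course-provided data) and split it into train/test sets.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2) Initialize the regressor with the required parameters.
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42,
)

# 3) Train, 4) predict, 5) compute MSE and store it in mse_value.
model.fit(X_train, y_train)
preds = model.predict(X_test)
mse_value = mean_squared_error(y_test, preds)

# 6) Print dataset shapes, model parameters, and the final MSE.
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Params:", model.get_params())
print("MSE:", mse_value)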
