Learn XGBoost | Framework Deep Dive
Advanced Tree-Based Models

XGBoost

XGBoost is a leading implementation of gradient boosted decision trees, known for its efficiency and scalability. It minimizes a loss function by using both the gradient (first derivative) and hessian (second derivative), enabling more informed tree splits and better optimization.
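To make the gradient/hessian idea concrete, here is a minimal sketch (not part of the lesson's original code) that hand-writes a squared-error objective returning exactly the two quantities XGBoost uses to score splits, and passes it to the native xgb.train API. The function name squared_error_obj and the tiny synthetic dataset are illustrative assumptions.

# Illustrative sketch: a custom squared-error objective returning the
# gradient and hessian that XGBoost uses when evaluating candidate splits.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

def squared_error_obj(preds, dtrain):
    labels = dtrain.get_label()
    grad = preds - labels          # first derivative of 0.5 * (pred - label)^2
    hess = np.ones_like(preds)     # second derivative is constant (1.0)
    return grad, hess

booster = xgb.train(
    {"max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=20,
    obj=squared_error_obj
)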

XGBoost features strong regularization: lambda (L2 regularization) and alpha (L1 regularization) control model complexity and help prevent overfitting by penalizing large leaf weights.
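In the scikit-learn wrapper these penalties are exposed as reg_lambda (L2) and reg_alpha (L1). The sketch below is only an illustration of where the parameters go; the penalty values are arbitrary and would normally be tuned.

# Hypothetical configuration showing explicit L1/L2 penalties on leaf weights.
from xgboost import XGBClassifier

regularized_model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    reg_lambda=10.0,   # L2 penalty (lambda): shrinks leaf weights toward zero
    reg_alpha=1.0,     # L1 penalty (alpha): can zero out weak leaf weights
    random_state=42,
    verbosity=0
)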

Its sparsity-aware split finding handles missing values and explicit zeros by learning the optimal path for missing data, making XGBoost robust and efficient with incomplete or sparse datasets.
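A short sketch of what this means in practice: XGBoost can be trained directly on arrays containing NaN, learning a default direction for missing values at each split, so no imputation step is required. The 10% missingness rate below is an arbitrary choice for illustration.

# Sketch: training directly on data with missing values (NaN).
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
rng = np.random.default_rng(42)
mask = rng.random(X.shape) < 0.1      # randomly blank out ~10% of the entries
X_missing = X.copy()
X_missing[mask] = np.nan

model = XGBClassifier(n_estimators=50, max_depth=3, verbosity=0)
model.fit(X_missing, y)               # NaN entries are routed by learned defaults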

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 1) Generate a small synthetic dataset
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# 2) Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3) Create a simple XGBoost model
model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42,
    verbosity=0
)

# 4) Fit the model
model.fit(X_train, y_train)

# 5) Predict and evaluate
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print("Test accuracy:", acc)

In this example, we train an XGBoost classifier using the scikit-learn interface, which provides an intuitive .fit() and .predict() workflow. The key parameters used are: n_estimators=100, which sets how many boosting rounds (trees) the model will build; learning_rate=0.1, which controls how much each new tree contributes to correcting previous errors (smaller values make learning more stable but require more trees); and max_depth=3, which defines how deep each decision tree can grow, influencing model complexity and overfitting. The training process is performed with model.fit(X_train, y_train), where XGBoost iteratively builds trees that minimize predictive error, and predictions are obtained via model.predict(X_test). Finally, we compute accuracy with accuracy_score, which measures how often the model correctly predicts class labels. This small example demonstrates how XGBoost’s core boosting mechanism, combined with just a few essential hyperparameters, can produce a strong baseline model with minimal setup.
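To see the learning_rate / n_estimators trade-off described above, one illustrative sketch (reusing X_train, X_test, y_train, y_test from the example above) compares a larger learning rate with few trees against a smaller learning rate with many trees. The exact accuracies depend on the data; the two configurations are arbitrary examples.

# Illustration of the learning_rate vs. n_estimators trade-off.
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

for lr, n_trees in [(0.3, 50), (0.05, 300)]:
    clf = XGBClassifier(n_estimators=n_trees, learning_rate=lr,
                        max_depth=3, random_state=42, verbosity=0)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"learning_rate={lr}, n_estimators={n_trees}: accuracy={acc:.3f}")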

Task


You are given a regression dataset. Your task is to:

  1. Load the dataset and split it into train/test sets.
  2. Initialize an XGBRegressor with the following parameters:
    • n_estimators=200.
    • learning_rate=0.05.
    • max_depth=4.
    • subsample=0.8.
    • random_state=42.
  3. Train the model.
  4. Predict on the test set.
  5. Compute Mean Squared Error (MSE) and store it in mse_value.
  6. Print dataset shapes, model parameters, and the final MSE.

Solution
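A minimal sketch of one possible solution. Since the course dataset itself is not shown here, a synthetic dataset from make_regression stands in for it; the variable name mse_value follows the task statement.

# Sketch of the regression task, assuming a synthetic stand-in dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# 1) Load the dataset and split it into train/test sets
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2) Initialize the regressor with the required parameters
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42,
    verbosity=0
)

# 3) Train the model
model.fit(X_train, y_train)

# 4) Predict on the test set
preds = model.predict(X_test)

# 5) Compute MSE and store it in mse_value
mse_value = mean_squared_error(y_test, preds)

# 6) Print dataset shapes, model parameters, and the final MSE
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Model parameters:", model.get_params())
print("MSE:", mse_value)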
