Learn XGBoost | Framework Deep Dive
Advanced Tree-Based Models

XGBoost

XGBoost is a leading implementation of gradient boosted decision trees, known for its efficiency and scalability. It minimizes a loss function by using both the gradient (first derivative) and hessian (second derivative), enabling more informed tree splits and better optimization.
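To make this concrete, here is a minimal numerical sketch (illustrative only, not XGBoost's internal code) of how per-sample gradients and hessians of the logistic loss combine into the closed-form leaf weight -G / (H + lambda) that drives split scoring:

import numpy as np

# Logistic loss: for a raw score s and label y in {0, 1},
# p = sigmoid(s), gradient g = p - y, hessian h = p * (1 - p).
def grad_hess_logistic(raw_score, y):
    p = 1.0 / (1.0 + np.exp(-raw_score))
    g = p - y            # first derivative of the loss w.r.t. the score
    h = p * (1.0 - p)    # second derivative of the loss w.r.t. the score
    return g, h

# Toy leaf: current raw scores and labels of the samples falling into it
raw_scores = np.array([0.2, -0.5, 0.1, 0.8])
labels = np.array([1, 0, 0, 1])

g, h = grad_hess_logistic(raw_scores, labels)
G, H = g.sum(), h.sum()

lam = 1.0  # L2 regularization (lambda)
optimal_leaf_weight = -G / (H + lam)          # optimum of the second-order objective
score_contribution = 0.5 * G**2 / (H + lam)   # term used when scoring candidate splits

print("G =", G, "H =", H)
print("optimal leaf weight:", optimal_leaf_weight)
print("score contribution:", score_contribution)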

XGBoost features strong regularization: lambda (L2 regularization) and alpha (L1 regularization) control model complexity and help prevent overfitting by penalizing large leaf weights.
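In the scikit-learn interface these penalties are exposed as reg_lambda and reg_alpha. Below is a hedged sketch of setting them on a classifier; the concrete values are placeholders, and good settings depend on the dataset:

from xgboost import XGBClassifier

# reg_lambda (L2) and reg_alpha (L1) are the scikit-learn-API names for the
# native lambda / alpha parameters; larger values shrink leaf weights harder.
regularized_model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    reg_lambda=5.0,   # stronger L2 penalty on leaf weights
    reg_alpha=0.5,    # mild L1 penalty, can push some leaf weights to zero
    random_state=42,
    verbosity=0
)
# regularized_model.fit(X_train, y_train)  # reuses the train split from the example below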

Its sparsity-aware split finding handles missing values and explicit zeros by learning the optimal path for missing data, making XGBoost robust and efficient with incomplete or sparse datasets.
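As a quick illustration (a sketch on synthetic data, not part of the original lesson), XGBoost can be fit directly on an array containing np.nan, with each split learning a default direction for missing values:

import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Knock out ~20% of the entries to simulate missing data
rng = np.random.default_rng(42)
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

# No imputation needed: each split learns where to send missing values
model = XGBClassifier(n_estimators=100, max_depth=3, random_state=42, verbosity=0)
model.fit(X_missing, y)
print("Training accuracy with 20% missing entries:", model.score(X_missing, y))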

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 1) Generate a small synthetic dataset
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# 2) Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3) Create a simple XGBoost model
model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42,
    verbosity=0
)

# 4) Fit the model
model.fit(X_train, y_train)

# 5) Predict and evaluate
preds = model.predict(X_test)
acc = accuracy_score(y_test, preds)
print("Test accuracy:", acc)

In this example, we train an XGBoost classifier using the scikit-learn interface, which provides an intuitive .fit() and .predict() workflow. The key parameters are:

  • n_estimators=100: how many boosting rounds (trees) the model will build.
  • learning_rate=0.1: how much each new tree contributes to correcting previous errors; smaller values make learning more stable but require more trees.
  • max_depth=3: how deep each decision tree can grow, which influences model complexity and overfitting.

Training is performed with model.fit(X_train, y_train), where XGBoost iteratively builds trees that minimize predictive error, and predictions are obtained via model.predict(X_test). Finally, we compute accuracy with accuracy_score, which measures how often the model correctly predicts class labels. This small example demonstrates how XGBoost's core boosting mechanism, combined with just a few essential hyperparameters, can produce a strong baseline model with minimal setup.
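As a small follow-up (assuming the model and X_test objects from the snippet above), the same estimator also exposes class probabilities and per-feature importance scores:

# Class probabilities instead of hard labels
proba = model.predict_proba(X_test)
print("Predicted probability of class 1 for the first sample:", proba[0, 1])

# Importance score of each input feature (one value per column of X)
print("Feature importances:", model.feature_importances_)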

Task


You are given a regression dataset. Your task is to:

  1. Load the dataset and split it into train/test sets.
  2. Initialize an XGBRegressor with the following parameters:
    • n_estimators=200.
    • learning_rate=0.05.
    • max_depth=4.
    • subsample=0.8.
    • random_state=42.
  3. Train the model.
  4. Predict on the test set.
  5. Compute Mean Squared Error (MSE) and store it in mse_value.
  6. Print dataset shapes, model parameters, and the final MSE.

Solution
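The exercise's dataset loader is not shown in this excerpt, so the sketch below stands in with a synthetic make_regression dataset; apart from that assumption, it is one possible solution that follows the listed steps with the required XGBRegressor parameters:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# 1) Load (here: generate) the dataset and split it into train/test sets
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2) Initialize the regressor with the required parameters
model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42,
    verbosity=0
)

# 3) Train the model
model.fit(X_train, y_train)

# 4) Predict on the test set
preds = model.predict(X_test)

# 5) Compute Mean Squared Error and store it in mse_value
mse_value = mean_squared_error(y_test, preds)

# 6) Print dataset shapes, model parameters, and the final MSE
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Model parameters:", model.get_params())
print("MSE:", mse_value)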
