Challenge: Solving Task Using XGBoost

Task


The "Credit Scoring" dataset is commonly used for credit risk analysis and binary classification tasks. It contains information about customers and their credit applications, with the goal of predicting whether a customer's credit application will result in a good or bad credit outcome.

Your task is to solve a classification task on the "Credit Scoring" dataset:

  1. Create DMatrix objects from the training and test data. Set the enable_categorical argument so that XGBoost can use the categorical features.
  2. Train the XGBoost model on the training DMatrix object.
  3. Set the decision threshold to 0.5 to convert the predicted probabilities into class labels.
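
Step 3 can be illustrated in isolation. The sketch below is only an illustration: the probabilities are made-up values standing in for the model's output, and the names `y_prob`, `threshold`, and `y_label` are hypothetical.

```python
import numpy as np

# Hypothetical predicted probabilities of the positive ("good") class,
# standing in for the output of model.predict(dtest).
y_prob = np.array([0.91, 0.35, 0.50, 0.62, 0.08])

threshold = 0.5
# Strict comparison: a probability of exactly 0.5 is assigned to class 0.
y_label = (y_prob > threshold).astype(int)

print(y_label)  # [1 0 0 1 0]
```

Because the comparison is strict, the third sample (probability exactly 0.5) falls into class 0.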

Note

The 'objective': 'binary:logistic' parameter tells XGBoost to use logistic loss (also known as binary cross-entropy loss) as the objective function during training. With this objective, the model's predictions are probabilities of the positive class.
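
To make the loss concrete, here is a minimal NumPy sketch of logistic loss computed by hand. It is not XGBoost's internal implementation; the helper names `sigmoid` and `log_loss` and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(margin):
    """Map a raw score (margin) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-margin))

def log_loss(y_true, p, eps=1e-15):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# Hypothetical labels and raw scores for three samples.
y_true = np.array([1, 0, 1])
margins = np.array([2.0, -1.0, 0.0])

probs = sigmoid(margins)        # probabilities of the positive class
loss = log_loss(y_true, probs)  # lower is better

print(probs)
print(loss)
```

Confident, correct predictions (the first two samples) contribute little to the loss, while the uncertain third sample (probability 0.5) contributes the most.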

Solution

import numpy as np
import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Load the Credit Scoring dataset
data = fetch_openml(name="credit-g", version=1, parser='auto')
X = data.data
y = data.target

# Convert target to binary (1: Good, 0: Bad)
y = (y == 'good').astype(int)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DMatrix objects for XGBoost with categorical features enabled
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

# Set hyperparameters
params = {
    'objective': 'binary:logistic',
}

# Train the XGBoost classifier
model = xgb.train(params, dtrain)

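# Make predictions: with 'binary:logistic', predict() returns probabilities
# of the positive class, which are thresholded at 0.5 to get class labels
y_pred = model.predict(dtest)
y_pred_binary = (y_pred > 0.5).astype(int)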

# Calculate F1-score
f1 = f1_score(y_test, y_pred_binary)
print(f'F1-score: {f1:.4f}')


Section 3. Chapter 6
import numpy as np
import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Load the Credit Scoring dataset
data = fetch_openml(name="credit-g", version=1, parser='auto')
X = data.data
y = data.target

# Convert target to binary (1: Good, 0: Bad)
y = (y == 'good').astype(int)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DMatrix objects for XGBoost with categorical features enabled
dtrain = xgb.___(___, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=___, enable_categorical=___)

# Set hyperparameters
params = {
    'objective': 'binary:logistic',
}

# Train the XGBoost classifier
model = xgb.train(params, ___)

# Make predictions
y_pred = model.predict(dtest)
y_pred_binary = (y_pred > ___).astype(int)

# Calculate F1-score
f1 = f1_score(y_test, y_pred_binary)
print(f'F1-score: {f1:.4f}')