Challenge: Solving Task Using XGBoost
Task
The "Credit Scoring" dataset is commonly used for credit risk analysis and binary classification tasks. It contains information about customers and their credit applications, with the goal of predicting whether a customer's credit application will result in a good or bad credit outcome.
Your task is to solve a classification task on the "Credit Scoring" dataset:
- Create DMatrix objects using the training and test data. Specify the enable_categorical argument to use categorical features (see the sketch after this list).
- Train the XGBoost model using the training DMatrix object.
- Set the split threshold to 0.5 for correct class detection.
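XGBoost's native categorical handling only engages when the categorical columns carry the pandas category dtype; here is a minimal sketch of that requirement, using a hypothetical toy frame rather than the actual dataset:

import pandas as pd
import xgboost as xgb

# Hypothetical toy frame: one numeric and one category-dtype column
df = pd.DataFrame({
    'duration': [6, 48, 12, 24],
    'purpose': pd.Categorical(['radio/tv', 'education', 'car', 'car']),
})

# enable_categorical=True lets DMatrix accept the category-dtype column
# and treat it as a native categorical feature instead of raising an error
dmat = xgb.DMatrix(df, label=[1, 0, 1, 0], enable_categorical=True)
print(dmat.feature_types)  # e.g. ['int', 'c'], where 'c' marks the categorical column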
Note: the 'objective': 'binary:logistic' parameter means that logistic loss (also known as binary cross-entropy loss) is used as the objective function when training the XGBoost model.
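With this objective, model.predict returns probabilities in the range [0, 1] rather than hard class labels, which is why the predictions are thresholded at 0.5 in the solution. A quick illustration of the loss itself, computed on hypothetical probabilities:

import numpy as np

# Binary cross-entropy (logistic loss) for hypothetical predictions
y_true = np.array([1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.6])  # probabilities from a sigmoid output
loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
print(f'log loss: {loss:.4f}')  # lower is better; these values give about 0.28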
Solution
import numpy as np
import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
# Load the Credit Scoring dataset
data = fetch_openml(name="credit-g", version=1, parser='auto')
X = data.data
y = data.target
# Convert target to binary (1: Good, 0: Bad)
y = (y == 'good').astype(int)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create DMatrix objects for XGBoost with categorical features enabled
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)
# Set hyperparameters
params = {
    'objective': 'binary:logistic',
}
# Train the XGBoost classifier
model = xgb.train(params, dtrain)
# Make predictions
y_pred = model.predict(dtest)
y_pred_binary = (y_pred > 0.5).astype(int)
# Calculate F1-score
f1 = f1_score(y_test, y_pred_binary)
print(f'F1-score: {f1:.4f}')
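For comparison, the same pipeline can be expressed with the scikit-learn wrapper instead of the low-level DMatrix API. This is a sketch assuming xgboost >= 1.6, where XGBClassifier accepts enable_categorical directly, and it reuses the train/test split from the solution above:

from xgboost import XGBClassifier

# Sklearn-style equivalent; tree_method='hist' is required for
# native categorical support in the 1.x releases
clf = XGBClassifier(objective='binary:logistic', tree_method='hist',
                    enable_categorical=True)
clf.fit(X_train, y_train)

# predict() already applies the 0.5 threshold internally
print(f'F1-score: {f1_score(y_test, clf.predict(X_test)):.4f}')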
Starter code
import numpy as np
import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
# Load the Credit Scoring dataset
data = fetch_openml(name="credit-g", version=1, parser='auto')
X = data.data
y = data.target
# Convert target to binary (1: Good, 0: Bad)
y = (y == 'good').astype(int)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create DMatrix objects for XGBoost with categorical features enabled
dtrain = xgb.___(___, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=___, enable_categorical=___)
# Set hyperparameters
params = {
    'objective': 'binary:logistic',
}
# Train the XGBoost classifier
model = xgb.train(params, ___)
# Make predictions
y_pred = model.predict(dtest)
y_pred_binary = (y_pred > ___).astype(int)
# Calculate F1-score
f1 = f1_score(y_test, y_pred_binary)
print(f'F1-score: {f1:.4f}')