Classification with Python

Challenge: Comparing Models

Now we will compare the models covered in this course on a single dataset: a breast cancer dataset. The target is the 'diagnosis' column (1 = malignant, 0 = benign).
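If the CSV URL used below is unreachable, one possible offline stand-in is scikit-learn's bundled breast cancer dataset. This assumes the course CSV is derived from the same Wisconsin Diagnostic data, which the lesson does not state. The bundled labels are encoded the other way round (0 = malignant, 1 = benign), so the sketch flips them. After the flip, class 1 is malignant, as in the lesson.

```python
# Offline stand-in for the course CSV (an assumption, not the lesson's data source):
# scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data, no network needed.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X = data.data
# load_breast_cancer encodes 0 = malignant, 1 = benign, the reverse of the
# lesson's 'diagnosis' column, so flip it to get 1 = malignant, 0 = benign.
y = 1 - data.target

print(X.shape)           # (569, 30)
print(y.value_counts())  # counts: 357 benign (0), 212 malignant (1)
```

Without the flip, scoring='recall' would measure how many benign cases are caught, which is not what this task is after.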

We will apply GridSearchCV to each model to find its best parameters. In this task we use recall as the scoring metric, since we want to avoid false negatives (malignant tumors classified as benign). GridSearchCV selects parameters by recall if you set scoring='recall'.
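Recall is the share of actual positives that the model catches, TP / (TP + FN). With scoring='recall', GridSearchCV computes it for the positive class (label 1, malignant here) on each validation fold and keeps the parameters with the highest mean score. Here is a small worked example with made-up labels:

```python
from sklearn.metrics import recall_score

# Toy labels (hypothetical data for illustration): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]  # one malignant case missed: a false negative

# recall = TP / (TP + FN) = 3 / (3 + 1); the false positive on the last
# sample does not affect recall at all.
print(recall_score(y_true, y_pred))  # 0.75
```

Because false positives do not lower recall, a model that flags more cases as malignant tends to score higher. For this task, that trade-off is the intended one.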

Task


Build each of the models covered in this course and print its best parameters along with its best recall score. You only need to fill in the parameter names (the '___' placeholders) in the param_grid dictionaries.

  1. For the k-NN model, find the best n_neighbors value out of [3, 5, 7, 12].
  2. For Logistic Regression, try the C values [0.1, 1, 10].
  3. For the Decision Tree, tune two parameters: max_depth over [2, 4, 6, 10] and min_samples_leaf over [1, 2, 4, 7].
  4. For the Random Forest, find the best max_depth (maximum depth of each tree) out of [2, 4, 6] and the best number of trees (n_estimators) out of [20, 50, 100].
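GridSearchCV tries every combination in a grid and cross-validates each one, so the grid sizes determine how long the search runs. The sketch below counts the candidates for the four grids above with scikit-learn's ParameterGrid. It assumes GridSearchCV's default of 5-fold cross-validation, which the lesson relies on implicitly by not passing cv.

```python
from sklearn.model_selection import ParameterGrid

# The parameter grids from the task list (names are each estimator's
# scikit-learn constructor arguments).
grids = {
    'k-NN': {'n_neighbors': [3, 5, 7, 12]},
    'Logistic Regression': {'C': [0.1, 1, 10]},
    'Decision Tree': {'max_depth': [2, 4, 6, 10], 'min_samples_leaf': [1, 2, 4, 7]},
    'Random Forest': {'max_depth': [2, 4, 6], 'n_estimators': [20, 50, 100]},
}

n_folds = 5  # GridSearchCV's default cv in current scikit-learn versions
for name, grid in grids.items():
    n_candidates = len(ParameterGrid(grid))  # product of the value-list lengths
    print(f'{name}: {n_candidates} candidates, {n_candidates * n_folds} fits')
```

This gives 4, 3, 16 and 9 candidates, or 20, 15, 80 and 45 cross-validation fits. Each search also adds one final refit of the best model on all the data. The Random Forest fits, each with up to 100 trees, are likely to dominate the running time.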

Solution

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/breastcancer.csv')
y = df['diagnosis']
X = df.drop('diagnosis', axis=1)
# Scale `X`: k-NN and Logistic Regression are sensitive to feature scale
# (the tree-based models are not, but scaling does not hurt them)
X = StandardScaler().fit_transform(X)

model_grids = []  # keep the fitted searches here to make printing the results easier
# k-NN
knn = KNeighborsClassifier()
knn_grid = GridSearchCV(knn,
                        {'n_neighbors': [3, 5, 7, 12]},
                        scoring='recall').fit(X, y)
model_grids.append(knn_grid)
# Logistic Regression
lr = LogisticRegression()
lr_grid = GridSearchCV(lr,
                       {'C': [0.1, 1, 10]},
                       scoring='recall').fit(X, y)
model_grids.append(lr_grid)
# Decision Tree
dt = DecisionTreeClassifier()
dt_grid = GridSearchCV(dt,
                       {'max_depth': [2, 4, 6, 10], 'min_samples_leaf': [1, 2, 4, 7]},
                       scoring='recall').fit(X, y)
model_grids.append(dt_grid)
# Random Forest
rf = RandomForestClassifier()
rf_grid = GridSearchCV(rf,
                       {'max_depth': [2, 4, 6], 'n_estimators': [20, 50, 100]},
                       scoring='recall').fit(X, y)
model_grids.append(rf_grid)

# Print each best model (with its tuned parameters) and its mean cross-validated recall
for model in model_grids:
    print(model.best_estimator_, '– recall:', model.best_score_)
```

Note

The code takes some time to run (less than a minute).
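The sketch below runs the same comparison without the course CSV. It assumes scikit-learn's bundled breast cancer data as a stand-in, with the labels flipped so that 1 = malignant. It loops over (estimator, grid) pairs instead of repeating the GridSearchCV call four times. It also prints best_params_, which lists only the tuned parameters, instead of the whole best_estimator_. The random_state=0 arguments are an addition for repeatable tree results and are not part of the lesson. Scores will differ from those on the course dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

# Offline stand-in for the course CSV (assumption); flip labels so 1 = malignant.
data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
y = 1 - data.target

# (estimator, parameter grid) pairs matching the task description.
# random_state=0 is not in the lesson; it only makes tree results repeatable.
searches = {
    'k-NN': (KNeighborsClassifier(), {'n_neighbors': [3, 5, 7, 12]}),
    'Logistic Regression': (LogisticRegression(), {'C': [0.1, 1, 10]}),
    'Decision Tree': (DecisionTreeClassifier(random_state=0),
                      {'max_depth': [2, 4, 6, 10], 'min_samples_leaf': [1, 2, 4, 7]}),
    'Random Forest': (RandomForestClassifier(random_state=0),
                      {'max_depth': [2, 4, 6], 'n_estimators': [20, 50, 100]}),
}

results = {}
for name, (estimator, grid) in searches.items():
    # Cross-validate every combination in the grid, scored by recall on class 1.
    search = GridSearchCV(estimator, grid, scoring='recall').fit(X, y)
    results[name] = search
    print(f'{name}: {search.best_params_}, recall: {search.best_score_:.3f}')
```

If the wait is a concern, passing n_jobs=-1 to GridSearchCV runs the cross-validation fits in parallel on all available CPU cores.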


Starter code (Section 5, Chapter 3), with the parameter names to fill in marked '___':
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/breastcancer.csv')
y = df['diagnosis']
X = df.drop('diagnosis', axis=1)
# Scale `X`: k-NN and Logistic Regression are sensitive to feature scale
# (the tree-based models are not, but scaling does not hurt them)
X = StandardScaler().fit_transform(X)

model_grids = []  # keep the fitted searches here to make printing the results easier
# k-NN
knn = KNeighborsClassifier()
knn_grid = GridSearchCV(knn,
                        {'___': [3, 5, 7, 12]},
                        scoring='recall').fit(X, y)
model_grids.append(knn_grid)
# Logistic Regression
lr = LogisticRegression()
lr_grid = GridSearchCV(lr,
                       {'___': [0.1, 1, 10]},
                       scoring='recall').fit(X, y)
model_grids.append(lr_grid)
# Decision Tree
dt = DecisionTreeClassifier()
dt_grid = GridSearchCV(dt,
                       {'___': [2, 4, 6, 10], '___': [1, 2, 4, 7]},
                       scoring='recall').fit(X, y)
model_grids.append(dt_grid)
# Random Forest
rf = RandomForestClassifier()
rf_grid = GridSearchCV(rf,
                       {'___': [2, 4, 6], '___': [20, 50, 100]},
                       scoring='recall').fit(X, y)
model_grids.append(rf_grid)

# Print each best model (with its tuned parameters) and its mean cross-validated recall
for model in model_grids:
    print(model.best_estimator_, '– recall:', model.best_score_)
```