Challenge: Implementing a Decision Tree
In this challenge, you will use the Titanic dataset. It holds information about passengers on the Titanic, including their age, sex, and family size. The task is to predict whether a person survived.
import pandas as pd
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/titanic.csv')
print(df.head())
To implement the Decision Tree, you can use the DecisionTreeClassifier class from sklearn.
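As a minimal sketch of how the class is used, the snippet below fits a tree on a tiny hand-made dataset (hypothetical, not the Titanic data) and predicts two new points. The names X_toy, y_toy, and tree are illustrative only.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data (an assumption for illustration): the label equals the first feature
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_toy = [0, 0, 1, 1]

# max_depth and min_samples_leaf are the two hyperparameters tuned in this challenge
tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=1, random_state=0)
tree.fit(X_toy, y_toy)

# Predict the class for two unseen rows
preds = tree.predict([[1, 0], [0, 1]])
print(preds)
```

Because a single split on the first feature separates the classes perfectly, the fitted tree needs only one level even though max_depth allows two.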
Your task is to build a Decision Tree and find the best max_depth and min_samples_leaf using grid search.
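Conceptually, grid search tries every combination of the candidate values and keeps the one with the best cross-validated score. The loop below is a rough sketch of that idea, not GridSearchCV's actual implementation. It runs on synthetic data from make_classification (an assumption, used so the example needs no download) with a smaller grid than the one in the task.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; the challenge itself uses the Titanic CSV)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

depths = [1, 2, 3]   # candidate max_depth values
leaf_sizes = [1, 4]  # candidate min_samples_leaf values

# Score every (max_depth, min_samples_leaf) pair with 5-fold cross-validation
results = {}
for depth, leaf in product(depths, leaf_sizes):
    model = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf, random_state=0)
    results[(depth, leaf)] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

# Keep the combination with the highest mean accuracy
best_combo = max(results, key=results.get)
print(best_combo, results[best_combo])
```

GridSearchCV wraps this kind of loop and additionally refits the best model on all the data by default, which is why it is the more convenient choice for the task.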
Task
- Import the DecisionTreeClassifier class from sklearn.tree.
- Assign an instance of DecisionTreeClassifier to the decision_tree variable.
- Create a dictionary for GridSearchCV to run through the values [1, 2, 3, 4, 5, 6, 7] of max_depth and [1, 2, 4, 6] of min_samples_leaf.
- Create a GridSearchCV object and train it.
Solution
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
# Read the data and assign the variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/titanic.csv')
X = df.drop(columns=['Survived'])
y = df['Survived']
decision_tree = DecisionTreeClassifier()
param_grid = {'max_depth': [1, 2, 3, 4, 5, 6, 7], 'min_samples_leaf': [1, 2, 4, 6]}
# Use `GridSearchCV` to find the best parameters
grid = GridSearchCV(decision_tree, param_grid, cv=10).fit(X, y)
# Print the best estimator and score
print(grid.best_estimator_)
print(grid.best_score_)
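Besides best_estimator_ and best_score_, a fitted GridSearchCV also exposes best_params_ and cv_results_, which can help when comparing combinations. The sketch below shows one way to inspect them. It uses synthetic data and a reduced grid (both assumptions, chosen so the example runs without the Titanic file), and the names X_demo, y_demo, and cols are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption; not the Titanic dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {'max_depth': [1, 2, 3], 'min_samples_leaf': [1, 4]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5).fit(X_demo, y_demo)

# The winning hyperparameter combination as a plain dictionary
print(grid.best_params_)

# One row per combination, sorted from best to worst mean validation score
results = pd.DataFrame(grid.cv_results_)
cols = ['param_max_depth', 'param_min_samples_leaf', 'mean_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))
```

Looking at the full table rather than only the winner can show whether several combinations score almost the same, in which case the simpler tree may be preferable.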
Template (fill in each ___):
import pandas as pd
from sklearn.tree import ___
from sklearn.model_selection import GridSearchCV
# Read the data and assign the variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/titanic.csv')
X = df.drop(columns=['Survived'])
y = df['Survived']
decision_tree = ___()
param_grid = {'max_depth': [1, 2, 3, 4, 5, 6, 7], '___': [1, 2, 4, 6]}
# Use `GridSearchCV` to find the best parameters
grid = GridSearchCV(decision_tree, param_grid, cv=10).___(X, y)
# Print the best estimator and score
print(grid.best_estimator_)
print(grid.best_score_)