Challenge: Implementing a Random Forest
In this chapter, you will build a Random Forest using the same Titanic dataset.
You will also calculate the cross-validation accuracy using the cross_val_score() function.
Finally, you will print the feature importances.
The feature_importances_ attribute holds only an array of importances, without the feature names.
To print the pairs ('name', importance), you can use the following syntax:
for f in zip(X.columns, model.feature_importances_):
    print(f)
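The name-and-importance pairing can also be kept in a pandas Series, which makes it easy to sort features from most to least important. The following is a minimal sketch, not part of the challenge itself: it assumes synthetic data from make_classification in place of the Titanic file, and the column names f0 to f3 are hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption; the chapter uses the Titanic dataset)
X_arr, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=['f0', 'f1', 'f2', 'f3'])  # hypothetical names

# random_state is fixed only to make the sketch reproducible
model = RandomForestClassifier(random_state=0).fit(X, y)

# Pair each column name with its importance, then sort in descending order
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The importances are normalized, so the printed values should add up to 1.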
Task
- Import the RandomForestClassifier class.
- Create an instance of the RandomForestClassifier class with default parameters and train it.
- Print the cross-validation score, with cv=10, of the random_forest you just built.
- Print each feature's importance along with its name.
Solution
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# Read the data and assign the variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/titanic.csv')
X = df.drop('Survived', axis=1)
y = df['Survived']
# Build and train a Random Forest
random_forest = RandomForestClassifier().fit(X, y)
# Print the cross-validation accuracy
print(cross_val_score(random_forest, X, y, cv=10).mean())
# Print each feature's name along with its importance
for feature in zip(X.columns, random_forest.feature_importances_):
    print(feature)
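One detail worth knowing about the cross-validation step: cross_val_score fits a fresh clone of the estimator on each fold, so the score does not depend on the model having been trained beforehand. The sketch below illustrates this under stated assumptions: it uses synthetic data from make_classification instead of the Titanic file, and it fixes random_state only so the numbers are repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; the chapter uses the Titanic dataset)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# An unfitted estimator is enough: each fold trains its own clone
forest = RandomForestClassifier(random_state=0)
scores = cross_val_score(forest, X, y, cv=10)

print(scores)         # one accuracy value per fold
print(scores.mean())  # the single number reported in the solution
```

Looking at the per-fold scores before averaging gives a rough sense of how much the accuracy varies between splits.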
import pandas as pd
from sklearn.ensemble import ___
from sklearn.model_selection import cross_val_score
# Read the data and assign the variables
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/titanic.csv')
X = df.drop('Survived', axis=1)
y = df['Survived']
# Build and train a Random Forest
random_forest = ___().___(X, y)
# Print the cross-validation accuracy
print(cross_val_score(___, ___, ___, cv=10).mean())
# Print each feature's name along with its importance
for feature in zip(X.columns, random_forest.___):
    print(feature)