Challenge: Implementing a Random Forest
In sklearn, the classification version of Random Forest is implemented by the RandomForestClassifier class.
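A minimal sketch of creating such a model (the variable name model and the choice to rely on default hyperparameters here are illustrative):

from sklearn.ensemble import RandomForestClassifier

# Create the classifier; random_state fixes the seed so results are reproducible
model = RandomForestClassifier(random_state=42)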
You will also calculate the cross-validation accuracy using the cross_val_score() function.
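For example (a sketch that assumes X and y are an already-prepared feature matrix and target vector):

from sklearn.model_selection import cross_val_score

# 10-fold cross-validation; for classifiers the default scoring is accuracy
cv_scores = cross_val_score(model, X, y, cv=10)
print(cv_scores.mean())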
In the end, you'll print the importance of each feature. The feature_importances_ attribute returns an array of importance scores: these scores represent how much each feature contributed to reducing Gini impurity across all the decision nodes where that feature was used. In other words, the more a feature helps split the data in a useful way, the higher its importance.
However, the attribute only gives the scores without feature names. To display both, you can pair them using Python's zip() function:
for feature, importance in zip(X.columns, model.feature_importances_):
    print(feature, importance)
This prints each feature name along with its importance score, making it easier to understand which features the model relied on most.
You are given the Titanic dataset stored as a DataFrame in the df variable.
- Initialize the Random Forest model, set random_state=42, train it, and store the fitted model in the random_forest variable.
- Calculate the cross-validation scores for the trained model using 10 folds, and store the resulting scores in the cv_scores variable.
Solution
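A possible sketch of the solution, assuming df has already been preprocessed into numeric features and that the target column is named Survived (as in the standard Titanic dataset):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumption: df is preprocessed and 'Survived' is the target column
X = df.drop('Survived', axis=1)
y = df['Survived']

# Initialize and train the Random Forest with a fixed random seed
random_forest = RandomForestClassifier(random_state=42)
random_forest.fit(X, y)

# 10-fold cross-validation accuracy scores
cv_scores = cross_val_score(random_forest, X, y, cv=10)
print(cv_scores.mean())

# Pair each feature name with its importance score
for feature, importance in zip(X.columns, random_forest.feature_importances_):
    print(feature, importance)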