Section 2. Chapter 6
Challenge: Preprocessing the Dataset
Task
You are given a synthetic dataset stored in the data variable. Your task is to handle missing values and encode categorical features properly.
Follow these steps:
- Replace missing values in the 'Age' column with the mean value of this column. Overwrite the original column with the result.
- Create an instance of OneHotEncoder and store it in the city_encoder variable. Make sure to specify drop='first' to avoid the dummy variable trap.
  - By default, this encoder returns a sparse matrix. To make it compatible with Pandas later, set the parameter sparse_output=False (or sparse=False for older versions) during initialization, or append .toarray() when you transform the data.
- Encode the values in the 'City' column using city_encoder.fit_transform() and store the resulting array in the city_encoded variable.
- Create an instance of OrdinalEncoder and store it in the income_encoder variable. Since the data has a natural hierarchy, explicitly define the order using the categories parameter (note that 'Low' < 'Middle' < 'High').
- Encode the values in the 'Income' column using income_encoder and overwrite the original 'Income' column with the result.
Solution
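The following is a minimal sketch of one way to complete the steps in the task, not necessarily the course's reference solution. The real dataset is not shown, so the small DataFrame below is a hypothetical stand-in for the data variable, and its city names and values are assumptions made for illustration. The sketch uses the .toarray() option rather than sparse_output=False, since that call should work regardless of which parameter name the installed scikit-learn version accepts.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical stand-in for the course's `data` variable (assumed values).
data = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0, 40.0],
    "City": ["Berlin", "Paris", "Berlin", "Rome"],
    "Income": ["Low", "High", "Middle", "Low"],
})

# Replace missing ages with the column mean and overwrite the column.
data["Age"] = data["Age"].fillna(data["Age"].mean())

# One-hot encoder that drops the first category to avoid the dummy variable trap.
city_encoder = OneHotEncoder(drop="first")

# The encoder expects 2D input, hence the double brackets.
# The default output is a sparse matrix; .toarray() converts it to a dense array.
city_encoded = city_encoder.fit_transform(data[["City"]]).toarray()

# Ordinal encoder with an explicit order: 'Low' < 'Middle' < 'High'.
income_encoder = OrdinalEncoder(categories=[["Low", "Middle", "High"]])

# Encode 'Income' and overwrite the column; ravel() flattens the (n, 1) result.
data["Income"] = income_encoder.fit_transform(data[["Income"]]).ravel()
```

With three distinct cities and drop='first', city_encoded has two columns, one per remaining category. The categories parameter takes a list of lists because it holds one ordered list for each encoded column.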