In this challenge, scale the features of the **penguins dataset** (already encoded and without missing values) using `StandardScaler`.


```python
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed_encoded.csv')

print(df)
```

Here is a little reminder of the `StandardScaler` class.
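
As a quick refresher, the sketch below shows `StandardScaler` on a tiny made-up array (the values and the names `X_toy`, `X_scaled`, and `X_manual` are illustrative assumptions, not part of the challenge). It compares `fit_transform` with the manual formula `z = (x - mean) / std`, computed per column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two features on very different scales.
X_toy = np.array([
    [1.0, 100.0],
    [2.0, 200.0],
    [3.0, 300.0],
    [4.0, 400.0],
])

scaler = StandardScaler()

# fit() learns the per-column mean and standard deviation;
# transform() applies z = (x - mean) / std; fit_transform() does both.
X_scaled = scaler.fit_transform(X_toy)

# The same result computed by hand (population std, i.e. ddof=0).
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(scaler.mean_)   # per-column means learned during fit
print(scaler.scale_)  # per-column standard deviations learned during fit
print(np.allclose(X_scaled, X_manual))
```

After scaling, both columns are on the same scale even though the raw values differ by two orders of magnitude.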

```python
import unittest
import pandas as pd
import numpy as np

def _dynamic_test(test_case, condition, success_message, failure_message):
    # Rename the running test to a readable label so the report shows
    # the checklist item instead of the method name, then pass or fail.
    if condition:
        test_case._testMethodName = success_message
        test_case.assertTrue(True, success_message)
    else:
        test_case._testMethodName = failure_message
        test_case.fail(failure_message)

class TestStandardScaler(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Load the raw dataset once and import the learner's submission.
        cls.df_raw = pd.read_csv(
            'https://codefinity-content-media.s3.eu-west-1.amazonaws.com/a65bbc96-309e-4df9-a790-a1eb8c815a1c/penguins_imputed_encoded.csv'
        )
        import user_code
        cls.user_code = user_code

    def test_imports_scaler(self):
        # The submission must define `scaler` as a StandardScaler instance.
        from sklearn.preprocessing import StandardScaler
        uc = self.user_code
        cond = isinstance(getattr(uc, 'scaler', None), StandardScaler)
        _dynamic_test(
            self,
            cond,
            "Used StandardScaler from sklearn.preprocessing",
            "Used StandardScaler from sklearn.preprocessing"
        )

    def test_X_scaled_type(self):
        # Only the type of `X` is checked here (NumPy array or DataFrame).
        uc = self.user_code
        cond = isinstance(uc.X, (np.ndarray, pd.DataFrame))
        _dynamic_test(
            self,
            cond,
            "X was transformed using scaler.fit_transform",
            "X was transformed using scaler.fit_transform"
        )

    def test_X_scaled_mean_var(self):
        # Every column of `X` must have mean ~0 and variance ~1.
        uc = self.user_code
        X_arr = np.asarray(uc.X)
        mean_close = np.allclose(np.mean(X_arr, axis=0), 0, atol=1e-7)
        var_close = np.allclose(np.var(X_arr, axis=0), 1, atol=1e-7)
        cond = mean_close and var_close
        _dynamic_test(
            self,
            cond,
            "X columns have mean ~0 and variance ~1 after scaling",
            "X columns have mean ~0 and variance ~1 after scaling"
        )

if __name__ == "__main__":
    unittest.main()
```

test_code.py
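
The checks in `test_code.py` look for two module-level names, `scaler` and `X`. The sketch below shows one way a submission could satisfy them. It is not the official solution: it runs on a small synthetic stand-in DataFrame (`df_demo`, with hypothetical column names and random values) and assumes that `species` is the target column and every other column is a feature.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in for the penguins DataFrame; the column names and values
# are hypothetical and only serve to make the sketch runnable offline.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "species": rng.integers(0, 3, size=50),
    "flipper_length_mm": rng.normal(200, 14, size=50),
    "body_mass_g": rng.normal(4200, 800, size=50),
})

# Assumption: 'species' is the target, everything else is a feature.
X = df_demo.drop("species", axis=1)

# Names `scaler` and `X` match what the tests look up in user_code.
scaler = StandardScaler()
X = scaler.fit_transform(X)

# The same conditions the tests verify: mean ~0 and variance ~1 per column.
print(np.allclose(X.mean(axis=0), 0, atol=1e-7))
print(np.allclose(X.var(axis=0), 1, atol=1e-7))
```

With the real dataset, `df_demo` would be replaced by the `df` loaded from the CSV above, and the feature selection would depend on its actual columns.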

Machine learning is now used in a wide range of applications. This course is an introduction to the field: you will learn the basic concepts, work with scikit-learn, one of the most widely used Python libraries for ML, and build your first machine learning project.
The course is intended for students with a basic knowledge of Python, pandas, and NumPy.

Learn the Machine Learning concepts and the ML project workflow.

Preprocessing is probably the most important stage of an ML project. This chapter covers the preprocessing steps needed for almost any dataset.

A pipeline is a neat way to combine all the preprocessing steps as well as a model. Pipelines make it much easier to train and use a model.

Modeling is the most fun stage of an ML project. Let's learn to build, fine-tune and evaluate the model!

Challenge: Scaling the Features
