Learn Scikit-learn Concepts | Preprocessing Data with Scikit-learn

The scikit-learn library (imported as sklearn) offers various functions and classes for preprocessing data and modeling. The main kinds of sklearn objects are the estimator, the transformer, the predictor, and the model.

Estimator

Any sklearn class that has a .fit() method is considered an estimator. The .fit() method allows an object to learn from the data.

In other words, the .fit() method is for training an object. It takes X and y parameters (y is optional for unsupervised learning tasks).

estimator.fit(X, y) # For supervised learning tasks
estimator.fit(X) # For unsupervised learning tasks
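As a rough illustration of the two call forms above, here is a minimal sketch. The toy data and the choice of LogisticRegression and KMeans are assumptions made for this example only; any sklearn estimator follows the same pattern.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data (made up for illustration): 6 samples, 2 features
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # target labels

# Supervised: the estimator learns from both X and y
clf = LogisticRegression()
clf.fit(X, y)
print(clf.classes_)  # learned attribute: the class labels seen in y

# Unsupervised: the estimator learns from X only
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
print(km.cluster_centers_.shape)  # learned attribute: one center per cluster
```

Attributes whose names end with an underscore, such as classes_ and cluster_centers_, are set during .fit() and hold what the estimator learned.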

An object that only learns from data, without doing anything with what it has learned, is not very useful on its own. However, two kinds of objects that build on the estimator, the transformer and the predictor, are much more useful.

Transformer

A transformer has the .fit() method and the .transform() method that transforms the data in some way.

Usually, a transformer needs to learn something from the data before transforming it, so you call .fit() and then .transform(). To do both in a single call, transformers also provide the .fit_transform() method. It produces the same result as calling .fit() and .transform() in sequence, but is sometimes faster, so it is preferable to .fit().transform().

transformer.fit(X) # Train the transformer
transformer.transform(X) # Transform the data using an already trained transformer
transformer.fit_transform(X) # Train the transformer and transform the data
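The equivalence between the two-step and one-step forms can be checked with a small sketch. StandardScaler and the toy array are assumptions chosen for illustration; the lesson itself does not name a specific transformer here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (made up): 3 samples, 2 features on different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Two steps: learn the column means and standard deviations, then apply them
scaler = StandardScaler()
scaler.fit(X)
X_two_steps = scaler.transform(X)

# One step: fit and transform in a single call
X_one_step = StandardScaler().fit_transform(X)

print(np.allclose(X_two_steps, X_one_step))  # True
print(scaler.mean_)  # column means learned during .fit()
```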

The nan values shown in the training set in the picture indicate missing data.
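The picture is not reproduced here, so the following is only a guess at the kind of transformer it might depict. This sketch assumes SimpleImputer with the mean strategy and made-up arrays, and shows a transformer learning from a training set that contains nan and then reusing what it learned on new data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy training set (made up) with missing values encoded as np.nan
X_train = np.array([[1.0, 4.0], [np.nan, 6.0], [3.0, np.nan]])
X_new = np.array([[np.nan, np.nan]])

imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)  # learns column means, ignoring nan
X_new_filled = imputer.transform(X_new)  # reuses the means learned from X_train

print(imputer.statistics_)  # [2. 5.]
print(X_new_filled)  # [[2. 5.]]
```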

Predictor

A predictor is an estimator (it has the .fit() method) that also has the .predict() method, which is used for making predictions.

predictor.fit(X, y) # Train the predictor
predictor.predict(X_new) # Predict the target for new instances using the trained predictor
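A minimal sketch of the fit-then-predict pattern follows. KNeighborsClassifier and the one-feature toy data are assumptions picked for illustration, not something the lesson prescribes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up): small values belong to class 0, large values to class 1
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

predictor = KNeighborsClassifier(n_neighbors=3)
predictor.fit(X, y)  # train on labeled data

X_new = np.array([[1.5], [10.5]])  # new instances without labels
y_pred = predictor.predict(X_new)
print(y_pred)  # [0 1]
```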

Model

A model is a type of predictor that also includes the .score() method. This method calculates a score (metric) to measure the predictor's performance.

model.fit(X, y) # Train the model
model.score(X, y) # Calculate a score for the trained model on the (X, y) set

For classification models, .score() returns accuracy by default. As mentioned in the previous chapter, accuracy is a metric representing the proportion of correct predictions.
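To make the link between .score() and accuracy concrete, the sketch below compares the score of a classifier with a manually computed fraction of correct predictions. LogisticRegression and the toy labels are assumptions for this example; regressors use a different default score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (made up): one feature, labels that are not perfectly separable
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

score = model.score(X, y)  # for classifiers, the default score is accuracy
manual_accuracy = np.mean(model.predict(X) == y)  # fraction of correct predictions

print(score, manual_accuracy)  # the two values match
```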

Transformers are used at the preprocessing stage, while predictors (more specifically, models) are used at the modeling stage.
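The two stages can be sketched end to end as follows. The iris dataset, the train/test split, StandardScaler, and LogisticRegression are all assumptions chosen to keep the example short; they are one possible combination, not the workflow the course necessarily uses.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing stage: the transformer learns from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics

# Modeling stage: the model is trained and scored on the preprocessed data
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
test_accuracy = model.score(X_test_scaled, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Note that the test set is passed through .transform() rather than .fit_transform(), so the scaler never learns from data the model is evaluated on.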
