Models

You already know the basics of preprocessing your data and building pipelines. Now we can move on to the fun part: modeling!

Let's recap what a model is. In scikit-learn, a model is an estimator that has both .predict() and .score() methods. Because it is an estimator, it also has the .fit() method.
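To make the distinction concrete, here is a minimal sketch that checks which methods two estimators expose. The choice of KNeighborsClassifier and StandardScaler, and the helper name method_report, are assumptions made for illustration only.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def method_report(estimator):
    """Return which of the common estimator methods an object exposes."""
    names = ("fit", "predict", "score", "transform")
    return {name: hasattr(estimator, name) for name in names}


# A model: has .fit(), .predict() and .score(), but no .transform()
model_methods = method_report(KNeighborsClassifier())

# A transformer: has .fit() and .transform(), but no .predict() or .score()
scaler_methods = method_report(StandardScaler())

print("KNeighborsClassifier:", model_methods)
print("StandardScaler:      ", scaler_methods)
```

Both objects are estimators because both have .fit(), but only the first one counts as a model in the sense used here.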

.fit()

Once the data is preprocessed and ready to go to the model, the first step of building a model is training it. This is done using the .fit(X, y) method, where X holds the features and y holds the target.

During training, a model learns everything it needs to make predictions. What the model learns and the duration of training depend on the chosen algorithm. For each task, numerous models are available, based on different algorithms. Some train slower, while others train faster.
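As a rough illustration of "what the model learns depends on the algorithm", the sketch below trains two different classifiers on the same data and inspects what each one stored. The iris dataset and the two model classes are assumptions chosen only because they ship with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # 150 flowers, 4 features, 3 classes

# Logistic regression learns a weight for every (class, feature) pair
log_reg = LogisticRegression(max_iter=1000).fit(X, y)
print("Weights shape:", log_reg.coef_.shape)

# A decision tree learns a set of splitting rules instead of weights
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("Tree depth:", tree.get_depth())
print("Number of leaves:", tree.get_n_leaves())
```

The same .fit(X, y) call produces very different learned state in each case, which is also part of why training times differ between algorithms.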

Training is generally the most time-consuming part of machine learning. On a large training set, a model can take minutes, hours, or even days to train.
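One simple way to see the difference in training time is to measure it. The following is only a sketch: the synthetic dataset, its size, and the two models are assumptions, and the measured times will vary from machine to machine.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data, used here only so the example is self-contained
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=0),
}

timings = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)  # Training happens here
    timings[name] = time.perf_counter() - start

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```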

.predict()

Once the model is trained using the .fit() method, it can perform predictions. Predicting is as easy as calling the .predict() method:

model.fit(X, y) # Train a model
y_pred = model.predict(X_new) # Get a prediction

Usually, you want to predict a target for new instances, X_new.
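Here is a complete version of the snippet above that can be run as is. It is a minimal sketch: the iris dataset, the KNeighborsClassifier model, and the two hand-written flower measurements in X_new are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)  # Train a model

# Two new instances with the same 4 features the model was trained on
X_new = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.7, 3.0, 5.2, 2.3],
])

y_pred = model.predict(X_new)  # Get one prediction per instance
print(y_pred)
```

Note that X_new must have the same number of columns as X, and .predict() returns one value per row.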

.score()

The .score() method is used to measure a trained model's performance. Usually, it is calculated on the test set (the following chapters will explain what it is). Here is the syntax:

model.fit(X, y) # Train the model
model.score(X_test, y_test) # Evaluate it on the test set

The .score() method requires the actual target values (y_test in the example). It computes predictions for the X_test instances and compares them with the true targets (y_test) using a metric. By default, this metric is accuracy for classification models and R² for regression models.
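The sketch below shows that, for a classifier, .score() gives the same number as predicting first and then computing accuracy by hand. The dataset, the model, and the use of train_test_split to obtain a test set are assumptions made for illustration; splitting is covered later in the course.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to evaluate the model on unseen instances
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = KNeighborsClassifier()
model.fit(X_train, y_train)  # Training the model

# .score() predicts for X_test and compares with y_test internally
score = model.score(X_test, y_test)

# The same computation done in two explicit steps
manual_accuracy = accuracy_score(y_test, model.predict(X_test))

print(score, manual_accuracy)
```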

