Introduction to Machine Learning with Python

Models

The fundamentals of data preprocessing and pipeline construction are now covered. The next step is modeling.

In scikit-learn, a model is an estimator that provides .predict() and .score() methods in addition to .fit(), which every estimator has.
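As a rough illustration of that distinction, the sketch below checks which of the three methods two common estimators expose. StandardScaler and LogisticRegression are chosen here only as examples; they are not named in this chapter.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Example estimators (assumed for illustration): a transformer and a model
scaler = StandardScaler()
model = LogisticRegression()

methods = ("fit", "predict", "score")

# Record which methods each estimator exposes
scaler_has = {name: hasattr(scaler, name) for name in methods}
model_has = {name: hasattr(model, name) for name in methods}

print("StandardScaler:", scaler_has)
print("LogisticRegression:", model_has)
```

Both objects can be fitted, but only the model can predict and be scored.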

.fit()

Once the data is preprocessed, the first step in building a model is training it. This is done by calling .fit(X, y).

Note

For supervised learning (regression, classification), .fit() requires both X and y. For unsupervised learning (e.g., clustering), you call .fit(X) only. Passing y does not cause an error; it is simply ignored.
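A minimal sketch of this behavior, assuming a tiny made-up dataset and using LogisticRegression and KMeans as stand-ins for a supervised and an unsupervised estimator (neither is named in this chapter):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Made-up data for illustration: 6 samples, 2 features, binary labels
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
              [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: both X and y are needed
clf = LogisticRegression()
clf.fit(X, y)

# Unsupervised: X alone is enough
km_without_y = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Passing y is accepted, but the clustering does not use it
km_with_y = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X, y)

# With the same random_state, both runs produce the same cluster labels
same_clusters = np.array_equal(km_without_y.labels_, km_with_y.labels_)
print(same_clusters)
```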

During training, the model learns patterns needed for prediction. What it learns and how long training takes depend on the algorithm. Training is often the slowest part of ML, especially with large datasets.

.predict()

After training, use .predict() to generate predictions:

model.fit(X, y)
y_pred = model.predict(X_new)
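The two lines above can be made concrete as follows. This is only a sketch: the iris dataset, KNeighborsClassifier, and the two hand-written samples in X_new are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Example data (assumed): 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# Hypothetical new samples with the same 4 features as the training data
X_new = np.array([[5.1, 3.5, 1.4, 0.2],
                  [6.7, 3.0, 5.2, 2.3]])

# One predicted class label per row of X_new
y_pred = model.predict(X_new)
print(y_pred)
```

X_new must have the same number of columns as the X used in .fit(); the result contains one prediction per row.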

.score()

.score() evaluates a trained model, typically on a test set:

model.fit(X, y)
model.score(X_test, y_test)

It compares the model's predictions with the true targets. By default, the metric is accuracy for classifiers and the coefficient of determination (R²) for regressors.
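To see what .score() computes, the sketch below compares it with the matching function from sklearn.metrics: accuracy_score for a classifier and r2_score for a regressor (R² is scikit-learn's default for regressors). The datasets, estimators, and split settings are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Classification: .score() should match accuracy on the same data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
clf_score = clf.score(X_test, y_test)
clf_manual = accuracy_score(y_test, clf.predict(X_test))

# Regression: .score() should match R^2 on the same data
Xr, yr = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    Xr, yr, test_size=0.25, random_state=0
)
reg = LinearRegression().fit(Xr_train, yr_train)
reg_score = reg.score(Xr_test, yr_test)
reg_manual = r2_score(yr_test, reg.predict(Xr_test))

print(np.isclose(clf_score, clf_manual), np.isclose(reg_score, reg_manual))
```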

Note

X_test refers to the subset of the dataset, known as the test set, used to evaluate a model's performance after training. It contains the features (input data). y_test is the corresponding subset of true labels for X_test. Together, they assess how well the model predicts new, unseen data.
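One common way to obtain such a test set is train_test_split from sklearn.model_selection. This chapter does not say how the split is made, so the function choice, the 80/20 ratio, and the iris data below are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Example data (assumed): 150 samples, 4 features
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation; stratify keeps class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# X_test and y_test stay aligned row by row
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

The model is then fitted on X_train and y_train only, so X_test and y_test remain unseen until evaluation.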

Section 4. Chapter 1
