Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Learn Training Set | Machine Learning Concepts
ML Introduction with scikit-learn

bookTraining Set

In supervised or unsupervised learning, the training set is usually presented in a tabular format.

An example is the diabetes dataset, which is used to predict whether a person has diabetes. It contains records of 768 females with parameters such as age, body mass index, and blood pressure. These parameters are referred to as features.

The dataset also includes an 'Outcome' column indicating whether the person has diabetes. This is the target variable.

Each row in the table is an instance (also called a data point or sample), representing information about a single individual.

The table (training set) has a target column in it, which means it is labeled.

The task is to train the ML model on this training set, and once it is trained, it can predict for other people (new instances) whether they have diabetes based on features only.

Note
Note

This training set is an example of a biased dataset as it exclusively contains information about females who are at least 21 years old. Therefore, the model may produce less accurate predictions for males or for females under 21, since it hasn't been trained on these groups.

While coding, feature columns are usually assigned to X and target columns assigned as y.

And features of new instances are assigned as X_new.

question-icon

Match the variable names with the data they usually hold.

X –
y –

X_new –

Click or drag`n`drop items and fill in the blanks

Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 1. ChapterΒ 3

Ask AI

expand

Ask AI

ChatGPT

Ask anything or try one of the suggested questions to begin our chat

Suggested prompts:

What is the difference between features and the target variable?

Can you explain what X, y, and X_new represent in machine learning?

How does the model use X_new to make predictions?

Awesome!

Completion rate improved to 3.13

bookTraining Set

Swipe to show menu

In supervised or unsupervised learning, the training set is usually presented in a tabular format.

An example is the diabetes dataset, which is used to predict whether a person has diabetes. It contains records of 768 females with parameters such as age, body mass index, and blood pressure. These parameters are referred to as features.

The dataset also includes an 'Outcome' column indicating whether the person has diabetes. This is the target variable.

Each row in the table is an instance (also called a data point or sample), representing information about a single individual.

The table (training set) has a target column in it, which means it is labeled.

The task is to train the ML model on this training set, and once it is trained, it can predict for other people (new instances) whether they have diabetes based on features only.

Note
Note

This training set is an example of a biased dataset as it exclusively contains information about females who are at least 21 years old. Therefore, the model may produce less accurate predictions for males or for females under 21, since it hasn't been trained on these groups.

While coding, feature columns are usually assigned to X and target columns assigned as y.

And features of new instances are assigned as X_new.

question-icon

Match the variable names with the data they usually hold.

X –
y –

X_new –

Click or drag`n`drop items and fill in the blanks

Everything was clear?

How can we improve it?

Thanks for your feedback!

SectionΒ 1. ChapterΒ 3
some-alt