Learn Machine Learning Workflow | Machine Learning Concepts
Introduction to Machine Learning with Python

Machine Learning Workflow

Let's look at the workflow you would go through to build a successful machine learning project.

Step 1. Get the Data

Define the problem, choose a performance metric, and decide what qualifies as a good result. Then gather the required data from available sources and bring it into a format ready for Python. If the data already exists in a CSV file, preprocessing can begin immediately.

Example

A hospital compiles patient records and demographics into a CSV file. The goal is to predict readmissions, aiming for over 80% accuracy.
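As a rough illustration of this step, the sketch below loads a small CSV into pandas and separates the features from the target. The column names, the values, and the `TARGET_ACCURACY` constant are hypothetical stand-ins; a real project would pass the path of the hospital's file to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the hospital's CSV file; a real project
# would pass a file path to pd.read_csv instead of an in-memory string.
csv_text = """age,blood_pressure,race,readmitted
64,130,A,1
51,,B,0
77,145,A,1
38,118,C,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Separate the input features from the target we want to predict.
X = df.drop(columns="readmitted")
y = df["readmitted"]

# The success criterion from the example: accuracy above 80%.
TARGET_ACCURACY = 0.80

print(df.head())
print(df.isna().sum())  # number of missing values in each column
```

Writing the success criterion down at this point makes it easier to judge later whether the model is good enough.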

Step 2. Preprocess the Data

This step includes:

  • Data cleaning: handling missing values and non-numerical inputs;
  • Exploratory data analysis (EDA): analyzing and visualizing data to understand relationships and detect issues;
  • Feature engineering: selecting or creating features that improve model performance.
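A minimal sketch of what EDA and feature engineering might look like with pandas is shown below. The data, the column names, and the derived `is_senior` feature are all made up for illustration; they are not part of the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical patient data used only for illustration.
df = pd.DataFrame({
    "age": [64, 51, 77, 38, 70],
    "blood_pressure": [130.0, np.nan, 145.0, 118.0, 150.0],
    "prior_visits": [3, 0, 4, 1, 5],
    "readmitted": [1, 0, 1, 0, 1],
})

# EDA: count missing values, summarize each column, and check how
# strongly each feature correlates with the target.
missing_per_column = df.isna().sum()
summary = df.describe()
correlations = df.corr()["readmitted"].drop("readmitted")

# Feature engineering: derive a new feature from an existing one.
# The age cutoff of 65 is an arbitrary choice for this sketch.
df["is_senior"] = (df["age"] >= 65).astype(int)

print(missing_per_column)
print(correlations)
```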

Example

Missing values (e.g., blood pressure) are filled, and categorical features (e.g., race) are converted into numerical form.
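One possible way to do this with scikit-learn is sketched below, using `SimpleImputer` for the missing values and `OneHotEncoder` for the categorical column. The data is hypothetical, and filling with the mean is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical patient data with one missing blood pressure value.
X = pd.DataFrame({
    "blood_pressure": [130.0, np.nan, 145.0, 118.0],
    "race": ["A", "B", "A", "C"],
})

preprocess = ColumnTransformer(
    [
        # Fill missing numeric values with the column mean.
        ("num", SimpleImputer(strategy="mean"), ["blood_pressure"]),
        # Turn each category into its own 0/1 column.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["race"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_ready = preprocess.fit_transform(X)
print(X_ready)
```

The result has one column for blood pressure and one column per category, so every value is numeric and none is missing.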

Step 3. Modeling

This stage includes:

  • Choosing a model based on problem type and experiments;
  • Hyperparameter tuning to improve performance;
  • Model evaluation on unseen data.
Note

Hyperparameters are like adjustable controls that define how the model trains, such as training duration or model complexity.

Example

A classification model is selected for predicting readmission (yes/no). Its hyperparameters are tuned using validation data, and the final model is then evaluated on a held-out test set to assess generalization.
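The modeling stage could be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`), the model is `KNeighborsClassifier` because the course uses it later, and the grid of `n_neighbors` values is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data standing in for readmission records (yes/no).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out a test set that is never used during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# n_neighbors is a hyperparameter: try several values and compare them
# with 5-fold cross-validation on the training data only.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the best model once on the unseen test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the test set out of the tuning loop means the reported accuracy reflects data the model has not seen in any form.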

Step 4. Deployment

Once a model performs well, it is deployed to real systems. The model must be monitored, updated with new data, and improved over time, often restarting the cycle from Step 1.

Example

The model is integrated into the hospital system to flag high-risk patients at admission, helping staff act early.
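Deployment details vary widely between systems, so the following is only a small sketch of one common piece: saving a trained model with `joblib` and loading it again to score new patients. The file name, the `flag_high_risk` helper, and the 0.5 threshold are hypothetical, and the data is synthetic.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Train a small model on synthetic data (stand-in for the real one).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Save the model to disk and load it back, as a serving system might.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "readmission_model.joblib")  # hypothetical name
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)


def flag_high_risk(model, patient_features, threshold=0.5):
    """Return True for rows whose predicted readmission probability meets the threshold."""
    probabilities = model.predict_proba(patient_features)[:, 1]
    return probabilities >= threshold


# Score a few "new" patients at admission time.
flags = flag_high_risk(loaded_model, X[:5])
print(flags)
```

Monitoring and retraining are not shown here; in practice they would feed new data back into Step 1.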

Note

Some of the terms mentioned here may sound unfamiliar, but we'll discuss them in more detail later in this course.

Data preprocessing and modeling can be done with scikit-learn. The next chapters introduce preprocessing workflows and pipelines, followed by modeling using k-nearest neighbors (KNeighborsClassifier), including training, tuning, and evaluation.
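As a brief preview, preprocessing and a k-nearest neighbors model can be chained in a scikit-learn `Pipeline`. The sketch below assumes synthetic data and uses `StandardScaler` as an example preprocessing step; the later chapters may structure things differently.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# The pipeline applies scaling first, then fits the classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print(round(accuracy, 3))
```

Because the scaler is fitted inside the pipeline, it only ever learns from the training data, which avoids leaking information from the test set.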

1. What is the primary purpose of the "Get the data" step in a machine learning project?

2. Which of the following best describes the importance of the "Data preprocessing" step in a machine learning project workflow?


Section 1. Chapter 5

