Machine Learning Workflow
Let's look at the workflow you would go through to build a successful machine learning project.
Step 1. Get the Data
Define the problem, choose a performance metric, and decide what qualifies as a good result. Then gather the required data from available sources and bring it into a format ready for Python. If the data already exists in a CSV file, preprocessing can begin immediately.
Example
A hospital compiles patient records and demographics into a CSV file. The goal is to predict readmissions, aiming for over 80% accuracy.
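Getting such a CSV file into Python might look like the following sketch. The file contents, column names, and values here are hypothetical, standing in for the hospital's real records:

```python
import csv
import io

# Hypothetical patient records, standing in for a file such as "patients.csv";
# an empty field represents a missing blood pressure reading
raw = io.StringIO(
    "age,blood_pressure,readmitted\n"
    "64,140,1\n"
    "52,,0\n"
    "71,150,1\n"
)

# csv.DictReader yields one dictionary per patient row
rows = list(csv.DictReader(raw))
print(len(rows))        # 3 records
print(rows[0]["age"])   # "64" (CSV values arrive as strings)
```

With a real file you would pass `open("patients.csv")` instead of the `io.StringIO` stand-in; in practice this loading step is often done with `pandas.read_csv` instead.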
Step 2. Preprocess the Data
This step includes:
- Data cleaning: handling missing values and non-numerical inputs;
- EDA: analyzing and visualizing data to understand relationships and detect issues;
- Feature engineering: selecting or creating features that improve model performance.
Example
Missing values (e.g., blood pressure) are filled, and categorical features (e.g., race) are converted into numerical form.
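Both of these preprocessing operations can be sketched with scikit-learn. The values below are made up for illustration; `np.nan` marks the missing blood pressure reading:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical blood pressure readings with one missing value
bp = np.array([[140.0], [np.nan], [150.0]])

# Fill the missing value with the column mean
bp_filled = SimpleImputer(strategy="mean").fit_transform(bp)
print(bp_filled.ravel())  # the nan becomes 145.0, the mean of 140 and 150

# Hypothetical categorical feature converted into numerical (one-hot) columns
race = np.array([["A"], ["B"], ["A"]])
race_encoded = OneHotEncoder().fit_transform(race).toarray()
print(race_encoded)       # one column per category, 1.0 marking membership
```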
Step 3. Modeling
This stage includes:
- Choosing a model based on problem type and experiments;
- Hyperparameter tuning to improve performance;
- Model evaluation on unseen data.
Hyperparameters are adjustable controls that define how the model trains, such as training duration or model complexity.
Example
A classification model is selected for predicting readmission (yes/no). After tuning, it is evaluated on a validation/test set to assess generalization.
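A minimal sketch of this stage, using the `KNeighborsClassifier` mentioned later in this lesson and synthetic data standing in for real patient features and readmission labels:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data in place of real patient records
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Tune the n_neighbors hyperparameter with cross-validation on the training set
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the tuned model on unseen (test) data to assess generalization
print(search.best_params_)
print(search.score(X_test, y_test))  # accuracy on the held-out set
```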
Step 4. Deployment
Once a model performs well, it is deployed to real systems. The model must be monitored, updated with new data, and improved over time, often restarting the cycle from Step 1.
Example
The model is integrated into the hospital system to flag high-risk patients at admission, helping staff act early.
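One common deployment pattern is to serialize the trained model so another system can load it later and score new patients. A minimal sketch, again on synthetic stand-in data, using the standard library's `pickle` (`joblib` is another common choice for scikit-learn models):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Train on synthetic data standing in for historical patient records
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = KNeighborsClassifier().fit(X, y)

# Serialize the trained model; in a real system this blob would be
# written to disk or a model registry
blob = pickle.dumps(model)

# At admission time: deserialize the model and flag the incoming patient
deployed = pickle.loads(blob)
flag = deployed.predict(X[:1])  # binary readmission-risk label
print(int(flag[0]))
```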
Some of the terms mentioned here may sound unfamiliar, but we'll discuss them in more detail later in this course.
Data preprocessing and modeling can be done with scikit-learn. The next chapters introduce preprocessing workflows and pipelines, followed by modeling using k-nearest neighbors (KNeighborsClassifier), including training, tuning, and evaluation.
1. What is the primary purpose of the "Get the data" step in a machine learning project?
2. Which of the following best describes the importance of the "Data preprocessing" step in a machine learning project workflow?