ML Introduction with scikit-learn

Machine Learning Workflow

Let's look at the workflow you would go through to build a successful Machine Learning project.

Step 1. Get the data

For this step, you need to define the problem and what data is required. Then, choose a metric and define what result would be satisfactory.
Next, you need to gather this data together, usually from several sources (databases) in a format suitable for further processing in Python.
Sometimes the data is already in a .csv format and ready to be preprocessed, and this step can be skipped.
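As a rough illustration of gathering data from several sources, here is a minimal sketch using pandas. The two in-memory CSV strings, the column names, and the values are hypothetical stand-ins for real sources such as files or database exports.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two separate sources (assumed for illustration).
customers_csv = io.StringIO("customer_id,age\n1,34\n2,51\n3,27\n")
orders_csv = io.StringIO("customer_id,total\n1,20.5\n2,13.0\n3,42.0\n")

# Load each source into a DataFrame.
customers = pd.read_csv(customers_csv)
orders = pd.read_csv(orders_csv)

# Combine the sources into one table using the shared key column.
data = customers.merge(orders, on="customer_id", how="inner")
print(data.shape)
```

With a real .csv file, passing the file path to `pd.read_csv` would replace the in-memory strings.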

Step 2. Preprocess the data

This step consists of:

  • Data cleaning - dealing with missing values, non-numerical data, etc.
  • Exploratory data analysis (EDA) - analyzing and visualizing the dataset to find patterns and relationships between features and, in general, to get insights on how the training set can be improved.
  • Feature Engineering - selecting, transforming, or creating new features based on EDA insights to improve the model's performance.
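The cleaning part of this step might look like the sketch below. The tiny DataFrame, its column names, and the choice of mean imputation and one-hot encoding are assumptions made for illustration; other strategies may suit a real dataset better.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy dataset with a missing value and a non-numerical column.
df = pd.DataFrame({
    "age": [34.0, np.nan, 27.0, 45.0],
    "city": ["Paris", "Rome", "Paris", "Oslo"],
})

# A quick look at the data: count missing values per column.
print(df.isna().sum())

# Data cleaning: replace the missing age with the mean of the known ages.
imputer = SimpleImputer(strategy="mean")
age_filled = imputer.fit_transform(df[["age"]])

# Non-numerical data: turn each city into its own 0/1 column.
encoder = OneHotEncoder()
city_encoded = encoder.fit_transform(df[["city"]]).toarray()

# Stack the processed columns into one numerical feature matrix.
X = np.hstack([age_filled, city_encoded])
print(X.shape)
```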

Step 3. Modeling

This step involves:

  • Choosing the model - at this stage, you choose one or a few models that perform best on your problem. This combines an understanding of the algorithms with experiments to find the models suitable for your problem.
  • Hyperparameter Tuning - the process of finding the hyperparameters that result in the best performance.
  • Evaluating the model - measuring the model's performance on unseen data.
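The three parts of the modeling step can be sketched as follows. The Iris dataset, the `KNeighborsClassifier` model, the candidate values for `n_neighbors`, and the 70/30 split are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example dataset (assumed for illustration).
X, y = load_iris(return_X_y=True)

# Hold out a test set so the final evaluation uses unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Hyperparameter tuning: try several values of n_neighbors with
# 5-fold cross-validation on the training set only.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluation: accuracy of the best model on the held-out test set.
best_k = search.best_params_["n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(best_k, test_accuracy)
```

Keeping the test set out of the tuning loop is what makes the final score a measurement on unseen data.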

Step 4. Deployment

Once you have a fine-tuned model that shows good performance, you can deploy it. But that's not where your job ends. Most of the time, you also want to monitor the deployed model's performance, find ways to improve it, and feed it new data as it is collected. This sends you back to Step 1.
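Deployment setups vary widely, but a common first piece is saving the trained model so another program can load it. The sketch below assumes joblib for persistence, a temporary directory, and the file name `model.joblib`; none of these come from the course itself.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Train a small example model (dataset and settings assumed for illustration).
X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Save the model to disk, then load it back as a serving process might.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name
    joblib.dump(model, model_path)
    loaded_model = joblib.load(model_path)

# The reloaded model should predict exactly like the original.
same_predictions = bool((loaded_model.predict(X) == model.predict(X)).all())
print(same_predictions)
```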


Don't worry if some terms sound unfamiliar to you. Many of them will be described in this course and some in other courses.
This is an introductory course that covers many new topics. If you have no experience with Machine Learning, it is okay to struggle with some of them; the most important ones will be repeated in this course or in other courses, and you'll catch up!

The Data Preprocessing and Modeling steps can be completed using the scikit-learn library (imported as sklearn). That is what the rest of the course is about.
We will learn some basic preprocessing steps and how to build pipelines. After that, we will discuss the modeling stage using the KNeighborsClassifier as an example model. This includes building a model, tuning hyperparameters, and evaluating the model.
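As a preview, a pipeline that chains a preprocessing step with scikit-learn's k-nearest neighbors classifier, `KNeighborsClassifier`, might look like this minimal sketch. The Wine dataset, the `StandardScaler` step, and `n_neighbors=5` are assumptions for illustration, not the course's exact setup.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset (assumed for illustration).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The pipeline scales the features, then passes them to the classifier.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# fit() learns the scaling from the training data and trains the model.
pipeline.fit(X_train, y_train)

# score() applies the same scaling to the test data before predicting.
accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```

Bundling the steps this way means the scaler is fitted only on the training data, which avoids leaking information from the test set.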
