Python for Data Science: Job Change

Train and Test Split

The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model.

The procedure is fast and easy to perform, and its results let you compare the performance of different machine learning algorithms on your predictive modeling problem.

Methods description

  • sklearn: This module provides simple and efficient tools for data mining and data analysis. It includes various algorithms and utilities for machine learning tasks;
  • model_selection: This submodule within sklearn provides tools for model selection and evaluation, including methods for splitting data into training and testing sets;
  • .train_test_split(): This function splits arrays or matrices into random train and test subsets. It takes in arrays X and y representing features and target variables, respectively. The test_size parameter determines the proportion of the dataset to include in the test split. The random_state parameter sets the seed used for random sampling to ensure reproducibility. It returns four arrays: X_train, X_test, y_train, and y_test, representing the training and testing sets for features and target variables, respectively.
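As a minimal sketch of the call described above, the example below splits a small synthetic dataset. The arrays, their sizes, and the seed value 42 are assumptions made for illustration and are not part of the course data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 3 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Hold out roughly one third of the rows for testing.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Calling the function again with the same seed yields the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.33, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test_again))
```

Changing random_state (or leaving it unset) produces a different shuffle, so scores measured on the test set can vary from run to run.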

Task

  1. Import train_test_split from sklearn.model_selection;
  2. Define X as all the features (exclude "target");
  3. Define y as the "target" variable;
  4. Split the data into a training set (67%) and a test set (33%) by setting test_size to 0.33.
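The steps above could be sketched as follows. This is one possible approach rather than the official solution: the DataFrame df, its feature columns, and the random_state value are hypothetical stand-ins, and only the "target" column name comes from the task.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the course dataset; only "target" is a real name.
df = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.3, 0.9, 0.5, 0.7, 0.2, 0.8, 0.6, 0.35, 0.45, 0.55],
    "feature_b": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    "target":    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# X holds every column except the target; y holds the target only.
X = df.drop(columns="target")
y = df["target"]

# 67% of the rows go to training and 33% to testing.
# random_state=42 is an arbitrary seed chosen here for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

print(len(X_train), len(X_test))
```

Because X and y are split in the same call, the rows of X_train and y_train stay aligned, as do those of X_test and y_test.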
