Train and Test Split

The train-test split procedure is used to estimate the performance of machine learning algorithms when they are used to make predictions on data not used to train the model.

It is a fast and easy procedure to perform, the results of which allow you to compare the performance of machine learning algorithms for your predictive modeling problem.

Methods description

sklearn: The scikit-learn package provides simple and efficient tools for data mining and data analysis. It includes various algorithms and utilities for machine learning tasks;
model_selection: This submodule of sklearn provides tools for model selection and evaluation, including methods for splitting data into training and testing sets;
.train_test_split(): This function splits arrays or matrices into random train and test subsets. It takes arrays X and y, representing the features and the target variable, respectively. The test_size parameter sets the proportion of the dataset to include in the test split (for example, 0.33 for 33%). The random_state parameter sets the seed used for the random shuffling, so the same split is produced on every run. When called with X and y, it returns four objects in this order: X_train, X_test, y_train, and y_test.
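
The call described above can be sketched on a small toy dataset. The array values, the 25% test size, and the seed 42 are arbitrary choices for illustration and are not part of the lesson's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (assumption, for illustration only): 12 samples with 2 features each.
# Row i of X is [2*i, 2*i + 1] and its target is i, so alignment is easy to check.
X = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Hold out 25% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)  # 9 training rows and 3 test rows
```

Features and targets are shuffled together, so each row of X_train still corresponds to the entry of y_train at the same position.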

Task

Import train_test_split from sklearn.model_selection.
Define X as all the features (every column except "target").
Define y as the "target" variable.
Split the data into a training set (67%) and a test set (33%).
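
One possible way to carry out these steps is sketched below. The lesson's actual dataset is not shown here, so the DataFrame df, its feature columns, and the seed 42 are hypothetical stand-ins; only the "target" column name and the 67/33 split come from the task.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the lesson's dataset (column names are assumptions,
# except "target", which the task names explicitly).
df = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [v * 0.5 for v in range(100)],
    "target": [v % 2 for v in range(100)],
})

# X: every column except the target; y: the target column only.
X = df.drop(columns=["target"])
y = df["target"]

# test_size=0.33 keeps 33% of the rows for testing and the remaining 67% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

print(len(X_train), len(X_test))
```

Passing only test_size is enough: the training share defaults to the complement of the test share.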
