Sequential Backward and Forward Selection
Introduction
Sequential Backward Selection (SBS) and Sequential Forward Selection (SFS) are feature selection techniques used in machine learning to enhance model performance: SBS optimizes the feature set by progressively removing features, while SFS does so by progressively adding them.
These methods are especially crucial in scenarios where reducing the dimensionality of the feature space can lead to more accurate and efficient models.
What is Sequential Backward Selection (SBS)?
The Sequential Backward Selection (SBS) algorithm is an iterative approach commonly used to optimize the feature set in machine learning.
SBS is a greedy algorithm that starts with the full set of features and sequentially removes the least significant features, one at a time, to improve the model's performance.
Here's a detailed breakdown of how it works and why it's effective in some contexts.
Step-by-Step Process of the SBS Algorithm
- Start with All Features: Sequential Backward Selection (SBS) begins with the full set of features in the dataset, ensuring that every candidate feature is considered initially.
- Define an Evaluation Metric: choose a metric to measure the quality of a subset (e.g., accuracy for a classification problem, or mean squared error for regression). This metric determines whether performance improves or degrades as features are removed.
- Iterative Elimination: for each iteration:
  - Remove One Feature: each feature is removed one at a time, creating multiple candidate subsets, each with one feature fewer;
  - Evaluate Subsets: calculate the evaluation metric for each candidate subset to determine which removal has the least impact on performance;
  - Select the Optimal Subset: keep the subset with the highest performance (or the least degradation in performance).
- Repeat Until the Desired Number of Features Is Reached: SBS continues iterating until the number of remaining features reaches the specified minimum subset size (a minimal code sketch of this loop follows below).
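To make the loop above concrete, here is a minimal from-scratch sketch of SBS. Everything in it is an illustrative assumption rather than something prescribed by this article: scikit-learn provides the data and model, a k-nearest-neighbors classifier is the estimator, and cross-validated accuracy is the evaluation metric.

```python
# Minimal SBS sketch (illustrative assumptions: scikit-learn, KNN, accuracy via 5-fold CV).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

def sequential_backward_selection(X, y, model, k_min=2):
    """Greedily drop the least useful feature until only k_min features remain."""
    remaining = list(range(X.shape[1]))  # start with all feature indices
    while len(remaining) > k_min:
        # score every subset obtained by removing exactly one feature
        scores = [
            (cross_val_score(model, X[:, [c for c in remaining if c != f]], y, cv=5).mean(), f)
            for f in remaining
        ]
        best_score, dropped = max(scores)  # the removal that hurts performance least
        remaining.remove(dropped)
        print(f"dropped feature {dropped}, CV accuracy = {best_score:.3f}")
    return remaining

print("selected feature indices:", sequential_backward_selection(X, y, model))
```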
Example Walkthrough
Suppose SBS is used to optimize features in a dataset with an initial set of features, F = {f1, f2, f3, f4}. Let's assume that the goal is to reach 2 features.
- Initial Step: start with the full set, {f1, f2, f3, f4}.
- First Iteration: evaluate the subsets obtained by removing one feature at a time: {f2, f3, f4}, {f1, f3, f4}, {f1, f2, f4}, and {f1, f2, f3}. Suppose {f2, f3, f4} gives the best performance metric; select this subset.
- Second Iteration: repeat by removing one more feature from {f2, f3, f4}, producing {f3, f4}, {f2, f4}, and {f2, f3}. Suppose {f2, f3} gives the best result, so select {f2, f3} as the final subset.
Implementation in Python
Python provides libraries such as `mlxtend` that can be used to implement these techniques easily:
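The snippet below is a minimal sketch of backward selection with `mlxtend`'s `SequentialFeatureSelector`; the dataset, estimator, and parameter values are illustrative assumptions, and `mlxtend` itself (e.g., `pip install mlxtend`) along with scikit-learn is assumed to be installed.

```python
# SBS with mlxtend: forward=False makes the selector remove features step by step.
# (Dataset, estimator, and k_features=3 are illustrative choices.)
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

sbs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    k_features=3,        # stop when 3 features remain
    forward=False,       # backward selection
    floating=False,
    scoring="accuracy",
    cv=5,
)
sbs = sbs.fit(X, y)

print("Selected feature indices:", sbs.k_feature_idx_)
print("Cross-validated accuracy:", round(sbs.k_score_, 3))
```

The fitted selector also exposes `subsets_`, a dictionary describing the subset kept at every step, and `transform(X)` to reduce the data to the selected columns.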
What is Sequential Forward Selection (SFS)?
The Sequential Forward Selection (SFS) algorithm shares the same goal as Sequential Backward Selection, but it reaches it by following the opposite approach.
SFS begins with an empty feature set and gradually adds features one by one, optimizing each addition to improve the target metric.
Here's a detailed look at its structure and application.
Step-by-Step Process of the SFS Algorithm
- Start with an Empty Feature Set: Sequential Forward Selection initializes with no features selected, in contrast to the SBS approach. This empty set is the baseline from which features are added iteratively.
- Define the Evaluation Metric: as with other feature selection algorithms, an evaluation metric is required to assess the model's performance with each subset. Common metrics include accuracy, F1 score, or mean squared error, depending on the task (classification, regression, etc.).
- Iterative Addition: for each iteration:
  - Add One Feature at a Time: each available feature is added to the current subset individually, generating multiple new subsets, each with one additional feature;
  - Evaluate Subsets: each new subset is evaluated for performance using the chosen metric;
  - Select the Optimal Subset: choose the subset with the highest performance improvement and retain this combination for the next iteration.
- Continue Until the Stopping Criterion Is Met: the process repeats until the desired number of features is reached or the evaluation metric does not improve significantly with the addition of new features (a minimal code sketch of this variant follows below).
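As a complement to the SBS sketch earlier, here is a minimal from-scratch sketch of the SFS loop with an improvement-threshold stopping rule. The dataset, estimator, metric, and the `min_gain` threshold are all illustrative assumptions, not prescribed by the article.

```python
# Minimal SFS sketch with a stopping rule based on marginal improvement.
# (Illustrative assumptions: scikit-learn, KNN, accuracy via 5-fold CV, min_gain threshold.)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

def sequential_forward_selection(X, y, model, max_features=5, min_gain=1e-3):
    """Greedily add the most useful feature until max_features or no real gain."""
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        # score every subset obtained by adding exactly one candidate feature
        results = [
            (cross_val_score(model, X[:, selected + [f]], y, cv=5).mean(), f)
            for f in candidates
        ]
        score, best_feature = max(results)
        if score - best_score < min_gain:  # stop if the metric barely improves
            break
        selected.append(best_feature)
        best_score = score
    return selected, best_score

features, score = sequential_forward_selection(X, y, model)
print("selected feature indices:", features, "CV accuracy:", round(score, 3))
```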
Example Walkthrough
Let's consider a dataset with features F = {f1, f2, f3, f4}. If the objective is to select 3 features, here's how SFS works:
- Initial Step: start with an empty set {}, and evaluate performance with no features (often used as a baseline).
- First Iteration: evaluate adding one feature at a time: {f1}, {f2}, {f3}, and {f4}. Suppose {f2} produces the highest performance; select {f2} as the optimal subset for this iteration.
- Second Iteration: add each remaining feature to {f2}, resulting in {f2, f1}, {f2, f3}, and {f2, f4}. Suppose {f2, f3} yields the highest improvement; choose {f2, f3} as the subset for the next iteration.
- Third Iteration: add each remaining feature to {f2, f3}, resulting in {f2, f3, f1} and {f2, f3, f4}. Suppose {f2, f3, f1} gives the best performance, so select it as the final subset.
Implementation in Python
Python provides libraries such as `mlxtend` that can be used to implement these techniques easily:
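The following is a minimal sketch of forward selection with `mlxtend`'s `SequentialFeatureSelector` (the default `forward=True`); as before, the dataset, estimator, and parameter values are illustrative assumptions, and `mlxtend` plus scikit-learn are assumed to be installed.

```python
# SFS with mlxtend: forward=True (the default) adds features step by step.
# (Dataset, estimator, and k_features=3 are illustrative choices.)
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    k_features=3,        # stop once 3 features have been added
    forward=True,        # forward selection
    floating=False,
    scoring="accuracy",
    cv=5,
)
sfs = sfs.fit(X, y)

print("Selected feature indices:", sfs.k_feature_idx_)
print("Cross-validated accuracy:", round(sfs.k_score_, 3))

# The fitted selector can then reduce the data to the chosen columns:
X_selected = sfs.transform(X)
```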
Common Aspects in SBS and SFS
Greedy Nature
Both SBS and SFS are greedy algorithms. They make the locally optimal choice at each iteration with the hope of finding the global optimum. However, they do not revisit past decisions; once a feature is removed (in SBS) or added (in SFS), the decision is final.
Model Dependency
The effectiveness of both SBS and SFS is highly dependent on the underlying model used for evaluation. Different models might yield different subsets of features as optimal.
Computational Efficiency
While more efficient than exhaustive search, both SBS and SFS can be computationally expensive, especially with a large number of features.
Stopping Criteria
The stopping criteria for both algorithms can be based on a predefined number of features or a performance threshold. This needs to be defined based on the specific requirements of the problem at hand.
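If `mlxtend` is used, this choice maps directly onto the `k_features` argument, which accepts a fixed number, a `(min, max)` range, or `'best'` to keep the subset size with the highest cross-validated score; the estimator and values below are illustrative.

```python
# Illustrative stopping criteria expressed through mlxtend's k_features argument.
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

selector = SequentialFeatureSelector(
    DecisionTreeClassifier(random_state=0),
    k_features=(2, 8),   # consider subset sizes from 2 to 8 and keep the best-scoring one
    forward=True,
    scoring="accuracy",
    cv=5,
)
# selector.fit(X, y) would then report the best subset found within that range.
```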
Comparison of SBS and SFS
Here's a table comparing Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) in feature selection.

| Aspect | Sequential Forward Selection (SFS) | Sequential Backward Selection (SBS) |
| --- | --- | --- |
| Starting Point | Begins with an empty feature set | Starts with the full feature set |
| Selection Direction | Adds features iteratively | Removes features iteratively |
| Algorithm Objective | Adds features to maximize performance | Removes features to minimize loss of performance |
| Computation Time | Faster when only a few features end up selected | More computationally intensive, since it starts from the full set |
| Suitability for Feature Space Size | Ideal for small to moderate feature spaces | Effective for moderate to large feature spaces |
| Risk of Local Optima | Can miss optimal subsets due to its greedy approach | Also risks suboptimal selection by removing key features |
| Memory Usage | Lower, as fewer features are stored initially | Higher, as the full feature set is loaded initially |
| Typical Use Cases | Effective when a few key features need to be identified, as in bioinformatics and signal processing | Useful for high-dimensional datasets, such as image recognition or text processing |
Conclusion
Sequential Backward and Forward Selection are invaluable tools in feature selection, offering a structured approach to enhancing model accuracy and efficiency. By methodically adding or removing features, these techniques help in building more robust and interpretable machine learning models.
FAQs
Q: When should I use SBS over SFS?
A: SBS is preferable when you start with a large set of features and need to eliminate the irrelevant ones. SFS is ideal when building a feature set from scratch.
Q: Can these methods handle categorical data?
A: Yes, but the categorical data should be properly encoded before applying these methods.
Q: Are SBS and SFS prone to overfitting?
A: These methods can reduce overfitting by eliminating irrelevant features. However, careful cross-validation is still necessary to avoid overfitting.
Q: How do I decide on the stopping criterion for SBS or SFS?
A: The stopping criterion depends on specific goals, like desired model complexity or performance metrics.
Q: Is feature scaling required before applying SBS or SFS?
A: Feature scaling is not mandatory but is generally recommended, especially for algorithms sensitive to feature scales.
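One hedged way to do this is to place the scaler and a scale-sensitive estimator in a scikit-learn pipeline, so scaling is refit inside every cross-validation fold during selection; the components and parameters below are illustrative choices, not requirements.

```python
# Scaling before selection via a pipeline (illustrative estimator and parameters).
from mlxtend.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
selector = SequentialFeatureSelector(pipe, k_features=3, forward=True,
                                     scoring="accuracy", cv=5)
# selector.fit(X, y) evaluates every candidate subset on standardized features.
```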
Q: How do these methods compare with other feature selection techniques?
A: Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) offer a model-driven approach that captures feature interactions, unlike filter methods which assess features individually. However, SFS and SBS are slower than embedded methods like Lasso and may struggle with high-dimensional data due to their iterative and computationally intensive nature.