Sequential Backward and Forward Selection
Data Science | Machine Learning


by Andrii Chornyi

Data Scientist, ML Engineer

Jan, 2024
15 min read


Introduction

Sequential Backward Selection (SBS) and Sequential Forward Selection (SFS) are feature selection techniques used in machine learning to enhance model performance. They optimize the feature set by either progressively removing or adding features, respectively.

These methods are especially crucial in scenarios where reducing the dimensionality of the feature space can lead to more accurate and efficient models.

What is Sequential Backward Selection (SBS)?

The Sequential Backward Selection (SBS) algorithm is an iterative approach to feature selection, commonly used to pare a model's feature set down to the subset that serves it best.

SBS is a greedy algorithm that starts with the full set of features and sequentially removes the least significant feature, one at a time, to improve the model's performance.

Here's a detailed breakdown of how it works and why it's effective in some contexts.

Step-by-Step Process of SBS Algorithm

  1. Start with All Features
    Sequential Backward Selection (SBS) begins with the full set of features in the dataset, so every candidate feature is considered initially.
  2. Define Evaluation Metric
    Define an evaluation metric to measure the quality of a subset (e.g., accuracy for a classification problem, or mean squared error for regression). This metric will determine whether a subset is improving or declining in performance as features are removed.
  3. Iterative Elimination
    For each iteration:
    • Remove One Feature: each feature is removed one at a time, producing multiple candidate subsets, each with one fewer feature;
    • Evaluate Subsets: calculate the evaluation metric for each subset to determine which removal has the least impact on performance;
    • Select Optimal Subset: identify and keep the subset that has the highest performance (or least degradation in performance).
  4. Repeat Until Desired Number of Features is Reached
    SBS continues iterating until the number of remaining features reaches the specified minimum subset size (a minimal code sketch of this loop follows the list).
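
The loop itself is easy to write by hand. The following is a minimal sketch of the steps above; mean cross-validated accuracy as the evaluation metric, a logistic regression model, and the Iris dataset are all illustrative assumptions, not part of the algorithm:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Small example dataset with 4 features (illustrative choice)
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

def evaluate(features):
    # Step 2: evaluation metric - mean cross-validated accuracy on the subset
    return cross_val_score(model, X[:, features], y, cv=5).mean()

remaining = list(range(X.shape[1]))  # Step 1: start with all features
min_features = 2                     # Step 4: stopping criterion

while len(remaining) > min_features:
    # Step 3: remove each feature in turn, evaluate every candidate subset,
    # and keep the subset with the best score
    candidates = [[f for f in remaining if f != drop] for drop in remaining]
    scores = [evaluate(subset) for subset in candidates]
    remaining = candidates[int(np.argmax(scores))]

print('Selected feature indices:', remaining)
```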

Example Walkthrough

Suppose SBS is used to optimize features in a dataset with an initial set of features, F = {f1, f2, f3, f4}. Let's assume that the goal is to reach 2 features.

Sequential Backward Selection
  1. Initial Step
    • Start with the full set, {f1, f2, f3, f4}.
  2. First Iteration
    • Evaluate subsets by removing one feature at a time: {f2, f3, f4}, {f1, f3, f4}, {f1, f2, f4}, {f1, f2, f3};
    • Suppose {f2, f3, f4} gives the best performance metric; select this subset.
  3. Second Iteration
    • Repeat by removing one more feature from {f2, f3, f4}, producing {f3, f4}, {f2, f4}, and {f2, f3};
    • Suppose {f2, f3} gives the best result, so we select {f2, f3} as the final subset.

Implementation in Python

Python provides libraries such as mlxtend that can be used to implement these techniques easily.
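
A minimal sketch with mlxtend's SequentialFeatureSelector follows; setting forward=False turns it into backward selection. The k-nearest-neighbors estimator, the Iris dataset, and the target of 2 features are illustrative assumptions:

```python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Sequential Backward Selection: start from all 4 features and
# remove one at a time until only 2 remain (forward=False)
sbs = SFS(estimator=KNeighborsClassifier(n_neighbors=3),
          k_features=2,
          forward=False,
          floating=False,
          scoring='accuracy',
          cv=5)
sbs = sbs.fit(X, y)

print('Selected feature indices:', sbs.k_feature_idx_)
print('Cross-validated score:', sbs.k_score_)
```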


What is Sequential Forward Selection (SFS)?

The Sequential Forward Selection (SFS) algorithm shares the same goal as Sequential Backward Selection but reaches it from the opposite direction.

SFS begins with an empty feature set and gradually adds features one by one, optimizing each addition to improve the target metric.

Here's a detailed look at its structure and application.

Step-by-Step Process of SFS Algorithm

  1. Start with an Empty Feature Set
    Sequential Forward Selection initializes with no features selected, in contrast to the SBS approach. This empty set is the baseline from which features are added iteratively.
  2. Define the Evaluation Metric
    As with other feature selection algorithms, an evaluation metric is required to assess the model's performance with each subset. Common metrics include accuracy, F1 score, or mean squared error, depending on the task (classification, regression, etc.).
  3. Iterative Addition
    For each iteration:
    • Add One Feature at a Time: each available feature is added to the current subset individually, generating multiple new subsets, each with one additional feature;
    • Evaluate Subsets: each new subset is evaluated for performance using the chosen metric;
    • Select Optimal Subset: choose the subset with the highest performance improvement and retain this combination for the next iteration.
  4. Continue Until Stopping Criterion is Met
    The process repeats until the desired number of features is reached or the evaluation metric no longer improves significantly with the addition of new features (a minimal code sketch of this loop follows the list).
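
As with SBS, this loop can be sketched by hand. The version below again assumes mean cross-validated accuracy as the metric, a logistic regression on the Iris dataset for illustration, and a fixed subset size as the stopping criterion:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

def evaluate(features):
    # Mean cross-validated accuracy of the model on the given feature subset
    return cross_val_score(model, X[:, features], y, cv=5).mean()

selected = []     # Step 1: start with an empty feature set
n_features = 3    # Step 4: stopping criterion (fixed subset size)

while len(selected) < n_features:
    # Step 3: add each remaining feature in turn and keep the best subset
    candidates = [selected + [f] for f in range(X.shape[1]) if f not in selected]
    scores = [evaluate(subset) for subset in candidates]
    selected = candidates[int(np.argmax(scores))]

print('Selected feature indices:', selected)
```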

Example Walkthrough

Let's consider a dataset with features F = {f1, f2, f3, f4}. If the objective is to select 3 features, here's how SFS works:

Sequential Forward Selection
  1. Initial Step
    • Start with an empty set {}, and evaluate performance with no features (often used as a baseline).
  2. First Iteration:
    • Evaluate adding one feature at a time: {f1}, {f2}, {f3}, and {f4};
    • Suppose {f2} produces the highest performance. Select {f2} as the optimal subset for this iteration.
  3. Second Iteration:
    • Add each remaining feature to {f2}, resulting in {f2, f1}, {f2, f3}, and {f2, f4};
    • Suppose {f2, f3} yields the highest improvement. Choose {f2, f3} as the subset for the next iteration.
  4. Third Iteration:
    • Add each remaining feature to {f2, f3}, resulting in {f2, f3, f1} and {f2, f3, f4};
    • Suppose {f2, f3, f1} gives the best performance, so select {f2, f3, f1} as the final subset.

Implementation in Python

Python provides libraries such as mlxtend that can be used to implement these techniques easily.
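
A minimal sketch with mlxtend's SequentialFeatureSelector, this time with forward=True; the estimator, the dataset, and the target of 3 features are again illustrative assumptions:

```python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Sequential Forward Selection: start from an empty set and
# add one feature at a time until 3 are selected (forward=True)
sfs = SFS(estimator=KNeighborsClassifier(n_neighbors=3),
          k_features=3,
          forward=True,
          floating=False,
          scoring='accuracy',
          cv=5)
sfs = sfs.fit(X, y)

print('Selected feature indices:', sfs.k_feature_idx_)
print('Cross-validated score:', sfs.k_score_)

# Reduce the dataset to the selected features
X_selected = sfs.transform(X)
```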

Common Aspects in SBS and SFS

Greedy Nature

Both SBS and SFS are greedy algorithms. They make the locally optimal choice at each iteration with the hope of finding the global optimum. However, they do not revisit past decisions; once a feature is removed (in SBS) or added (in SFS), the decision is final.

Model Dependency

The effectiveness of both SBS and SFS is highly dependent on the underlying model used for evaluation. Different models might yield different subsets of features as optimal.

Computational Efficiency

While more efficient than exhaustive search, both SBS and SFS can be computationally expensive, especially with a large number of features.

Stopping Criteria

The stopping criteria for both algorithms can be based on a predefined number of features or a performance threshold. This needs to be defined based on the specific requirements of the problem at hand.
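
For example, scikit-learn's SequentialFeatureSelector (a separate implementation from mlxtend's) supports both kinds of criteria: a fixed n_features_to_select, or, in recent versions, n_features_to_select='auto' together with a tol threshold that stops the search once the cross-validated score no longer improves by at least tol. A sketch under that assumption:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Stop adding features once the cross-validated score improves by less than tol
selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                     n_features_to_select='auto',
                                     tol=0.01,
                                     direction='forward',
                                     cv=5)
selector.fit(X, y)

print('Selected feature mask:', selector.get_support())
```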

Comparison of SBS and SFS

Here's a table comparing Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) in feature selection.

| Aspect | Sequential Forward Selection (SFS) | Sequential Backward Selection (SBS) |
| --- | --- | --- |
| Starting Point | Begins with an empty feature set | Starts with the full feature set |
| Selection Direction | Adds features iteratively | Removes features iteratively |
| Algorithm Objective | Adds features to maximize performance | Removes features to minimize loss of performance |
| Computation Time | Faster with fewer features initially selected | More computationally intensive due to the full starting set |
| Suitability for Feature Space Size | Ideal for small to moderate feature spaces | Effective for moderate to large feature spaces |
| Risk of Local Optima | Can miss optimal subsets due to greedy approach | Also risks suboptimal selection by removing key features |
| Memory Usage | Low, as fewer features are stored initially | Higher, as the full feature set is loaded initially |
| Typical Use Cases | Effective when a few key features need identification, such as bioinformatics and signal processing | Useful for high-dimensional datasets, such as image recognition or text processing |


Conclusion

Sequential Backward and Forward Selection are invaluable tools in feature selection, offering a structured approach to enhancing model accuracy and efficiency. By methodically adding or removing features, these techniques help in building more robust and interpretable machine learning models.

FAQs

Q: When should I use SBS over SFS?
A: SBS is preferable when you start with a large set of features and need to eliminate the irrelevant ones. SFS is ideal when building a feature set from scratch.

Q: Can these methods handle categorical data?
A: Yes, but the categorical data should be properly encoded before applying these methods.

Q: Are SBS and SFS prone to overfitting?
A: These methods can reduce overfitting by eliminating irrelevant features. However, careful cross-validation is still necessary to avoid overfitting.

Q: How do I decide on the stopping criterion for SBS or SFS?
A: The stopping criterion depends on specific goals, like desired model complexity or performance metrics.

Q: Is feature scaling required before applying SBS or SFS?
A: Feature scaling is not mandatory but is generally recommended, especially for algorithms sensitive to feature scales.

Q: How do these methods compare with other feature selection techniques?
A: Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) offer a model-driven approach that captures feature interactions, unlike filter methods which assess features individually. However, SFS and SBS are slower than embedded methods like Lasso and may struggle with high-dimensional data due to their iterative and computationally intensive nature.
