RuleFit: Sparse Rule Ensembles
RuleFit is an approach that aims to combine the predictive power of tree ensembles with the interpretability of linear models. The core idea behind RuleFit is to first extract a set of decision rules from an ensemble of decision trees, such as a random forest or a gradient boosting machine. Each rule is a simple logical statement—such as feature A > 5 and feature B ≤ 2—that defines a region in the feature space. Once these rules are extracted, each rule is treated as a binary feature: it is 1 if the rule is satisfied for a given instance, and 0 otherwise. These binary rule features, along with the original features, are then used in a sparse linear model, typically with L1 regularization (lasso), which encourages the model to select only the most important rules and features. The result is a model that is both accurate and highly interpretable, because its predictions can be explained in terms of a small set of human-readable rules.
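Concretely, each rule is just an indicator function over the feature space. The minimal sketch below (using made-up features A and B with illustrative thresholds, not a real dataset) shows how a single rule becomes a 0/1 column:

```python
import pandas as pd

# A rule is an indicator function: 1 where all its conditions hold, 0 elsewhere.
# Features "A", "B" and the thresholds here are illustrative only.
X = pd.DataFrame({"A": [7, 3, 9], "B": [1, 4, 2]})
rule = (X["A"] > 5) & (X["B"] <= 2)              # "A > 5 and B <= 2"
X["rule_A_gt_5_and_B_le_2"] = rule.astype(int)   # binary rule feature: 1, 0, 1
print(X)
```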
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Load a sample dataset
X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names
X = pd.DataFrame(X, columns=feature_names)

# Train a tree ensemble
rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42)
rf.fit(X, y)

# Extract rules from each tree in the ensemble
def tree_to_rules(tree, feature_names):
    rules = []
    tree_ = tree.tree_
    feature = tree_.feature
    threshold = tree_.threshold

    def recurse(node, conditions):
        if feature[node] != -2:  # -2 marks a leaf (sklearn's TREE_UNDEFINED)
            name = feature_names[feature[node]]
            thresh = threshold[node]
            recurse(tree_.children_left[node], conditions + [f"({name} <= {thresh:.2f})"])
            recurse(tree_.children_right[node], conditions + [f"({name} > {thresh:.2f})"])
        elif conditions:
            rules.append(" and ".join(conditions))

    recurse(0, [])
    return rules

all_rules = []
for estimator in rf.estimators_:
    all_rules.extend(tree_to_rules(estimator, feature_names))
all_rules = list(set(all_rules))  # remove duplicate rules across trees

# Create binary rule features
def rule_applies(rule, X_row):
    # Very simple parser for conjunctions of threshold comparisons
    for cond in rule.split(" and "):
        cond = cond.strip("() ")
        if "<=" in cond:
            name, val = cond.split("<=")
            if not float(X_row[name.strip()]) <= float(val):
                return 0
        elif ">" in cond:
            name, val = cond.split(">")
            if not float(X_row[name.strip()]) > float(val):
                return 0
    return 1

rule_features = np.zeros((X.shape[0], len(all_rules)))
for i, rule in enumerate(all_rules):
    rule_features[:, i] = X.apply(lambda row: rule_applies(rule, row), axis=1)

rule_feature_names = [f"RULE_{i}" for i in range(len(all_rules))]
rule_df = pd.DataFrame(rule_features, columns=rule_feature_names)

# Concatenate original features and rule features
X_extended = pd.concat([X, rule_df], axis=1)

# Fit a sparse linear model. LassoCV is a regression model; regressing directly
# on the 0/1 labels is a simplification (full RuleFit would use a classification loss).
X_train, X_test, y_train, y_test = train_test_split(
    X_extended, y, test_size=0.3, random_state=42
)
lasso = LassoCV(cv=3, random_state=42, max_iter=10000)
lasso.fit(X_train, y_train)

# Display important rules (nonzero coefficients)
coefs = pd.Series(lasso.coef_, index=X_extended.columns)
selected = coefs[coefs != 0]
print("Selected features and rules with nonzero coefficients:")
print(selected)
```
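One practical note: the rule columns are named RULE_0, RULE_1, and so on, so the printout does not show the rules themselves. A small follow-up sketch, assuming all_rules and selected from the block above are still in scope, maps each selected rule feature back to its human-readable conditions:

```python
# Map selected RULE_i columns back to their human-readable rule strings.
# Assumes `all_rules` and `selected` from the previous block are in scope.
for name, coef in selected.items():
    if name.startswith("RULE_"):
        rule_text = all_rules[int(name.split("_")[1])]
        print(f"{coef:+.3f}  {rule_text}")
    else:
        print(f"{coef:+.3f}  {name} (original feature)")
```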
The code above demonstrates a simplified RuleFit workflow. First, a random forest is trained, and rules are extracted from each tree as logical conjunctions of feature thresholds. Each rule is converted into a binary feature indicating whether it applies to a sample. These binary rule features are then combined with the original features, and a lasso regression model is trained to select a sparse subset of both. Because lasso encourages sparsity, only the most predictive rules and features are retained, making the resulting model easy to interpret: you can directly inspect which rules are used for prediction, and how much each contributes. This approach offers several advantages over standard tree ensembles:
- RuleFit models are usually more transparent, as they express decisions in terms of a short list of clear rules;
- The sparsity enforced by lasso means the model avoids using redundant or weak rules, further improving interpretability and potentially reducing overfitting (see the sketch after this list).
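The degree of sparsity is governed by the lasso regularization strength alpha. The sketch below, which reuses X_train, y_train, and numpy as np from the example above, illustrates the trade-off: a larger alpha drives more coefficients to exactly zero, leaving fewer rules in the model:

```python
from sklearn.linear_model import Lasso

# Larger alpha => stronger L1 penalty => fewer nonzero coefficients (rules).
# Assumes X_train, y_train, and np from the earlier example are in scope.
for alpha in [0.001, 0.01, 0.1]:
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(X_train, y_train)
    n_selected = int(np.sum(model.coef_ != 0))
    print(f"alpha={alpha}: {n_selected} features/rules selected")
```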
1. What is a key advantage of RuleFit compared to standard tree ensembles?
2. Why does sparsity in the RuleFit model improve interpretability?