RuleFit: Sparse Rule Ensembles
RuleFit is an approach that aims to combine the predictive power of tree ensembles with the interpretability of linear models. The core idea behind RuleFit is to first extract a set of decision rules from an ensemble of decision trees, such as a random forest or a gradient boosting machine. Each rule is a simple logical statement—such as feature A > 5 and feature B ≤ 2—that defines a region in the feature space. Once these rules are extracted, each rule is treated as a binary feature: it is 1 if the rule is satisfied for a given instance, and 0 otherwise. These binary rule features, along with the original features, are then used in a sparse linear model, typically with L1 regularization (lasso), which encourages the model to select only the most important rules and features. The result is a model that is both accurate and highly interpretable, because its predictions can be explained in terms of a small set of human-readable rules.
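To make the rule-to-binary-feature step concrete, here is a minimal sketch on a made-up two-column DataFrame (the feature names A and B and the threshold values are purely illustrative): the example rule feature A > 5 and feature B ≤ 2 is evaluated on each row and stored as a 0/1 column that a linear model can later weight.

```python
import pandas as pd

# Made-up data with two illustrative features, A and B
toy = pd.DataFrame({"A": [7.0, 3.0, 9.0], "B": [1.0, 1.5, 4.0]})

# The rule "A > 5 and B <= 2" becomes a binary feature:
# 1 for rows where both conditions hold, 0 otherwise
toy["rule_A_gt_5_and_B_le_2"] = ((toy["A"] > 5) & (toy["B"] <= 2)).astype(int)

print(toy)
# Only row 0 (A=7.0, B=1.0) satisfies the rule; rows 1 and 2 do not
```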
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Load a sample dataset
X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names
X = pd.DataFrame(X, columns=feature_names)

# Train a tree ensemble
rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42)
rf.fit(X, y)

# Extract rules from each tree in the ensemble
def tree_to_rules(tree, feature_names):
    rules = []
    tree_ = tree.tree_
    feature = tree_.feature
    threshold = tree_.threshold

    def recurse(node, conditions):
        if tree_.feature[node] != -2:
            name = feature_names[feature[node]]
            thresh = threshold[node]
            left_cond = conditions + [f"({name} <= {thresh:.2f})"]
            recurse(tree_.children_left[node], left_cond)
            right_cond = conditions + [f"({name} > {thresh:.2f})"]
            recurse(tree_.children_right[node], right_cond)
        else:
            if conditions:
                rules.append(" and ".join(conditions))

    recurse(0, [])
    return rules

all_rules = []
for estimator in rf.estimators_:
    rules = tree_to_rules(estimator, feature_names)
    all_rules.extend(rules)
all_rules = list(set(all_rules))  # Remove duplicates

# Create binary rule features
def rule_applies(rule, X_row):
    # Very simple parser for conjunctions of comparisons
    for cond in rule.split(" and "):
        cond = cond.strip("() ")
        if "<=" in cond:
            name, val = cond.split("<=")
            if not float(X_row[name.strip()]) <= float(val):
                return 0
        elif ">" in cond:
            name, val = cond.split(">")
            if not float(X_row[name.strip()]) > float(val):
                return 0
    return 1

rule_features = np.zeros((X.shape[0], len(all_rules)))
for i, rule in enumerate(all_rules):
    rule_features[:, i] = X.apply(lambda row: rule_applies(rule, row), axis=1)

rule_feature_names = [f"RULE_{i}" for i in range(len(all_rules))]
rule_df = pd.DataFrame(rule_features, columns=rule_feature_names)

# Concatenate original features and rule features
X_extended = pd.concat([X, rule_df], axis=1)

# Fit a sparse linear model (Lasso) for classification
X_train, X_test, y_train, y_test = train_test_split(X_extended, y, test_size=0.3, random_state=42)
lasso = LassoCV(cv=3, random_state=42, max_iter=10000)
lasso.fit(X_train, y_train)

# Display important rules (nonzero coefficients)
coefs = pd.Series(lasso.coef_, index=X_extended.columns)
selected = coefs[coefs != 0]
print("Selected features and rules with nonzero coefficients:")
print(selected)
```
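The final printout identifies rule features only by opaque names such as RULE_12. As a small follow-up sketch (assuming the all_rules and selected variables from the listing above), each selected rule feature can be mapped back to its human-readable condition string:

```python
# Map each selected RULE_i feature back to its condition string so the
# model reads as a short list of rules with their coefficients.
# Assumes all_rules and selected from the listing above.
for name, coef in selected.items():
    if name.startswith("RULE_"):
        rule_text = all_rules[int(name.split("_")[1])]
        print(f"{coef:+.3f}  if {rule_text}")
    else:
        print(f"{coef:+.3f}  {name}  (original feature)")
```

Each line then reads as "add this amount to the prediction whenever the rule fires", which is the kind of interpretation RuleFit is designed to support.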
The full listing above demonstrates a simplified RuleFit workflow. First, a random forest is trained, and rules are extracted from each tree as logical conjunctions of feature thresholds. Each rule is converted into a binary feature indicating whether it applies to a sample. These binary rule features are then combined with the original features, and a lasso model is trained to select a sparse subset of both. Because lasso encourages sparsity, only the most predictive rules and features are retained, making the resulting model easy to interpret: you can directly inspect which rules are used for prediction and how much each contributes. This approach offers several advantages over standard tree ensembles:
- RuleFit models are usually more transparent, as they express decisions in terms of a short list of clear rules;
- The sparsity enforced by lasso means the model avoids using redundant or weak rules, further improving interpretability and potentially reducing overfitting.
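One caveat about the listing: LassoCV is a regression estimator, so applying it to the 0/1 class labels is a simplification. For a classification target, an L1-penalized logistic regression plays the same sparsifying role. The snippet below is a minimal sketch of that variant, assuming the X_train, X_test, y_train, y_test, and X_extended variables from the listing; the value of C is an arbitrary illustration, not a tuned setting.

```python
from sklearn.linear_model import LogisticRegression
import pandas as pd

# The L1 penalty drives many coefficients to exactly zero, mirroring the lasso step.
# C controls regularization strength (smaller C -> sparser model); 0.1 is just an example.
logit_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=10000)
logit_l1.fit(X_train, y_train)

print("Test accuracy:", logit_l1.score(X_test, y_test))

# Inspect the surviving features and rules, as before
coefs_l1 = pd.Series(logit_l1.coef_[0], index=X_extended.columns)
print(coefs_l1[coefs_l1 != 0])
```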
1. What is a key advantage of RuleFit compared to standard tree ensembles?
2. Why does sparsity in the RuleFit model improve interpretability?