RuleFit: Sparse Rule Ensembles
RuleFit is an approach that aims to combine the predictive power of tree ensembles with the interpretability of linear models. The core idea behind RuleFit is to first extract a set of decision rules from an ensemble of decision trees, such as a random forest or a gradient boosting machine. Each rule is a simple logical statement—such as feature A > 5 and feature B ≤ 2—that defines a region in the feature space. Once these rules are extracted, each rule is treated as a binary feature: it is 1 if the rule is satisfied for a given instance, and 0 otherwise. These binary rule features, along with the original features, are then used in a sparse linear model, typically with L1 regularization (lasso), which encourages the model to select only the most important rules and features. The result is a model that is both accurate and highly interpretable, because its predictions can be explained in terms of a small set of human-readable rules.
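Concretely, each rule is just an indicator function over the feature space. The minimal sketch below (using made-up features A and B with illustrative thresholds, not a real dataset) shows how a single rule becomes a 0/1 column:

```python
import pandas as pd

# A rule is an indicator function: 1 where all its conditions hold, 0 elsewhere.
# Features "A", "B" and the thresholds here are illustrative only.
X = pd.DataFrame({"A": [7, 3, 9], "B": [1, 4, 2]})
rule = (X["A"] > 5) & (X["B"] <= 2)              # "A > 5 and B <= 2"
X["rule_A_gt_5_and_B_le_2"] = rule.astype(int)   # binary rule feature: 1, 0, 1
print(X)
```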
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Load a sample dataset
X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names
X = pd.DataFrame(X, columns=feature_names)

# Train a tree ensemble
rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42)
rf.fit(X, y)

# Extract rules from each tree in the ensemble
def tree_to_rules(tree, feature_names):
    rules = []
    tree_ = tree.tree_
    feature = tree_.feature
    threshold = tree_.threshold

    def recurse(node, conditions):
        if feature[node] != -2:  # -2 marks a leaf (sklearn's TREE_UNDEFINED)
            name = feature_names[feature[node]]
            thresh = threshold[node]
            recurse(tree_.children_left[node], conditions + [f"({name} <= {thresh:.2f})"])
            recurse(tree_.children_right[node], conditions + [f"({name} > {thresh:.2f})"])
        elif conditions:
            rules.append(" and ".join(conditions))

    recurse(0, [])
    return rules

all_rules = []
for estimator in rf.estimators_:
    all_rules.extend(tree_to_rules(estimator, feature_names))
all_rules = list(set(all_rules))  # remove duplicate rules across trees

# Create binary rule features
def rule_applies(rule, X_row):
    # Very simple parser for conjunctions of threshold comparisons
    for cond in rule.split(" and "):
        cond = cond.strip("() ")
        if "<=" in cond:
            name, val = cond.split("<=")
            if not float(X_row[name.strip()]) <= float(val):
                return 0
        elif ">" in cond:
            name, val = cond.split(">")
            if not float(X_row[name.strip()]) > float(val):
                return 0
    return 1

rule_features = np.zeros((X.shape[0], len(all_rules)))
for i, rule in enumerate(all_rules):
    rule_features[:, i] = X.apply(lambda row: rule_applies(rule, row), axis=1)

rule_feature_names = [f"RULE_{i}" for i in range(len(all_rules))]
rule_df = pd.DataFrame(rule_features, columns=rule_feature_names)

# Concatenate original features and rule features
X_extended = pd.concat([X, rule_df], axis=1)

# Fit a sparse linear model. LassoCV is a regression model; regressing directly
# on the 0/1 labels is a simplification (full RuleFit would use a classification loss).
X_train, X_test, y_train, y_test = train_test_split(
    X_extended, y, test_size=0.3, random_state=42
)
lasso = LassoCV(cv=3, random_state=42, max_iter=10000)
lasso.fit(X_train, y_train)

# Display important rules (nonzero coefficients)
coefs = pd.Series(lasso.coef_, index=X_extended.columns)
selected = coefs[coefs != 0]
print("Selected features and rules with nonzero coefficients:")
print(selected)
```
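One practical note: the rule columns are named RULE_0, RULE_1, and so on, so the printout does not show the rules themselves. A small follow-up sketch, assuming all_rules and selected from the block above are still in scope, maps each selected rule feature back to its human-readable conditions:

```python
# Map selected RULE_i columns back to their human-readable rule strings.
# Assumes `all_rules` and `selected` from the previous block are in scope.
for name, coef in selected.items():
    if name.startswith("RULE_"):
        rule_text = all_rules[int(name.split("_")[1])]
        print(f"{coef:+.3f}  {rule_text}")
    else:
        print(f"{coef:+.3f}  {name} (original feature)")
```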
The code above demonstrates a simplified RuleFit workflow. First, a random forest is trained, and rules are extracted from each tree as logical conjunctions of feature thresholds. Each rule is converted into a binary feature indicating whether it applies to a sample. These binary rule features are then combined with the original features, and a lasso regression model is trained to select a sparse subset of both. Because lasso encourages sparsity, only the most predictive rules and features are retained, making the resulting model easy to interpret: you can directly inspect which rules are used for prediction, and how much each contributes. This approach offers several advantages over standard tree ensembles:
- RuleFit models are usually more transparent, as they express decisions in terms of a short list of clear rules;
- The sparsity enforced by lasso means the model avoids using redundant or weak rules, further improving interpretability and potentially reducing overfitting (see the sketch after this list).
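The degree of sparsity is governed by the lasso regularization strength alpha. The sketch below, which reuses X_train, y_train, and numpy as np from the example above, illustrates the trade-off: a larger alpha drives more coefficients to exactly zero, leaving fewer rules in the model:

```python
from sklearn.linear_model import Lasso

# Larger alpha => stronger L1 penalty => fewer nonzero coefficients (rules).
# Assumes X_train, y_train, and np from the earlier example are in scope.
for alpha in [0.001, 0.01, 0.1]:
    model = Lasso(alpha=alpha, max_iter=10000)
    model.fit(X_train, y_train)
    n_selected = int(np.sum(model.coef_ != 0))
    print(f"alpha={alpha}: {n_selected} features/rules selected")
```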
1. What is a key advantage of RuleFit compared to standard tree ensembles?
2. Why does sparsity in the RuleFit model improve interpretability?