RuleFit: Sparse Rule Ensembles
RuleFit is an approach that aims to combine the predictive power of tree ensembles with the interpretability of linear models. The core idea behind RuleFit is to first extract a set of decision rules from an ensemble of decision trees, such as a random forest or a gradient boosting machine. Each rule is a simple logical statement—such as feature A > 5 and feature B ≤ 2—that defines a region in the feature space. Once these rules are extracted, each rule is treated as a binary feature: it is 1 if the rule is satisfied for a given instance, and 0 otherwise. These binary rule features, along with the original features, are then used in a sparse linear model, typically with L1 regularization (lasso), which encourages the model to select only the most important rules and features. The result is a model that is both accurate and highly interpretable, because its predictions can be explained in terms of a small set of human-readable rules.
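To make the rule-to-binary-feature step concrete, here is a minimal sketch on a made-up two-column DataFrame (the feature names A and B and the threshold values are purely illustrative): the example rule feature A > 5 and feature B ≤ 2 is evaluated on each row and stored as a 0/1 column that a linear model can later weight.

```python
import pandas as pd

# Made-up data with two illustrative features, A and B
toy = pd.DataFrame({"A": [7.0, 3.0, 9.0], "B": [1.0, 1.5, 4.0]})

# The rule "A > 5 and B <= 2" becomes a binary feature:
# 1 for rows where both conditions hold, 0 otherwise
toy["rule_A_gt_5_and_B_le_2"] = ((toy["A"] > 5) & (toy["B"] <= 2)).astype(int)

print(toy)
# Only row 0 (A=7.0, B=1.0) satisfies the rule; rows 1 and 2 do not
```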
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Load a sample dataset
X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names
X = pd.DataFrame(X, columns=feature_names)

# Train a tree ensemble
rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42)
rf.fit(X, y)

# Extract rules from each tree in the ensemble
def tree_to_rules(tree, feature_names):
    rules = []
    tree_ = tree.tree_
    feature = tree_.feature
    threshold = tree_.threshold

    def recurse(node, conditions):
        if tree_.feature[node] != -2:
            name = feature_names[feature[node]]
            thresh = threshold[node]
            left_cond = conditions + [f"({name} <= {thresh:.2f})"]
            recurse(tree_.children_left[node], left_cond)
            right_cond = conditions + [f"({name} > {thresh:.2f})"]
            recurse(tree_.children_right[node], right_cond)
        else:
            if conditions:
                rules.append(" and ".join(conditions))

    recurse(0, [])
    return rules

all_rules = []
for estimator in rf.estimators_:
    rules = tree_to_rules(estimator, feature_names)
    all_rules.extend(rules)
all_rules = list(set(all_rules))  # Remove duplicates

# Create binary rule features
def rule_applies(rule, X_row):
    # Very simple parser for conjunctions of comparisons
    for cond in rule.split(" and "):
        cond = cond.strip("() ")
        if "<=" in cond:
            name, val = cond.split("<=")
            if not float(X_row[name.strip()]) <= float(val):
                return 0
        elif ">" in cond:
            name, val = cond.split(">")
            if not float(X_row[name.strip()]) > float(val):
                return 0
    return 1

rule_features = np.zeros((X.shape[0], len(all_rules)))
for i, rule in enumerate(all_rules):
    rule_features[:, i] = X.apply(lambda row: rule_applies(rule, row), axis=1)

rule_feature_names = [f"RULE_{i}" for i in range(len(all_rules))]
rule_df = pd.DataFrame(rule_features, columns=rule_feature_names)

# Concatenate original features and rule features
X_extended = pd.concat([X, rule_df], axis=1)

# Fit a sparse linear model (Lasso) for classification
X_train, X_test, y_train, y_test = train_test_split(X_extended, y, test_size=0.3, random_state=42)
lasso = LassoCV(cv=3, random_state=42, max_iter=10000)
lasso.fit(X_train, y_train)

# Display important rules (nonzero coefficients)
coefs = pd.Series(lasso.coef_, index=X_extended.columns)
selected = coefs[coefs != 0]
print("Selected features and rules with nonzero coefficients:")
print(selected)
```
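The final printout identifies rule features only by opaque names such as RULE_12. As a small follow-up sketch (assuming the all_rules and selected variables from the listing above), each selected rule feature can be mapped back to its human-readable condition string:

```python
# Map each selected RULE_i feature back to its condition string so the
# model reads as a short list of rules with their coefficients.
# Assumes all_rules and selected from the listing above.
for name, coef in selected.items():
    if name.startswith("RULE_"):
        rule_text = all_rules[int(name.split("_")[1])]
        print(f"{coef:+.3f}  if {rule_text}")
    else:
        print(f"{coef:+.3f}  {name}  (original feature)")
```

Each line then reads as "add this amount to the prediction whenever the rule fires", which is the kind of interpretation RuleFit is designed to support.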
The full listing above demonstrates a simplified RuleFit workflow. First, a random forest is trained, and rules are extracted from each tree as logical conjunctions of feature thresholds. Each rule is converted into a binary feature indicating whether it applies to a sample. These binary rule features are then combined with the original features, and a lasso model is trained to select a sparse subset of both. Because lasso encourages sparsity, only the most predictive rules and features are retained, making the resulting model easy to interpret: you can directly inspect which rules are used for prediction and how much each contributes. This approach offers several advantages over standard tree ensembles:
- RuleFit models are usually more transparent, as they express decisions in terms of a short list of clear rules;
- The sparsity enforced by lasso means the model avoids using redundant or weak rules, further improving interpretability and potentially reducing overfitting.
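One caveat about the listing: LassoCV is a regression estimator, so applying it to the 0/1 class labels is a simplification. For a classification target, an L1-penalized logistic regression plays the same sparsifying role. The snippet below is a minimal sketch of that variant, assuming the X_train, X_test, y_train, y_test, and X_extended variables from the listing; the value of C is an arbitrary illustration, not a tuned setting.

```python
from sklearn.linear_model import LogisticRegression
import pandas as pd

# The L1 penalty drives many coefficients to exactly zero, mirroring the lasso step.
# C controls regularization strength (smaller C -> sparser model); 0.1 is just an example.
logit_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=10000)
logit_l1.fit(X_train, y_train)

print("Test accuracy:", logit_l1.score(X_test, y_test))

# Inspect the surviving features and rules, as before
coefs_l1 = pd.Series(logit_l1.coef_[0], index=X_extended.columns)
print(coefs_l1[coefs_l1 != 0])
```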
1. What is a key advantage of RuleFit compared to standard tree ensembles?
2. Why does sparsity in the RuleFit model improve interpretability?