Learn Combining Rules with Standard Models | Hybrid and Applied Rule-Based Forecasting

Hybrid modeling combines the strengths of rule-based systems with those of standard machine learning models such as boosting or linear models. On tabular data, rule-based models capture human-understandable patterns and domain knowledge, while boosting models fit complex nonlinear relationships and linear models estimate smooth, additive effects; both are trained to optimize predictive accuracy. Integrating the two approaches lets you build models that are both interpretable and effective. The most common hybrid strategy is to generate features from decision rules (for example, rules extracted by a rule mining algorithm or read off a decision tree) and pass these rule-based features to a standard model as additional inputs. The model can then use both explicit logical patterns and continuous statistical relationships in the data.
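Rule-based features do not have to come from a fitted model. As a minimal sketch, the snippet below turns a few hand-written rules into 0/1 columns and appends them to the original features. The rule names and thresholds are hypothetical and chosen only for illustration; they are not taken from any real domain.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the example used in this chapter.
X = pd.DataFrame({
    "age": [25, 45, 35, 50],
    "income": [50000, 80000, 60000, 120000],
})

# Hypothetical domain rules written as boolean conditions.
# The thresholds are illustrative assumptions, not learned values.
rules = {
    "rule_high_income": X["income"] > 70000,
    "rule_age_40_plus": X["age"] >= 40,
    "rule_40_plus_and_high_income": (X["age"] >= 40) & (X["income"] > 70000),
}

# Convert each rule to a 0/1 column and append it to the original features.
rule_df = pd.DataFrame(rules).astype(int)
X_hybrid = pd.concat([X, rule_df], axis=1)

print(X_hybrid)
```

Any standard model can then be trained on `X_hybrid` in place of `X`.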


import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Example dataset
X = pd.DataFrame({
    "age": [25, 45, 35, 50, 23, 40, 60, 30],
    "income": [50000, 80000, 60000, 120000, 40000, 70000, 150000, 52000]
})
y = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Step 1: Fit a decision tree to generate rules
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X, y)

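# Step 2: Create rule-based features
# decision_path returns one 0/1 indicator per tree node: 1 if the sample
# passes through that node. Column 0 is the root, so it is 1 for every sample.
rule_features = tree.decision_path(X).toarray()
rule_feature_names = [f"rule_{i}" for i in range(rule_features.shape[1])]
rule_df = pd.DataFrame(rule_features, columns=rule_feature_names)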

# Step 3: Concatenate original features with rule-based features
X_hybrid = pd.concat([X, rule_df], axis=1)

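# Scale continuous features (helps logistic regression converge)
# Note: for brevity the tree and the scaler are fit on all rows before the
# split, so the scores below are optimistic. Fit them on training data only
# when you need an honest estimate.
scaler = StandardScaler()
X_scaled = X_hybrid.copy()
X_scaled[["age", "income"]] = scaler.fit_transform(X_scaled[["age", "income"]])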

# ---- Evaluation with train/test split ----
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42
)

clf = LogisticRegression(max_iter=2000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Hold-out accuracy:", accuracy_score(y_test, y_pred))

# ---- Cross-validation ----
scores = cross_val_score(LogisticRegression(max_iter=2000), X_scaled, y, cv=4)

print("4-Fold Cross-Validation")
print("Scores:", scores)
print("Mean accuracy:", scores.mean())
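The listing above fits the decision tree and the scaler on every row before splitting, so information from the held-out rows leaks into the features. One way to avoid this is to wrap rule generation in a transformer and let a scikit-learn `Pipeline` refit it inside each fold. The sketch below assumes a small helper class, `TreeRuleFeatures`, which is a hypothetical name introduced here and not part of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


class TreeRuleFeatures(TransformerMixin, BaseEstimator):
    """Hypothetical helper: appends decision-path indicators to the inputs."""

    def __init__(self, max_depth=2, random_state=42):
        self.max_depth = max_depth
        self.random_state = random_state

    def fit(self, X, y):
        # The tree only ever sees the rows passed to fit (the training fold).
        self.tree_ = DecisionTreeClassifier(
            max_depth=self.max_depth, random_state=self.random_state
        ).fit(X, y)
        return self

    def transform(self, X):
        # One 0/1 column per tree node, appended to the incoming features.
        paths = self.tree_.decision_path(X).toarray()
        return np.hstack([np.asarray(X, dtype=float), paths])


# Same toy dataset as in the listing above.
X = pd.DataFrame({
    "age": [25, 45, 35, 50, 23, 40, 60, 30],
    "income": [50000, 80000, 60000, 120000, 40000, 70000, 150000, 52000],
})
y = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Scaling happens first; tree splits are unaffected by it, and the rule
# indicators added afterwards stay as unscaled 0/1 values.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("rules", TreeRuleFeatures(max_depth=2)),
    ("clf", LogisticRegression(max_iter=2000)),
])

# Every step is refit on the training part of each fold.
scores = cross_val_score(pipeline, X, y, cv=4)
print("Scores:", scores)
print("Mean accuracy:", scores.mean())
```

With only eight rows the scores remain very noisy; the point of the sketch is the structure, not the numbers.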

Hybrid models like the one shown above offer advantages in both interpretability and performance. By feeding rule-based features (such as decision-path indicators from a tree) into a linear model, you keep the transparency of explicit rules while gaining the predictive power of a statistical model. The linear model's coefficients show how strongly each original and rule-derived feature influences the prediction, provided the continuous features are on comparable scales. This approach is especially useful when domain knowledge is encoded as rules but you also want to capture subtler patterns that the linear model can detect. As a result, a hybrid model can generalize better and give clearer explanations than either method alone, although this should be checked with validation on your own data.
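To make the interpretation step concrete, the following sketch prints the rules learned by the tree and then lists the logistic regression coefficients by absolute size. It reuses the toy dataset from this chapter and fits on all eight rows purely for illustration, so the values it prints should not be read as meaningful estimates.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy dataset as in the chapter's example.
X = pd.DataFrame({
    "age": [25, 45, 35, 50, 23, 40, 60, 30],
    "income": [50000, 80000, 60000, 120000, 40000, 70000, 150000, 52000],
})
y = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Fit the tree and print its rules so each rule_i column can be traced
# back to a split condition.
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# One indicator column per tree node (column 0 is the root).
paths = tree.decision_path(X).toarray()
rule_df = pd.DataFrame(paths, columns=[f"rule_{i}" for i in range(paths.shape[1])])

# Standardize the continuous features so their coefficients are comparable.
scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
X_hybrid = pd.concat([scaled, rule_df], axis=1)

clf = LogisticRegression(max_iter=2000).fit(X_hybrid, y)

# Coefficients sorted by absolute value; the sign gives the direction of
# the effect on the log-odds of class 1.
coef = pd.Series(clf.coef_[0], index=X_hybrid.columns)
print(coef.sort_values(key=np.abs, ascending=False))
```

Because the root indicator is 1 for every sample, it carries the same information as the intercept, so its coefficient is not informative on its own.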

1. Which of the following is a key benefit of hybrid rule-based models?

2. When should you consider using a hybrid approach that combines rule-based and standard models?

Section 3. Chapter 1
