Learn Rule Pruning and Redundancy Removal | Foundations of Rule-Based Machine Learning
Rule-Based Machine Learning Systems

Rule Pruning and Redundancy Removal

When building rule-based machine learning models, you often generate a large set of rules to cover as many patterns in the data as possible. However, not all rules are equally valuable. Some rules may overlap, repeat the same logic, or capture noise rather than meaningful patterns. This can lead to models that are unnecessarily complex, harder to interpret, and more likely to overfit the training data. Pruning and redundancy removal are essential steps to address these issues. For example, consider these two rules:

  • If age > 30 and income > 50K then label = "yes";
  • If income > 50K and age > 30 then label = "yes";

Both rules express the same logic, so one is redundant. Keeping both adds no value but increases complexity. Similarly, rules that rarely apply or have low predictive accuracy may not help generalization and can also be pruned.
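How often a rule applies and how accurate it is can be measured directly as coverage and accuracy. The sketch below is illustrative only: the toy `records` list and the `rule_stats` helper are assumptions made for this example, not part of the lesson's code. Coverage is taken to be the number of records that satisfy the rule's condition, and accuracy the fraction of those records whose label matches the rule's prediction.

```python
# Hypothetical toy dataset; the records and field names are illustrative only.
records = [
    {"age": 45, "income": 60_000, "label": "yes"},
    {"age": 38, "income": 75_000, "label": "yes"},
    {"age": 52, "income": 55_000, "label": "no"},
    {"age": 25, "income": 80_000, "label": "no"},
    {"age": 33, "income": 30_000, "label": "no"},
]


def rule_stats(records, condition, predicted_label):
    """Return (coverage, accuracy) for a single rule.

    coverage: number of records for which the condition holds.
    accuracy: fraction of covered records whose label equals the prediction.
    """
    covered = [r for r in records if condition(r)]
    if not covered:
        return 0, 0.0
    correct = sum(1 for r in covered if r["label"] == predicted_label)
    return len(covered), correct / len(covered)


# Rule: if age > 30 and income > 50K then label = "yes"
coverage, accuracy = rule_stats(
    records,
    condition=lambda r: r["age"] > 30 and r["income"] > 50_000,
    predicted_label="yes",
)
print(coverage, round(accuracy, 2))
```

On this toy data the rule covers three records and is right on two of them. A rule with very low coverage or accuracy under such a measurement is a natural candidate for pruning.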

```python
# Candidate rules as (rule, coverage, accuracy) tuples
candidate_rules = [
    ("if age > 30 and income > 50K then label = 'yes'", 120, 0.95),
    ("if income > 50K and age > 30 then label = 'yes'", 120, 0.95),  # Redundant
    ("if age <= 30 then label = 'no'", 80, 0.80),
    ("if income < 20K then label = 'no'", 10, 0.30),  # Low quality
    ("if city == 'NY' then label = 'yes'", 15, 0.60),
]


def normalize(rule):
    """Return an order-independent key for a rule string."""
    body = rule.lower().strip()
    if body.startswith("if "):
        body = body[3:]
    conditions, _, outcome = body.partition(" then ")
    # Sort the "and"-joined conditions so their order does not matter
    parts = sorted(c.replace(" ", "") for c in conditions.split(" and "))
    return ("&".join(parts), outcome.replace(" ", ""))


def is_redundant(rule, rules_seen):
    return normalize(rule) in rules_seen


def prune_rules(rules, min_coverage=20, min_accuracy=0.7):
    pruned = []
    rules_seen = set()
    for rule, coverage, accuracy in rules:
        # Remove low-quality rules
        if coverage < min_coverage or accuracy < min_accuracy:
            continue
        # Remove redundant rules (same logic, different condition order)
        if is_redundant(rule, rules_seen):
            continue
        pruned.append((rule, coverage, accuracy))
        rules_seen.add(normalize(rule))
    return pruned


pruned_rules = prune_rules(candidate_rules)
for rule in pruned_rules:
    print(rule[0])
```

The pruning logic in the code above works in two steps. First, it filters out rules whose coverage or accuracy falls below the given thresholds, so that only high-quality rules remain. Second, it checks for redundancy by normalizing each rule string: the rule is lowercased, spaces are removed, and the conditions joined by "and" are sorted. Sorting matters here, because lowercasing and removing spaces alone would still treat "age > 30 and income > 50K" and "income > 50K and age > 30" as different strings. With the default thresholds, only the first rule and the "age <= 30" rule are kept. Note that this check is purely syntactic: it catches reordered conditions, but not rules that are logically equivalent yet written with different expressions. By keeping only unique, high-quality rules, the resulting model is simpler and easier to interpret, and it tends to generalize better to new data because it is less likely to overfit to noise or redundant patterns.
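To get a feel for how the coverage and accuracy thresholds affect the outcome, the quality filter can be run under several settings. This is a minimal sketch that reuses the (rule, coverage, accuracy) figures from the candidate list; the `count_survivors` helper and the three threshold settings are assumptions chosen for illustration. It counts rules that pass the quality filter only, before any redundancy removal.

```python
# Same (rule, coverage, accuracy) figures as the lesson's candidate list.
candidate_rules = [
    ("if age > 30 and income > 50K then label = 'yes'", 120, 0.95),
    ("if income > 50K and age > 30 then label = 'yes'", 120, 0.95),
    ("if age <= 30 then label = 'no'", 80, 0.80),
    ("if income < 20K then label = 'no'", 10, 0.30),
    ("if city == 'NY' then label = 'yes'", 15, 0.60),
]


def count_survivors(rules, min_coverage, min_accuracy):
    """Count rules meeting both thresholds (quality filter only)."""
    return sum(
        1
        for _, coverage, accuracy in rules
        if coverage >= min_coverage and accuracy >= min_accuracy
    )


# Hypothetical threshold settings, from lenient to strict.
settings = [(5, 0.5), (20, 0.7), (100, 0.9)]
survivors = {s: count_survivors(candidate_rules, *s) for s in settings}

for (min_cov, min_acc), kept in survivors.items():
    print(f"min_coverage={min_cov}, min_accuracy={min_acc}: {kept} rules kept")
```

Stricter thresholds leave a smaller rule set, but they may also discard rules that describe rare yet valid cases, so suitable values depend on the data and on how much interpretability matters.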

Which of the following is a primary benefit of pruning rules in a rule-based machine learning system?


Section 1. Chapter 4
