Rule Lists Versus Rule Sets | Foundations of Rule-Based Machine Learning

When you build a rule-based machine learning system, you can organize your rules either as a rule list or a rule set. Understanding the difference between these two approaches is crucial for designing interpretable and effective models.

Definition

A rule list is an ordered collection of rules. The order in which rules appear matters: you evaluate each rule one by one, and as soon as a rule matches, you apply its prediction and stop checking further rules.

Think of a rule list like a prioritized checklist, such as airport security procedures: if a traveler has a diplomatic passport, they are processed immediately; if not, the next rule might check if they are a minor, and so on, until a rule applies.
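First-match evaluation is easy to express directly. Below is a minimal sketch. It assumes each rule is represented as a `(condition, prediction)` pair and that a default label is returned when nothing matches. The helper `evaluate_rule_list` and the airport field names and labels (`diplomatic_passport`, `is_minor`, "minor screening") are hypothetical illustrations, not part of any library.

```python
from typing import Callable, Dict, List, Tuple

# Assumed rule format: a (condition, prediction) pair
Rule = Tuple[Callable[[Dict], bool], str]

def evaluate_rule_list(rules: List[Rule], default: str, record: Dict) -> str:
    """Return the prediction of the first rule whose condition holds."""
    for condition, prediction in rules:
        if condition(record):
            return prediction  # first match wins; later rules are never checked
    return default  # no rule matched

# Hypothetical encoding of the airport analogy (field names and labels are illustrative)
security_rules: List[Rule] = [
    (lambda t: t["diplomatic_passport"], "expedited processing"),
    (lambda t: t["is_minor"], "minor screening"),
]

traveler = {"diplomatic_passport": True, "is_minor": True}
print(evaluate_rule_list(security_rules, "standard screening", traveler))        # expedited processing
print(evaluate_rule_list(security_rules[::-1], "standard screening", traveler))  # minor screening
```

The traveler satisfies both conditions, so only the order of the list decides the outcome. Reversing the rules changes the prediction.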

Definition

A rule set, in contrast, is an unordered collection of rules. Here, every rule that matches the input is considered, and their predictions are typically combined, for example by voting or averaging.

This is similar to having several independent experts each provide their opinion on a case, and then making a decision based on all the opinions gathered.
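The "several experts" idea can be sketched in the same style. Every rule that fires contributes a vote, and the majority label wins. This is a minimal sketch, assuming `(condition, prediction)` rules and a fallback label for ties or when no rule fires. It also shows averaging as an alternative combiner. The names `evaluate_rule_set` and `average_rule_scores` and the per-rule scores (0.8, 0.2, 0.7) are hypothetical values chosen for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Assumed rule format: a (condition, prediction) pair
Rule = Tuple[Callable[[Dict], bool], str]

def evaluate_rule_set(rules: List[Rule], default: str, record: Dict) -> str:
    """Let every firing rule vote and return the majority label."""
    votes = Counter(prediction for condition, prediction in rules if condition(record))
    ranked = votes.most_common(2)
    # No rule fired, or the two leading labels are tied: use the default
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return default
    return ranked[0][0]

def average_rule_scores(scored_rules: List[Tuple[Callable[[Dict], bool], float]],
                        default: float, record: Dict) -> float:
    """Alternative combiner: average the scores of all rules that fire."""
    scores = [score for condition, score in scored_rules if condition(record)]
    return sum(scores) / len(scores) if scores else default

# Hypothetical spam-style rules; the order of the list plays no role
spam_rules: List[Rule] = [
    (lambda e: e["contains_link"], "spam"),
    (lambda e: e["from_known_sender"], "not spam"),
    (lambda e: e["subject_caps"], "spam"),
]
email = {"contains_link": True, "from_known_sender": True, "subject_caps": True}
print(evaluate_rule_set(spam_rules, "not spam", email))        # spam (2 votes to 1)
print(evaluate_rule_set(spam_rules[::-1], "not spam", email))  # same result in any order

# Hypothetical per-rule spam scores, combined by averaging instead of voting
scored_rules = [
    (lambda e: e["contains_link"], 0.8),
    (lambda e: e["from_known_sender"], 0.2),
    (lambda e: e["subject_caps"], 0.7),
]
print(average_rule_scores(scored_rules, 0.5, email))
```

Unlike the rule list, shuffling `spam_rules` cannot change the result, because every firing rule is counted.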

For instance, in spam email detection, a rule list might check for the most obvious spam indicators first, stopping as soon as one is found, while a rule set might allow multiple spam indicators to contribute to the final decision, regardless of their order.


```python
# Sample dataset: simple email features
emails = [
    {"contains_link": True, "from_known_sender": False, "subject_caps": True},
    {"contains_link": False, "from_known_sender": True, "subject_caps": False},
    {"contains_link": True, "from_known_sender": True, "subject_caps": False},
    {"contains_link": False, "from_known_sender": False, "subject_caps": True},
]

# Rule list: ordered rules, the first matching rule decides
def rule_list_predict(email):
    # Rule 1: If the email contains a link and is not from a known sender, predict spam
    if email["contains_link"] and not email["from_known_sender"]:
        return "spam"
    # Rule 2: If the subject is in all caps, predict spam
    if email["subject_caps"]:
        return "spam"
    # Rule 3: Otherwise, predict not spam
    return "not spam"

# Rule set: unordered rules, majority vote
def rule_set_predict(email):
    votes = []
    # Rule A: If the email contains a link, vote spam
    if email["contains_link"]:
        votes.append("spam")
    # Rule B: If the email is from a known sender, vote not spam
    if email["from_known_sender"]:
        votes.append("not spam")
    # Rule C: If the subject is in all caps, vote spam
    if email["subject_caps"]:
        votes.append("spam")
    # Majority vote; a tie (including no votes at all) defaults to "not spam"
    spam_votes = votes.count("spam")
    not_spam_votes = votes.count("not spam")
    return "spam" if spam_votes > not_spam_votes else "not spam"

# Apply both models to the dataset
print("Rule List Predictions:")
for email in emails:
    print(rule_list_predict(email))

print("\nRule Set Predictions:")
for email in emails:
    print(rule_set_predict(email))
```

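The code above applies a rule list and a rule set to the same four emails. On this small dataset the two models happen to agree, predicting spam, not spam, not spam, and spam. However, they reach those answers differently. They also use slightly different rules: the rule list merges the link and unknown-sender checks into a single condition. In the rule list, the order of the rules determines the prediction. Only the first matching rule applies, and no further rules are checked. This makes rule lists highly interpretable and efficient when some conditions are more important than others and should override them. In general, the same rules can yield different predictions depending on whether they are organized as a list or a set.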
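To see identical rules disagree, here is a minimal sketch that takes Rules A, B, and C from the rule-set example. It applies them once as an ordered list (A, then B, then C) and once as a majority-vote set. The helper names `as_rule_list` and `as_rule_set` and the "not spam" default are assumptions made for illustration.

```python
from collections import Counter

emails = [
    {"contains_link": True, "from_known_sender": False, "subject_caps": True},
    {"contains_link": False, "from_known_sender": True, "subject_caps": False},
    {"contains_link": True, "from_known_sender": True, "subject_caps": False},
    {"contains_link": False, "from_known_sender": False, "subject_caps": True},
]

# Rules A, B and C, each a (name, condition, prediction) triple
rules = [
    ("A: contains link", lambda e: e["contains_link"], "spam"),
    ("B: known sender", lambda e: e["from_known_sender"], "not spam"),
    ("C: all-caps subject", lambda e: e["subject_caps"], "spam"),
]

def as_rule_list(email, default="not spam"):
    # The first matching rule decides; later rules are ignored
    for _, condition, prediction in rules:
        if condition(email):
            return prediction
    return default

def as_rule_set(email, default="not spam"):
    # Every matching rule votes; a tie (or no votes) falls back to the default
    votes = Counter(prediction for _, condition, prediction in rules if condition(email))
    if votes["spam"] > votes["not spam"]:
        return "spam"
    if votes["not spam"] > votes["spam"]:
        return "not spam"
    return default

for i, email in enumerate(emails, start=1):
    print(i, as_rule_list(email), "|", as_rule_set(email))
```

Only the third email differs (`3 spam | not spam`). As a list, Rule A fires first and returns "spam". As a set, Rule A's spam vote is cancelled by Rule B's not-spam vote, and the tie falls back to the default.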

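The rule set approach, on the other hand, evaluates all rules without regard to order and combines their outputs, here by majority vote. Because votes can cancel out, a rule set also needs a tie-breaking policy. The third email triggers one spam vote (Rule A) and one not-spam vote (Rule B), and the tie defaults to "not spam". This offers greater flexibility when multiple factors should contribute to the decision, but the result may be less transparent when many rules fire at once.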
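If transparency is the concern, one option is to return the rules that fired alongside the vote. The following is a minimal sketch that assumes each rule carries a short name. `explain_rule_set` and the `RULES` table are hypothetical. They reuse Rules A, B, and C from the example, with ties again defaulting to "not spam".

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical rule format: (name, condition, prediction)
Rule = Tuple[str, Callable[[Dict], bool], str]

RULES: List[Rule] = [
    ("A", lambda e: e["contains_link"], "spam"),
    ("B", lambda e: e["from_known_sender"], "not spam"),
    ("C", lambda e: e["subject_caps"], "spam"),
]

def explain_rule_set(email: Dict, default: str = "not spam"):
    """Return the majority-vote label together with the rules that fired."""
    fired = [(name, prediction) for name, condition, prediction in RULES if condition(email)]
    top = Counter(prediction for _, prediction in fired).most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        label = default  # no votes, or a tie between the two leading labels
    else:
        label = top[0][0]
    return label, fired

label, fired = explain_rule_set({"contains_link": True, "from_known_sender": True, "subject_caps": True})
print(label, fired)  # spam [('A', 'spam'), ('B', 'not spam'), ('C', 'spam')]
```

Reporting the fired rules lets a reviewer see which signals outvoted which, so a rule set keeps some of the explainability that comes for free with a rule list.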

Choosing between a rule list and a rule set depends on your use case. If you need clear, hierarchical decision-making, as in triage systems, compliance checks, or any process where priorities matter, a rule list is often the better fit. If your task benefits from aggregating multiple signals, for example when several weak indicators should add up to a decision, a rule set may be more effective.

Section 1. Chapter 3
