Rule Quality Metrics
Evaluating the effectiveness of rules is essential for building reliable rule-based machine learning systems. Four key metrics are used to assess the quality of a rule: support, confidence, lift, and coverage. Each metric provides a different perspective on how valuable, reliable, and informative a rule is within your data.
Support measures how frequently the rule's conditions and outcome occur together in the dataset. It tells you how common the rule is. Confidence reflects the probability that the outcome happens when the rule's conditions are met, indicating the rule's reliability. Lift compares the rule's confidence to the baseline probability of the outcome, showing whether the rule uncovers real associations or just random coincidence. Coverage indicates how often the rule's conditions occur in the data, regardless of the outcome, helping you understand the rule's applicability.
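For a rule of the form IF A THEN B, these descriptions correspond to simple proportions over the dataset (writing P(...) for the fraction of rows satisfying a condition):

support(A → B) = P(A and B)
coverage(A → B) = P(A)
confidence(A → B) = P(B | A) = P(A and B) / P(A)
lift(A → B) = confidence(A → B) / P(B)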
Understanding these metrics helps you select rules that are both meaningful and actionable for your machine learning models.
import pandas as pd

# Sample dataset
data = {
    "Age": [25, 30, 45, 35, 22, 40, 50, 23],
    "Buys_Computer": ["No", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"]
}
df = pd.DataFrame(data)

# Define the rule: IF Age > 30 THEN Buys_Computer = "Yes"
condition = df["Age"] > 30
outcome = df["Buys_Computer"] == "Yes"
both = condition & outcome

# Support: Fraction where both condition and outcome are true
support = both.sum() / len(df)

# Confidence: Fraction of times outcome is true when condition is true
confidence = both.sum() / condition.sum()

# Coverage: Fraction of data where condition is true
coverage = condition.sum() / len(df)

# Lift: Confidence divided by baseline probability of outcome
baseline = outcome.sum() / len(df)
lift = confidence / baseline

print(f"Support: {support:.2f}")
print(f"Confidence: {confidence:.2f}")
print(f"Coverage: {coverage:.2f}")
print(f"Lift: {lift:.2f}")
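Since you will often need to score many candidate rules, it can be convenient to wrap these four calculations in a reusable function. The following is a minimal sketch of a hypothetical helper (rule_metrics is an illustrative name, not a library function); it assumes the condition and outcome are passed in as boolean Series, and it returns NaN for confidence and lift when a rule's condition never fires, so the division cannot fail:

import pandas as pd

def rule_metrics(condition: pd.Series, outcome: pd.Series) -> dict:
    # Both inputs are assumed to be boolean Series of the same length
    n = len(condition)
    both = condition & outcome
    support = both.sum() / n
    coverage = condition.sum() / n
    baseline = outcome.sum() / n
    # Guard against division by zero if the condition never holds
    confidence = both.sum() / condition.sum() if condition.sum() > 0 else float("nan")
    lift = confidence / baseline if baseline > 0 else float("nan")
    return {
        "support": support,
        "confidence": confidence,
        "coverage": coverage,
        "lift": lift,
    }

# Example: score the same rule on the sample dataset
df = pd.DataFrame({
    "Age": [25, 30, 45, 35, 22, 40, 50, 23],
    "Buys_Computer": ["No", "Yes", "Yes", "Yes", "No", "Yes", "No", "No"],
})
print(rule_metrics(df["Age"] > 30, df["Buys_Computer"] == "Yes"))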
To understand how these metrics are calculated, consider the rule: IF Age > 30 THEN Buys_Computer = "Yes". First, you identify which rows in the data satisfy the condition (Age > 30), and which rows have the outcome (Buys_Computer = "Yes").
- Support is calculated by counting how many rows satisfy both the condition and the outcome, then dividing by the total number of rows. This shows how prevalent the rule is in the dataset.
- Confidence is found by dividing the number of rows where both the condition and the outcome are true by the number of rows where the condition is true (with or without the outcome). This tells you how reliable the rule is when applied.
- Coverage is the proportion of rows where the condition is true, regardless of the outcome. This helps you see how broadly the rule applies in your data.
- Lift is the confidence divided by the baseline probability of the outcome (how often Buys_Computer = "Yes" occurs overall). A lift greater than 1 means the rule is better at predicting the outcome than random chance.
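With the sample data above, these calculations come out as follows: the condition Age > 30 holds for 4 of the 8 rows (ages 45, 35, 40, and 50), the outcome Buys_Computer = "Yes" holds for 4 rows, and both hold together for 3 rows (ages 45, 35, and 40). That gives support = 3/8 = 0.38, confidence = 3/4 = 0.75, coverage = 4/8 = 0.50, and, since the baseline is also 4/8 = 0.50, lift = 0.75 / 0.50 = 1.50. The script therefore prints:

Support: 0.38
Confidence: 0.75
Coverage: 0.50
Lift: 1.50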
By interpreting these values, you can decide whether the rule is both significant and useful for your machine learning model.