How to Choose Minimum Support/Confidence Values

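Choosing appropriate minimum support and confidence values is crucial when mining association rules from transactional datasets. The support threshold sets how frequently an itemset must occur to be considered at all, and the confidence threshold sets how often a rule's consequent must appear in the transactions that contain its antecedent. Together they determine how many rules are discovered and how strong those rules are. Selecting these values requires a balance between capturing meaningful associations and avoiding an overwhelming number of trivial rules.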

Factors influencing minimum threshold selection

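Dataset Size: Support is measured as a fraction of all transactions, so in a large dataset an itemset can occur in many transactions and still have low relative support. Large datasets may therefore require lower support thresholds to capture meaningful associations;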
Data Sparsity: Sparse datasets, where items have low occurrence frequencies, may necessitate lower support thresholds to uncover significant associations;
Domain Knowledge: Understanding the domain and the context of the dataset can guide the selection of appropriate thresholds. Prior knowledge about item interactions can inform the choice of support and confidence values;
Objective of Analysis: The purpose of association rule mining influences the choice of thresholds. For exploratory analysis, higher support thresholds may be suitable to identify prominent associations, while lower thresholds may be preferred for comprehensive pattern discovery.
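
One way to reason about the dataset-size factor is to start from an absolute requirement ("a pattern should be backed by at least this many transactions") and convert it into a relative threshold. The sketch below assumes a hypothetical helper named support_threshold and arbitrary illustrative values (a minimum of 50 supporting transactions); it is not part of any library.

```python
def support_threshold(min_count, n_transactions):
    """Relative min_support that requires at least `min_count` supporting transactions.

    Hypothetical helper for illustration only.
    """
    if not 0 < min_count <= n_transactions:
        raise ValueError("min_count must be between 1 and n_transactions")
    return min_count / n_transactions


# The same absolute requirement maps to a smaller relative threshold
# as the dataset grows (the sizes below are illustrative).
for n_transactions in (1_000, 100_000, 1_000_000):
    threshold = support_threshold(50, n_transactions)
    print(f"{n_transactions:>9} transactions -> min_support = {threshold}")
```

Working in this direction makes it explicit why a threshold that is sensible for a small dataset can be far too strict for a large one.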

Example

Now you can conduct a simple experiment: change the min_support and min_confidence values in the code sample below and observe how your changes influence the results.


import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Create a sample transaction dataset
transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey']
]

# Define minimum support and confidence thresholds
min_support = 0.2
min_confidence = 0.7

# Initialize and fit `TransactionEncoder`
encoder = TransactionEncoder()
encoder.fit(transactions)

# Transform transactions using the encoder
one_hot_encoded = encoder.transform(transactions)

# Convert one-hot encoded array to DataFrame
df = pd.DataFrame(one_hot_encoded, columns=encoder.columns_)

# Find frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Print frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Print association rules with antecedent -> consequent format and confidence
print("\nAssociation Rules:")
for _, row in rules.iterrows():
    antecedents = list(row['antecedents'])
    consequents = list(row['consequents'])
    print(f"Rule: {antecedents} => {consequents} with Confidence: {row['confidence']:.2f}")
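
The experiment can also be automated by sweeping both thresholds and counting what survives. The following is a minimal sketch under stated assumptions, not the mlxtend implementation: it brute-forces every itemset in plain Python over the same sample transactions, which is only practical for a toy dataset. The helper names (count_support, frequent_itemsets, count_rules) and the threshold grids are illustrative choices.

```python
from itertools import combinations

transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey']
]


def count_support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions)


def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset to its transaction count (brute force)."""
    items = sorted({item for t in transactions for item in t})
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = count_support(candidate, transactions)
            if count / n >= min_support:
                result[frozenset(candidate)] = count
    return result


def count_rules(itemsets, min_confidence):
    """Count rules A => B obtained by splitting each frequent itemset in two."""
    n_rules = 0
    for itemset, count in itemsets.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                # Every subset of a frequent itemset is itself frequent,
                # so the antecedent is guaranteed to be in `itemsets`.
                confidence = count / itemsets[frozenset(antecedent)]
                if confidence >= min_confidence:
                    n_rules += 1
    return n_rules


# Sweep both thresholds and report how many itemsets and rules remain.
for min_support in (0.2, 0.4, 0.6):
    itemsets = frequent_itemsets(transactions, min_support)
    for min_confidence in (0.5, 0.7, 0.9):
        n_rules = count_rules(itemsets, min_confidence)
        print(f"min_support={min_support}, min_confidence={min_confidence}: "
              f"{len(itemsets)} itemsets, {n_rules} rules")
```

Raising either threshold can only shrink the output: a higher min_support removes itemsets, and a higher min_confidence removes rules. A table like this makes it easier to pick a combination that yields a manageable number of rules.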
