Association Rule Mining
How to Choose Minimum Support/Confidence Values
Choosing appropriate minimum support and confidence values is crucial when mining association rules from transactional datasets. Minimum support sets how often an itemset must appear across all transactions to be considered at all; minimum confidence sets how often the consequent must accompany the antecedent for a rule to be kept. Selecting these values is a balance: thresholds that are too high hide meaningful associations, while thresholds that are too low produce an overwhelming number of trivial rules.
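To make the two measures concrete, here is a minimal sketch that computes them by hand. The basket data and the helper names `support` and `confidence` are made up for illustration and are not part of any library.

```python
# Hypothetical toy baskets, chosen only to keep the arithmetic easy to follow.
transactions = [
    {"milk", "bread", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"bread", "eggs", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(f"support(bread, butter)      = {support({'bread', 'butter'}, transactions):.2f}")
print(f"confidence(bread -> butter) = {confidence({'bread'}, {'butter'}, transactions):.2f}")
print(f"confidence(butter -> bread) = {confidence({'butter'}, {'bread'}, transactions):.2f}")
```

In this toy data, bread and butter appear together in 3 of 5 baskets (support 0.60). The rule bread -> butter has confidence 0.75 and butter -> bread has confidence 1.00, so a minimum confidence of 0.7 would keep both rules, while 0.8 would keep only the second.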
Factors influencing minimum threshold selection
- Dataset Size: Large datasets may require lower support thresholds to capture meaningful associations due to the increased variability in item occurrences;
- Data Sparsity: Sparse datasets, where items have low occurrence frequencies, may necessitate lower support thresholds to uncover significant associations;
- Domain Knowledge: Understanding the domain and the context of the dataset can guide the selection of appropriate thresholds. Prior knowledge about item interactions can inform the choice of support and confidence values;
- Objective of Analysis: The purpose of association rule mining influences the choice of thresholds. For exploratory analysis, higher support thresholds may be suitable to identify prominent associations, while lower thresholds may be preferred for comprehensive pattern discovery.
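One practical way to account for sparsity before picking a threshold is to look at how often each individual item occurs. The sketch below assumes a small made-up basket list and a hypothetical helper `item_supports`; it lists per-item support and shows which items a candidate `min_support` would discard before any larger itemset is even considered.

```python
from collections import Counter

# Hypothetical toy baskets, used only for illustration.
transactions = [
    ["milk", "bread", "eggs"],
    ["bread", "butter", "jam"],
    ["milk", "bread", "butter"],
    ["milk", "eggs", "cheese"],
    ["bread", "eggs", "honey"],
]


def item_supports(transactions):
    """Return {item: fraction of transactions that contain the item}."""
    n = len(transactions)
    # set(t) ensures an item repeated within one basket is counted once.
    counts = Counter(item for t in transactions for item in set(t))
    return {item: count / n for item, count in counts.items()}


supports = item_supports(transactions)
for item, s in sorted(supports.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{item:<8}{s:.2f}")

# Items whose support falls below the candidate threshold can never be part
# of a frequent itemset, so they drop out of the analysis entirely.
min_support = 0.4
dropped = sorted(item for item, s in supports.items() if s < min_support)
print(f"Dropped at min_support={min_support}: {dropped}")
```

If most items land below the candidate threshold, as cheese, honey, and jam do here, the data is sparse relative to that threshold and a lower value may be worth trying.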
Example
Now you can conduct a simple experiment: change the min_support and min_confidence values in the code sample below and observe how your changes influence the results.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Create a sample transaction dataset
transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey']
]

# Define minimum support and confidence thresholds
min_support = 0.2
min_confidence = 0.7

# Initialize and fit `TransactionEncoder`
encoder = TransactionEncoder()
encoder.fit(transactions)

# Transform transactions using the encoder
one_hot_encoded = encoder.transform(transactions)

# Convert one-hot encoded array to DataFrame
df = pd.DataFrame(one_hot_encoded, columns=encoder.columns_)

# Find frequent itemsets using the Apriori algorithm
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Print frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Print association rules with antecedent -> consequent format and confidence
print("\nAssociation Rules:")
for index, row in rules.iterrows():
    print("Rule: {} => {} with Confidence: {:.2f}".format(
        list(row['antecedents']), list(row['consequents']), row['confidence']))
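Instead of editing the thresholds by hand one value at a time, the experiment can also be run as a sweep. The sketch below is a dependency-free, brute-force stand-in rather than mlxtend's Apriori implementation: the helpers `frequent_itemsets` and `count_rules` are hypothetical names written for this illustration, and the threshold grid is an arbitrary choice. It reuses the nine sample transactions and reports how many itemsets and rules survive each combination.

```python
from itertools import combinations

# The nine sample transactions from the lesson's example.
transactions = [
    ['milk', 'bread', 'eggs'],
    ['bread', 'butter', 'jam', 'eggs'],
    ['milk', 'bread', 'butter', 'jam'],
    ['milk', 'eggs', 'cheese'],
    ['bread', 'eggs', 'butter', 'jam', 'honey'],
    ['bread', 'eggs', 'jam', 'yogurt', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'cheese', 'yogurt'],
    ['milk', 'cheese', 'jam', 'honey', 'fruit'],
    ['bread', 'milk', 'eggs', 'butter', 'jam', 'honey'],
]


def frequent_itemsets(transactions, min_support):
    """Brute force: return {itemset: support} for itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            s = sum(itemset <= b for b in baskets) / n
            if s >= min_support:
                result[itemset] = s
                found = True
        if not found:
            # No frequent itemset of this size means none of a larger size either.
            break
    return result


def count_rules(itemsets, min_confidence):
    """Count rules antecedent -> rest whose confidence is >= min_confidence."""
    count = 0
    for itemset, s in itemsets.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(itemset, k):
                # Every subset of a frequent itemset is itself frequent,
                # so this lookup always succeeds.
                if s / itemsets[frozenset(antecedent)] >= min_confidence:
                    count += 1
    return count


summary = {}
for min_support in (0.2, 0.4, 0.6):
    found = frequent_itemsets(transactions, min_support)
    for min_confidence in (0.5, 0.7, 0.9):
        summary[(min_support, min_confidence)] = (len(found), count_rules(found, min_confidence))

for (ms, mc), (n_sets, n_rules) in summary.items():
    print(f"min_support={ms:.1f}  min_confidence={mc:.1f}  itemsets={n_sets:3d}  rules={n_rules:3d}")
```

Raising either threshold can only shrink the output, never grow it, so a table like this gives a quick sense of where the rule count becomes small enough to inspect by hand.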