Learn Comparing Algorithmic Performance | High Performance Rule Mining and Scale Optimization

Swipe to show menu

Understanding the computational differences between Apriori and FP-Growth is crucial when mining association rules at scale. Here is a structured breakdown to help you compare these two algorithms effectively:

Memory Usage

Apriori

Generates all possible itemsets and stores them in memory, which can cause memory usage to grow exponentially as the number of unique items increases;

FP-Growth

Builds a compact FP-tree structure that stores transactions more efficiently by collapsing common prefixes, greatly reducing memory footprint for dense datasets.

Speed

Apriori

Repeatedly scans the dataset to count itemset frequencies, leading to high runtime, especially as itemset size increases;

FP-Growth

Requires only two full passes over the data—one to determine frequent items and another to build the FP-tree—making it significantly faster for large datasets.

Scalability

Apriori

Struggles with scalability due to the combinatorial explosion of candidate itemsets, making it impractical for very large or dense datasets;

FP-Growth

Scales better by avoiding candidate generation and leveraging tree compression, enabling efficient mining even as data size and itemset complexity grow.

Practical Scenarios

Use Apriori

When datasets are small or sparse, interpretability and simplicity matter, or when you want to easily tune minimum support thresholds;

Use FP-Growth

When datasets are large, dense, or memory efficiency and speed are priorities.

The following code sample demonstrates how to create a synthetic dataset of market basket transactions and compare the runtime performance of the Apriori and FP-Growth algorithms using Python's mlxtend library. You will see how each algorithm handles the same data and how their execution times differ, highlighting their efficiency in practical scenarios.


              123456789101112131415161718192021222324252627
            
import pandas as pd
from mlxtend.frequent_patterns import apriori, fpgrowth
import numpy as np
import time

# Create a medium-sized synthetic dataset
np.random.seed(42)
items = ['apple', 'banana', 'milk', 'bread', 'butter', 'cheese', 'eggs', 'juice']
n_transactions = 3000
data = []
for _ in range(n_transactions):
    basket = np.random.choice(items, size=np.random.randint(2, 6), replace=False)
    data.append({item: (item in basket) for item in items})
df = pd.DataFrame(data)

# Timing Apriori
start_apriori = time.time()
apriori(df, min_support=0.05, use_colnames=True)
end_apriori = time.time()

# Timing FP-Growth
start_fpgrowth = time.time()
fpgrowth(df, min_support=0.05, use_colnames=True)
end_fpgrowth = time.time()

print(f"Apriori runtime: {end_apriori - start_apriori:.3f} seconds")
print(f"FP-Growth runtime: {end_fpgrowth - start_fpgrowth:.3f} seconds")

1. When should you prefer FP-Growth over Apriori for mining association rules?

2. Which statement best describes the computational complexity difference between Apriori and FP-Growth?

Everything was clear?

Thanks for your feedback!

Section 2. Chapter 2

Ask AI

Ask anything or try one of the suggested questions to begin our chat

Section 2. Chapter 2