Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
FP-growth Algorithm.FP-tree | Mining Frequent Itemsets
Association Rule Mining
course content

Course Content

Association Rule Mining

Association Rule Mining

1. Introduction to Association Rule Mining
2. Mining Frequent Itemsets
3. Additional Applications of ARM

bookFP-growth Algorithm.FP-tree

An FP-tree, or frequent pattern tree, is a data structure that represents a compact and efficient way to store transactional datasets, where each transaction consists of a set of items.
The FP-tree organizes transactions by their common items, linking them in a tree structure based on their frequency of occurrence in a particular item combination.

How to construct FP-tree

  1. Scan the Database: Traverse the transaction database and count the frequency of each item among all dataset. Sort items in descending order of frequency;

  2. Create the Root of the FP-Tree: Initialize an empty root node;

  3. Scan the Database Again: For each transaction in the database, filter and sort the items according to their total frequency calculated on the first point of the algorithm;

  4. Update the FP-Tree: For each filtered transaction:

    • Start at the root node of the tree;
    • If an item in the transaction is already a child node of the current node, increment its count;
    • Otherwise, create a new child node with the item and set its count to 1;
    • Move to the next item in the transaction and repeat the process until all items are processed.
  5. Link Similar Items: After processing all transactions, link nodes representing the same item together by their link pointers;

  6. Return the FP-Tree: The constructed FP-tree is ready for further mining.

FP-tree example

Let's consider the following dataset:

Let's arrange our transaction data in descending order of total frequency across the entire dataset, prioritizing items with higher frequency over those with lower frequency.
Total frequency table looks like this:

ItemTotal Frequency
bread6
eggs6
butter4
jam4
milk4
cheese3

As a result we can create sorted transaction table:

The FP-tree for this dataset will look like this:

As a result we have a FP-tree that represents out transaction data. Red arrows are needed to connect similar products in different tree branches - this will be used in calculating support for frequent itemsets mining.

What does the count variable represent at each node of the FP-tree?

What does the count variable represent at each node of the FP-tree?

Select the correct answer

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 2. Chapter 4
We're sorry to hear that something went wrong. What happened?
some-alt