Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Вивчайте FP-growth Algorithm.FP-tree | Mining Frequent Itemsets
Association Rule Mining

book
FP-growth Algorithm.FP-tree

An FP-tree, or frequent pattern tree, is a data structure that represents a compact and efficient way to store transactional datasets, where each transaction consists of a set of items.
The FP-tree organizes transactions by their common items, linking them in a tree structure based on their frequency of occurrence in a particular item combination.

How to construct FP-tree

  1. Scan the Database: Traverse the transaction database and count the frequency of each item among all dataset. Sort items in descending order of frequency;

  2. Create the Root of the FP-Tree: Initialize an empty root node;

  3. Scan the Database Again: For each transaction in the database, filter and sort the items according to their total frequency calculated on the first point of the algorithm;

  4. Update the FP-Tree: For each filtered transaction:

    • Start at the root node of the tree;
    • If an item in the transaction is already a child node of the current node, increment its count;
    • Otherwise, create a new child node with the item and set its count to 1;
    • Move to the next item in the transaction and repeat the process until all items are processed.
  5. Link Similar Items: After processing all transactions, link nodes representing the same item together by their link pointers;

  6. Return the FP-Tree: The constructed FP-tree is ready for further mining.

FP-tree example

Let's consider the following dataset:

Let's arrange our transaction data in descending order of total frequency across the entire dataset, prioritizing items with higher frequency over those with lower frequency.
Total frequency table looks like this:

ItemTotal Frequency
bread6
eggs6
butter4
jam4
milk4
cheese3

As a result we can create sorted transaction table:

The FP-tree for this dataset will look like this:

As a result we have a FP-tree that represents out transaction data. Red arrows are needed to connect similar products in different tree branches - this will be used in calculating support for frequent itemsets mining.

question mark

What does the count variable represent at each node of the FP-tree?

Select the correct answer

Все було зрозуміло?

Як ми можемо покращити це?

Дякуємо за ваш відгук!

Секція 2. Розділ 4
some-alt