Conteúdo do Curso
Association Rule Mining
Association Rule Mining
FP-growth Algorithm.FP-tree
An FP-tree, or frequent pattern tree, is a data structure that represents a compact and efficient way to store transactional datasets, where each transaction consists of a set of items.
The FP-tree organizes transactions by their common items, linking them in a tree structure based on their frequency of occurrence in a particular item combination.
How to construct FP-tree
-
Scan the Database: Traverse the transaction database and count the frequency of each item among all dataset. Sort items in descending order of frequency;
-
Create the Root of the FP-Tree: Initialize an empty root node;
-
Scan the Database Again: For each transaction in the database, filter and sort the items according to their total frequency calculated on the first point of the algorithm;
-
Update the FP-Tree: For each filtered transaction:
- Start at the root node of the tree;
- If an item in the transaction is already a child node of the current node, increment its count;
- Otherwise, create a new child node with the item and set its count to 1;
- Move to the next item in the transaction and repeat the process until all items are processed.
-
Link Similar Items: After processing all transactions, link nodes representing the same item together by their link pointers;
-
Return the FP-Tree: The constructed FP-tree is ready for further mining.
FP-tree example
Let's consider the following dataset:
Let's arrange our transaction data in descending order of total frequency across the entire dataset, prioritizing items with higher frequency over those with lower frequency.
Total frequency table looks like this:
Item | Total Frequency |
---|---|
bread | 6 |
eggs | 6 |
butter | 4 |
jam | 4 |
milk | 4 |
cheese | 3 |
As a result we can create sorted transaction table:
The FP-tree for this dataset will look like this:
As a result we have a FP-tree that represents out transaction data. Red arrows are needed to connect similar products in different tree branches - this will be used in calculating support for frequent itemsets mining.
Obrigado pelo seu feedback!