Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
FP-growth Algorithm.FP-tree | Mining Frequent Itemsets
course content

Contenido del Curso

Association Rule Mining

FP-growth Algorithm.FP-treeFP-growth Algorithm.FP-tree

An FP-tree, or frequent pattern tree, is a data structure that represents a compact and efficient way to store transactional datasets, where each transaction consists of a set of items.
The FP-tree organizes transactions by their common items, linking them in a tree structure based on their frequency of occurrence in a particular item combination.

How to construct FP-tree

  1. Scan the Database: Traverse the transaction database and count the frequency of each item among all dataset. Sort items in descending order of frequency;
  2. Create the Root of the FP-Tree: Initialize an empty root node;
  3. Scan the Database Again: For each transaction in the database, filter and sort the items according to their total frequency calculated on the first point of the algorithm;
  4. Update the FP-Tree: For each filtered transaction:
    • Start at the root node of the tree;
    • If an item in the transaction is already a child node of the current node, increment its count;
    • Otherwise, create a new child node with the item and set its count to 1;
    • Move to the next item in the transaction and repeat the process until all items are processed.
  5. Link Similar Items: After processing all transactions, link nodes representing the same item together by their link pointers;
  6. Return the FP-Tree: The constructed FP-tree is ready for further mining.

FP-tree example

Let's consider the following dataset:

Transaction ID Items
1 milk, bread, eggs, cheese
2 bread, butter, eggs, jam
3 milk, bread, butter, cheese
4 milk, eggs, jam
5 bread, eggs, butter, jam
6 bread, eggs
7 bread, milk, eggs, butter

Let's arrange our transaction data in descending order of total frequency across the entire dataset, prioritizing items with higher frequency over those with lower frequency.
Total frequency table looks like this:

ItemTotal Frequency
bread6
eggs6
butter4
jam4
milk4
cheese3

As a result we can create sorted transaction table:

Transaction ID Sorted Items
1 bread, eggs, milk, cheese
2 bread, eggs, butter, jam
3 bread, milk, butter, cheese
4 eggs, milk, jam
5 bread, eggs, butter, jam
6 bread, eggs
7 bread, eggs, milk, butter

The FP-tree for this dataset will look like this:

As a result we have a FP-tree that represents out transaction data. Red arrows are needed to connect similar products in different tree branches - this will be used in calculating support for frequent itemsets mining.

What does the count variable represent at each node of the FP-tree?

Selecciona la respuesta correcta

¿Todo estuvo claro?

Sección 2. Capítulo 4
course content

Contenido del Curso

Association Rule Mining

FP-growth Algorithm.FP-treeFP-growth Algorithm.FP-tree

An FP-tree, or frequent pattern tree, is a data structure that represents a compact and efficient way to store transactional datasets, where each transaction consists of a set of items.
The FP-tree organizes transactions by their common items, linking them in a tree structure based on their frequency of occurrence in a particular item combination.

How to construct FP-tree

  1. Scan the Database: Traverse the transaction database and count the frequency of each item among all dataset. Sort items in descending order of frequency;
  2. Create the Root of the FP-Tree: Initialize an empty root node;
  3. Scan the Database Again: For each transaction in the database, filter and sort the items according to their total frequency calculated on the first point of the algorithm;
  4. Update the FP-Tree: For each filtered transaction:
    • Start at the root node of the tree;
    • If an item in the transaction is already a child node of the current node, increment its count;
    • Otherwise, create a new child node with the item and set its count to 1;
    • Move to the next item in the transaction and repeat the process until all items are processed.
  5. Link Similar Items: After processing all transactions, link nodes representing the same item together by their link pointers;
  6. Return the FP-Tree: The constructed FP-tree is ready for further mining.

FP-tree example

Let's consider the following dataset:

Transaction ID Items
1 milk, bread, eggs, cheese
2 bread, butter, eggs, jam
3 milk, bread, butter, cheese
4 milk, eggs, jam
5 bread, eggs, butter, jam
6 bread, eggs
7 bread, milk, eggs, butter

Let's arrange our transaction data in descending order of total frequency across the entire dataset, prioritizing items with higher frequency over those with lower frequency.
Total frequency table looks like this:

ItemTotal Frequency
bread6
eggs6
butter4
jam4
milk4
cheese3

As a result we can create sorted transaction table:

Transaction ID Sorted Items
1 bread, eggs, milk, cheese
2 bread, eggs, butter, jam
3 bread, milk, butter, cheese
4 eggs, milk, jam
5 bread, eggs, butter, jam
6 bread, eggs
7 bread, eggs, milk, butter

The FP-tree for this dataset will look like this:

As a result we have a FP-tree that represents out transaction data. Red arrows are needed to connect similar products in different tree branches - this will be used in calculating support for frequent itemsets mining.

What does the count variable represent at each node of the FP-tree?

Selecciona la respuesta correcta

¿Todo estuvo claro?

Sección 2. Capítulo 4
some-alt