Frequent Itemset Generation Using FP-Growth

FP-Growth uses FP-tree (Frequent Pattern Tree), a compressed representation of the database. Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets.

Build Tree

How to construct a FP-Tree?

Create the root node (null)
Scan the database, get the frequent itemsets of length 1, and sort these $1$-itemsets in decreasing support count.
Read a transaction at a time. Sort items in the transaction acoording to the last step.
For each transaction, insert items to the FP-Tree from the root node and increment occurence record at every inserted node.
Create a new child node if reaching the leaf node before the insersion completes.
If a new child node is created, link it from the last node consisting of the same item.

Mining Tree

How to mine the frequent itemsets using the FP-Tree?

FP-Growth finds all the frequent itemsets ending with a particular suffix by employing a divide-and-conquer strategy to split the problem into smaller subproblems.

Using the pointer in the header table, decompose the FP-Tree into multiple subtrees, each represent a subproblem (ex. finding frequent itemsets ending in $e$)
For each subproblem, traverse the corresponding subtree bottom-up to obtain conditional pattern bases for the subproblem recursively.

For example, we want to find frequent itemsets ending in $e$. First decompose the FP-Tree to obtain the predix paths ending with $e$, so that we can conclude that the conditional pattern bases for $e$ are

\[\{(a:1, c:1, d:1), (a:1, d:1), (b:1, c:1)\}\]

Then we can divide the problem into 3 subproblems (with minimum support count $2$)

find frequent itemsets ending in $ae$
find frequent itemsets ending in $ce$
find frequent itemsets ending in $de$

and solve them recursively until the conditional FP tree contains only an empty node.

(Note that we do not need to find frequent itemsets ending in $be$ since the support count of $be$ is $1$ㄝ, which is smaller than the support count threshold.)

So let’s go back to the original problem: How to mine the frequent itemsets using the FP-Tree? Using FP-Growth, we can obtain a list of frequent itemsets ordered by their corresponding suffix:

Benefits of using FP-Tree structure

No need to generate itemset candidates
No need to scan the database frequently. (Only 2 pass is needed.)

An Explorer of Things

Hello, my name is Chihling :)

Frequent Itemset Generation Using FP-Growth

Build Tree

Mining Tree

Benefits of using FP-Tree structure

References