Frequent Itemset Generation Using FP-Growth
FP-Growth uses FP-tree (Frequent Pattern Tree), a compressed representation of the database. Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets.
Build Tree
How to construct a FP-Tree?
- Create the root node (null)
- Scan the database, get the frequent itemsets of length 1, and sort these $1$-itemsets in decreasing support count.
- Read a transaction at a time. Sort items in the transaction acoording to the last step.
- For each transaction, insert items to the FP-Tree from the root node and increment occurence record at every inserted node.
- Create a new child node if reaching the leaf node before the insersion completes.
- If a new child node is created, link it from the last node consisting of the same item.
Mining Tree
How to mine the frequent itemsets using the FP-Tree?
FP-Growth finds all the frequent itemsets ending with a particular suffix by employing a divide-and-conquer strategy to split the problem into smaller subproblems.
- Using the pointer in the header table, decompose the FP-Tree into multiple subtrees, each represent a subproblem (ex. finding frequent itemsets ending in $e$)
- For each subproblem, traverse the corresponding subtree bottom-up to obtain conditional pattern bases for the subproblem recursively.
For example, we want to find frequent itemsets ending in $e$. First decompose the FP-Tree to obtain the predix paths ending with $e$, so that we can conclude that the conditional pattern bases for $e$ are
\[\{(a:1, c:1, d:1), (a:1, d:1), (b:1, c:1)\}\]Then we can divide the problem into 3 subproblems (with minimum support count $2$)
- find frequent itemsets ending in $ae$
- find frequent itemsets ending in $ce$
- find frequent itemsets ending in $de$
and solve them recursively until the conditional FP tree contains only an empty node.
(Note that we do not need to find frequent itemsets ending in $be$ since the support count of $be$ is $1$ㄝ, which is smaller than the support count threshold.)
So let’s go back to the original problem: How to mine the frequent itemsets using the FP-Tree? Using FP-Growth, we can obtain a list of frequent itemsets ordered by their corresponding suffix:
Benefits of using FP-Tree structure
- No need to generate itemset candidates
- No need to scan the database frequently. (Only 2 pass is needed.)