FP-Growth Algorithm for Association Analysis in Data Warehousing and Data Mining

 
FP Growth Algorithm
 
Association Analysis (Unit - IV)
DWDM
 
FP Growth Algorithm
 
FP-growth algorithm takes a radically different approach for discovering frequent itemsets.
The algorithm encodes the data set using a compact data structure called an 
FP-tree 
and extracts
frequent itemsets directly from this structure
FP-Tree Representation
An FP-tree is a compressed representation of the input data
. It is constructed by reading the data
set one transaction at a time and mapping each transaction onto a path in the FP-tree.
As different transactions can have several items in common, their paths may overlap. The more
the paths overlap with one another, the more compression we can achieve using the FP-tree
structure.
If the size of the FP-tree is small enough to fit into main memory, this will allow us to extract
frequent itemsets directly from the structure in memory instead of making repeated passes over
the data stored on disk.
 
FP Tree Representation
 
 
FP Tree Representation
 
Figure 6.24 shows a data set that
contains ten transactions and five
items.
The structures of the FP-tree after
reading the first three
transactions are also depicted in
the diagram.
Each node in the tree contains the
label
 of an item along with a
counter
 that shows the number of
transactions mapped onto the
given path.
Initially, the FP-tree contains only
the 
root node 
represented by the
null 
symbol.
 
FP Tree Representation
 
1. The data set is scanned once to
determine the support count of
each item. Infrequent items are
discarded, while the frequent
items are sorted in decreasing
support counts. For the data set
shown in Figure, 
a 
is the most
frequent item, followed by 
b, c, d,
and e.
 
FP Tree Representation
 
2. The algorithm makes a second
pass over the data to construct
the FP-tree. After reading the
first transaction, 
{a, b}, the nodes
labeled as a 
and 
b are created. A
path is then formed from 
null
a 
b
 to encode 
the transaction.
Every node along the path has a
frequency count of 1.
 
FP Tree Representation
 
3. After reading the second transaction, 
{b,c,d}, a new set of
nodes is created 
for items 
b, c, and d. A path is then
formed to represent the 
transaction by connecting the
nodes null 
 b 
 c 
 d. Every node 
along this path
also has a frequency count equal to one.
4. The third transaction, 
{a,c,d,e}, shares a common prefix
item (which 
is 
a) with the first transaction. As a result,
the path for the third 
transaction, null 
 a 
 c 
 d
 e, overlaps with the path for the 
first transaction,
null 
 a 
 b. Because of their overlapping path, the
frequency count for node 
a is incremented to two,
while the frequency 
counts for the newly created
nodes, 
c, d, and e, are equal to one.
5. This process continues until every transaction has been
mapped onto one of the paths given in the FP-tree.
The resulting FP-tree after reading all the transactions
is shown in Figure 6.24.
 
FP Tree Representation
 
The size of an FP-tree is typically smaller
than the size of the uncompressed data
because many transactions in market
basket data often share a few items in
common.
In the best-case scenario, where all the
transactions have the same set of items,
the FP-tree contains only a single branch
of nodes.
The worst-case scenario happens when
every transaction has a unique set of
items.
 
FP Tree Representation
 
The size of an FP-tree also
depends on how the items are
ordered.
If the ordering scheme in the
preceding example is reversed,
i.e., from lowest to highest
support item, the resulting FP-
tree is shown in Figure 6.25.
An FP-tree also contains a list
of 
pointers connecting
between nodes that have the
same items.
These pointers, represented as
dashed lines in Figures 6.24
and 6.25, help to facilitate the
rapid access of individual
items in the tree.
 
Frequent Itemset Generation using FP-Growth Algorithm
 
Steps in FP-Growth Algorithm:
Step-1: 
Scan the database to build  
Frequent 1-item set
  which will contain all
the elements whose frequency is greater than or equal to the minimum
support. These elements are stored in descending order of their
respective frequencies.
Step-2:  
For each transaction, the respective 
Ordered-Item set
 is built.
Step-3: 
Construct the FP tree. by scanning each 
Ordered-Item set
Step-4: 
For each item, the 
Conditional Pattern Base
 is computed which is
path labels of all the paths which lead to any node of the given item in
the frequent-pattern tree.
Step-5: 
For each item, the 
Conditional Frequent Pattern Tree is built.
Step-6: Frequent Pattern rules
 are generated by pairing the items of the
Conditional Frequent Pattern Tree set to each corresponding item.
 
Frequent Itemset Generation in FP-Growth Algorithm
Example:
 
The frequency of each individual
item is computed:-
 
Given Database:
 
min_support=3
 
Frequent Itemset Generation in FP-Growth Algorithm
 
A 
Frequent Pattern set
 is built which will contain all the elements whose
frequency is greater than or equal to the minimum support. These elements are
stored in 
descending order 
of their respective frequencies.
L = {K : 5, E : 4, M : 3, O : 3, Y : 3}
Now, for each transaction, the respective 
Ordered-Item set
 is built. It is done by
iterating the Frequent Pattern set and checking if the current item is contained
in the transaction. The following table is built for all the transactions:
 
Frequent Itemset Generation in FP-Growth Algorithm
 
Now, all the Ordered-Item sets
are inserted into a Trie Data
Structure.
a)
Inserting the set {K, E, M, O,
Y}:
      All the items are simply
linked one after the other in
the order of occurrence in
the set and initialize the
support count for each item
as 1.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
b) 
Inserting the set {K, E, O, Y}:
Till the insertion of the elements K and
E, simply the support count is increased
by 1.
There is no direct link between E and O,
therefore a new node for the item O is
initialized with the support count as 1
and item E is linked to this new node.
On inserting Y, we first initialize a new
node for the item Y with support count
as 1 and link the new node of O with the
new node of Y.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
c) 
Inserting the set {K, E, M}:
Here simply the support
count of each element is
increased by 1.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
d) 
Inserting the set {K, M, Y}:
Similar to step b), first the
support count of K is
increased, then new nodes
for M and Y are initialized
and linked accordingly.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
e) 
Inserting the set {K, E, O}:
Here simply the support
counts of the respective
elements are increased.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
Now, for each item starting from 
leaf
, the 
Conditional Pattern Base
 is computed
which is path labels of all the paths which lead to any node of the given item in
the frequent-pattern tree.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
Now for each item, the 
Conditional Frequent Pattern Tree is built.
It is done by taking the set of elements that is common in all the paths in the Conditional
Pattern Base of that item and calculating its support count by 
summing
 the support counts of all
the paths in the Conditional Pattern Base.
The itemsets whose support count >=  
min_support value 
are retained in the Conditional
Frequent Pattern Tree  and the rest are discarded.
 
Frequent Itemset Generation in FP-Growth Algorithm
 
From the Conditional Frequent Pattern tree, the 
Frequent Pattern rules
 are
generated by pairing the items of the Conditional Frequent Pattern Tree set to
the corresponding to the item as given in the below table.
 
For each row, two types of association rules can be inferred for example for the first
row which contains the element, the rules K -> Y and Y -> K can be inferred.
To determine the valid rule, the confidence of both the rules is calculated and the one
with confidence greater than or equal to the minimum confidence value is retained.
Slide Note
Embed
Share

FP-Growth algorithm is a powerful method for discovering frequent itemsets in data sets. It utilizes a compact data structure called an FP-tree to efficiently mine frequent patterns. By encoding data into an FP-tree representation, the algorithm can identify frequent itemsets directly from memory, making it more efficient compared to traditional methods that require repeated passes over disk-stored data.

  • Association Analysis
  • Data Warehousing
  • Data Mining
  • FP-Growth Algorithm

Uploaded on Jul 17, 2024 | 7 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. FP Growth Algorithm Association Analysis (Unit - IV) DWDM

  2. FP Growth Algorithm FP-growth algorithm takes a radically different approach for discovering frequent itemsets. The algorithm encodes the data set using a compact data structure called an FP frequent itemsets directly from this structure FP- -Tree Representation Tree Representation An FP An FP- -tree is a compressed representation of the input data tree is a compressed representation of the input data. It is constructed by reading the data set one transaction at a time and mapping each transaction onto a path in the FP-tree. As different transactions can have several items in common, their paths may overlap. The more the paths overlap with one another, the more compression we can achieve using the FP-tree structure. If the size of the FP-tree is small enough to fit into main memory, this will allow us to extract frequent itemsets directly from the structure in memory instead of making repeated passes over the data stored on disk. FP- -tree tree and extracts FP

  3. FP Tree Representation

  4. FP Tree Representation Figure 6.24 shows a data set that contains ten transactions and five items. The structures of the FP-tree after reading the first three transactions are also depicted in the diagram. Each node in the tree contains the label label of an item along with a counter counter that shows the number of transactions mapped onto the given path. Initially, the FP-tree contains only the root node root node represented by the null null symbol.

  5. FP Tree Representation 1. The data set is scanned once to determine the support count of each item. Infrequent items are discarded, while the frequent items are sorted in decreasing support counts. For the data set shown in Figure, a is the most frequent item, followed by b, c, d, and e.

  6. FP Tree Representation 2. The algorithm makes a second pass over the data to construct the FP-tree. After reading the first transaction, {a, b}, the nodes labeled as a and b are created. A path is then formed from null a a b b to encode the transaction. Every node along the path has a frequency count of 1. null

  7. FP Tree Representation 3. After reading the second transaction, {b,c,d}, a new set of nodes is created for items b, c, and d. A path is then formed to represent the transaction by connecting the nodes null b c d. Every node along this path also has a frequency count equal to one. 4. The third transaction, {a,c,d,e}, shares a common prefix item (which is a) with the first transaction. As a result, the path for the third transaction, null a c d e, overlaps with the path for the first transaction, null a b. Because of their overlapping path, the frequency count for node a is incremented to two, while the frequency counts for the newly created nodes, c, d, and e, are equal to one. 5. This process continues until every transaction has been mapped onto one of the paths given in the FP-tree. The resulting FP-tree after reading all the transactions is shown in Figure 6.24.

  8. FP Tree Representation The size of an FP-tree is typically smaller than the size of the uncompressed data because many transactions in market basket data often share a few items in common. In the best-case scenario, where all the transactions have the same set of items, the FP-tree contains only a single branch of nodes. The worst-case scenario happens when every transaction has a unique set of items.

  9. FP Tree Representation The size of an FP-tree also depends on how the items are ordered. If the ordering scheme in the preceding example is reversed, i.e., from lowest to highest support item, the resulting FP- tree is shown in Figure 6.25. An FP-tree also contains a list of pointers connecting pointers connecting between nodes that have the between nodes that have the same items. same items. These pointers, represented as dashed lines in Figures 6.24 and 6.25, help to facilitate the rapid access of individual items in the tree.

  10. Frequent Itemset Generation using FP-Growth Algorithm Steps in FP Steps in FP- -Growth Algorithm: Growth Algorithm: Step Step- -1: 1: Scan the database to build Frequent 1 the elements whose frequency is greater than or equal to the minimum support. These elements are stored in descending order of their respective frequencies. Step Step- -2: 2: For each transaction, the respective Ordered Step Step- -3: 3: Construct the FP tree. by scanning each Ordered Step Step- -4: 4: For each item, the Conditional Pattern Base Conditional Pattern Base is computed which is path labels of all the paths which lead to any node of the given item in the frequent-pattern tree. Step Step- -5: 5: For each item, the Conditional Frequent Pattern Tree is built. Conditional Frequent Pattern Tree is built. Step Step- -6: Frequent Pattern rules 6: Frequent Pattern rules are generated by pairing the items of the Conditional Frequent Pattern Tree set to each corresponding item. Frequent 1- -item set item set which will contain all Ordered- -Item set Ordered- -Item set Item set is built. Item set

  11. Frequent Itemset Generation in FP-Growth Algorithm Example: Given Database: min_support=3 The frequency of each individual item is computed:-

  12. Frequent Itemset Generation in FP-Growth Algorithm A Frequent Pattern set Frequent Pattern set is built which will contain all the elements whose frequency is greater than or equal to the minimum support. These elements are stored in descending order descending order of their respective frequencies. L = {K : 5, E : 4, M : 3, O : 3, Y : 3} L = {K : 5, E : 4, M : 3, O : 3, Y : 3} Now, for each transaction, the respective Ordered iterating the Frequent Pattern set and checking if the current item is contained in the transaction. The following table is built for all the transactions: Ordered- -Item set Item set is built. It is done by

  13. Frequent Itemset Generation in FP-Growth Algorithm Now, all the Ordered-Item sets are inserted into a Trie Data Structure. a) a) Inserting the set {K, E, M, O, Inserting the set {K, E, M, O, Y}: Y}: All the items are simply linked one after the other in the order of occurrence in the set and initialize the support count for each item as 1.

  14. Frequent Itemset Generation in FP-Growth Algorithm b) Inserting the set {K, E, O, Y}: Inserting the set {K, E, O, Y}: Till the insertion of the elements K and E, simply the support count is increased by 1. There is no direct link between E and O, therefore a new node for the item O is initialized with the support count as 1 and item E is linked to this new node. On inserting Y, we first initialize a new node for the item Y with support count as 1 and link the new node of O with the new node of Y.

  15. Frequent Itemset Generation in FP-Growth Algorithm c) Inserting the set {K, E, M}: Inserting the set {K, E, M}: Here simply the support count of each element is increased by 1.

  16. Frequent Itemset Generation in FP-Growth Algorithm d) Inserting the set {K, M, Y}: Inserting the set {K, M, Y}: Similar to step b), first the support count of K is increased, then new nodes for M and Y are initialized and linked accordingly.

  17. Frequent Itemset Generation in FP-Growth Algorithm e) Inserting the set {K, E, O}: Inserting the set {K, E, O}: Here simply the support counts of the respective elements are increased.

  18. Frequent Itemset Generation in FP-Growth Algorithm Now, for each item starting from leaf which is path labels of all the paths which lead to any node of the given item in the frequent-pattern tree. leaf, the Conditional Pattern Base Conditional Pattern Base is computed

  19. Frequent Itemset Generation in FP-Growth Algorithm Now for each item, the Conditional Frequent Pattern Tree is built. Conditional Frequent Pattern Tree is built. It is done by taking the set of elements that is common in all the paths in the Conditional Pattern Base of that item and calculating its support count by summing the paths in the Conditional Pattern Base. summing the support counts of all The itemsets whose support count >= min_support value are retained in the Conditional Frequent Pattern Tree and the rest are discarded.

  20. Frequent Itemset Generation in FP-Growth Algorithm From the Conditional Frequent Pattern tree, the Frequent Pattern rules generated by pairing the items of the Conditional Frequent Pattern Tree set to the corresponding to the item as given in the below table. Frequent Pattern rules are For each row, two types of association rules can be inferred for example for the first row which contains the element, the rules K -> Y and Y -> K can be inferred. To determine the valid rule, the confidence of both the rules is calculated and the one with confidence greater than or equal to the minimum confidence value is retained.

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#