A priori algorithm
This is a method for selecting candidates and tests for the frequent development of patterns in datasets. There are two things you should remember.
Apriori pruning principle - If a collection of items is infrequent, then a superset of it should not be generated / checked.
Apriori Property - This (k+1)-itemset is a candidate (k+1)-itemset only if all its k-itemset are frequent.
Now, here is the algorithm a priori in 4 steps.
- First scan the database / data set once to get a
1-itemset . - Create a length
k+1 candidate element set from a length k frequent element sets. - Check candidates for database / data set.
- Finish work if a frequent recruitment of candidates cannot be created.
Solved Example
Suppose you have a transaction database that follows 4 transactions, including their transaction IDs and items purchased with them. Assume the minimum support, min_sup is 2 . The term "support" is the number of transactions in which a certain set of elements is present / added.
DB transactions
tid | items
Now create the candidate 1-itemsets for the 1st database scan. It is simply called the set C_1 as follows.
itemset | sup ------------- {A} | 2 {B} | 3 {C} | 3 {D} | 1 {E} | 3
If we check this with min_sup , we can see that {D} does not satisfy min_sup of 2 . Thus, it will not be included in the 1-itemset , which we simply call the set L_1 as follows.
itemset | sup ------------- {A} | 2 {B} | 3 {C} | 3 {E} | 3
Now scan the database a second time and generate a 2-itemsets , which we simply call as a set of C_2 as follows.
itemset | sup
As you can see, the elements {A,B} and {A,E} do not satisfy min_sup 2 and, therefore, they will not be included in the frequent 2-itemset , L_2
itemset | sup ------------- {A,C} | 2 {B,C} | 2 {B,E} | 3 {C,E} | 2
Now we will do the third database scan and get the candidate 3-itemsets , C_3 as follows.
itemset | sup
You can see that {A,B,C} , {A,B,E} and {A,C,E} do not satisfy min_sup 2 . Therefore, they will not be included in the frequent 3-itemset , L_3 as follows.
itemset | sup ------------- {B,C,E} | 2
Now, finally, we can calculate the values โโof support (supp) , confidence (conf) and lift (interestingness value) Association / Correlation rules , which can be generated using the set of elements {B,C,E} as follows.
Rule | supp | conf | lift ------------------------------------------- B -> C & E | 50% | 66.67% | 1.33 E -> B & C | 50% | 66.67% | 1.33 C -> E & B | 50% | 66.67% | 1.77 B & C -> E | 50% | 100% | 1.33 E & B -> C | 50% | 66.67% | 1.77 C & E -> B | 50% | 100% | 1.33