Apriori algorithm

I have heard about the Apriori algorithm several times, but never had the time or opportunity to delve into it. Can someone explain, in a simple way, how this algorithm works? In addition, a basic example would ease my understanding.

4 answers

Well, I would suggest that you read the Wikipedia entry, but you said "a basic example will ease my understanding." Wikipedia has one, so I assume that you have not read it and do not intend to.

Read the Wikipedia article.


See the top 10 data mining algorithms (free access) or Top Ten Data Mining Algorithms. The latter provides a detailed description of the algorithm, as well as details on how to obtain optimized implementations.


Apriori algorithm

This is a candidate-generation-and-test approach for mining frequent patterns in datasets. There are two things you should remember.

Apriori pruning principle - If an itemset is infrequent, its supersets should not be generated / tested.

Apriori property - A (k+1)-itemset is a candidate (k+1)-itemset only if every one of its k-item subsets is frequent.

Now, here is the Apriori algorithm in 4 steps.

  • First, scan the database / dataset once to get the frequent 1-itemsets.
  • Generate length-(k+1) candidate itemsets from the length-k frequent itemsets.
  • Test the candidates against the database / dataset.
  • Terminate when no frequent or candidate itemsets can be generated.
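The four steps above can be sketched in Python. This is a minimal sketch, not an optimized implementation, and all the names in it are my own:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return every frequent itemset as a dict: frozenset -> support count."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Step 1: one scan of the dataset yields the candidate 1-itemsets.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while candidates:
        # Step 3: test candidates against the dataset; keep those >= min_sup.
        level = {c: s for c in candidates if (s := support(c)) >= min_sup}
        frequent.update(level)
        # Step 2: join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate that has an infrequent k-subset (the Apriori property).
        keys = list(level)
        k = len(keys[0]) if keys else 0
        candidates = set()
        for a, b in combinations(keys, 2):
            cand = a | b
            if len(cand) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(cand, k)
            ):
                candidates.add(cand)
        # Step 4: the loop terminates when no new candidates can be generated.
    return frequent

# The worked example below: 4 transactions, min_sup = 2.
freq = apriori([{'A','C','D'}, {'B','C','E'}, {'A','B','C','E'}, {'B','E'}], 2)
# {B,C,E} is frequent with support 2; {D} (support 1) is not.
```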

Solved Example

Suppose you have a transaction database with the following 4 transactions, including their transaction IDs and the items purchased in them. Assume the minimum support, min_sup, is 2. The "support" of an itemset is the number of transactions in which that itemset appears.

DB transactions

 tid | items
 ------------
 10  | A,C,D
 20  | B,C,E
 30  | A,B,C,E
 40  | B,E
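As a quick sanity check, this database and its first scan can be reproduced in Python (the variable names are my own):

```python
from collections import Counter

# The four transactions above, keyed by transaction id.
db = {10: {'A', 'C', 'D'},
      20: {'B', 'C', 'E'},
      30: {'A', 'B', 'C', 'E'},
      40: {'B', 'E'}}

# One scan over the database gives the candidate 1-itemset counts.
c1 = Counter(item for items in db.values() for item in items)
# c1 matches the C_1 table that follows: A:2, B:3, C:3, D:1, E:3
```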

Now generate the candidate 1-itemsets in the first database scan. This set is simply called C_1, as follows.

 itemset | sup
 -------------
 {A}     | 2
 {B}     | 3
 {C}     | 3
 {D}     | 1
 {E}     | 3

If we check this against min_sup, we can see that {D} does not satisfy the min_sup of 2. Thus, it will not be included in the frequent 1-itemsets, which we simply call the set L_1, as follows.

 itemset | sup
 -------------
 {A}     | 2
 {B}     | 3
 {C}     | 3
 {E}     | 3

Now scan the database a second time and generate the candidate 2-itemsets, which we simply call the set C_2, as follows.

 itemset | sup
 -------------
 {A,B}   | 1
 {A,C}   | 2
 {A,E}   | 1
 {B,C}   | 2
 {B,E}   | 3
 {C,E}   | 2

As you can see, the itemsets {A,B} and {A,E} do not satisfy the min_sup of 2, so they will not be included in the frequent 2-itemsets, L_2.

 itemset | sup
 -------------
 {A,C}   | 2
 {B,C}   | 2
 {B,E}   | 3
 {C,E}   | 2

Now do the third database scan and get the candidate 3-itemsets, C_3, as follows.

 itemset | sup
 -------------
 {A,B,C} | 1
 {A,B,E} | 1
 {A,C,E} | 1
 {B,C,E} | 2

You can see that {A,B,C}, {A,B,E} and {A,C,E} do not satisfy the min_sup of 2. Therefore, they will not be included in the frequent 3-itemsets, L_3, as follows.

 itemset | sup
 -------------
 {B,C,E} | 2

Now, finally, we can calculate the support (supp), confidence (conf) and lift (interestingness value) of the association / correlation rules that can be generated from the itemset {B,C,E}, as follows.

 Rule       | supp | conf   | lift
 ---------------------------------
 B -> C & E | 50%  | 66.67% | 1.33
 E -> B & C | 50%  | 66.67% | 1.33
 C -> E & B | 50%  | 66.67% | 0.89
 B & C -> E | 50%  | 100%   | 1.33
 E & B -> C | 50%  | 66.67% | 0.89
 C & E -> B | 50%  | 100%   | 1.33
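A row of this table can be verified directly, e.g. B -> C & E. This is a small sketch and the helper names are my own; lift is conf(X -> Y) divided by the support fraction of Y:

```python
db = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]

def frac(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def rule_metrics(antecedent, consequent):
    supp = frac(antecedent | consequent)   # P(X and Y)
    conf = supp / frac(antecedent)         # P(Y | X)
    lift = conf / frac(consequent)         # conf / P(Y)
    return supp, conf, lift

supp, conf, lift = rule_metrics({'B'}, {'C', 'E'})
# supp = 0.5 (50%), conf ~ 0.667 (66.67%), lift ~ 1.33
```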

The best introduction to Apriori can be found in this book:

http://www-users.cs.umn.edu/~kumar/dmbook/index.php

You can download Chapter 6 for free; it explains Apriori very clearly.

In addition, if you want to download Java implementations of Apriori and other frequent itemset mining algorithms, you can check my website:

http://www.philippe-fournier-viger.com/spmf/
