I'm trying to understand the basics of the Apriori (market basket) algorithm for use in data mining.
My confusion is best explained with an example.
Here is the transactional dataset:
t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes
The minimum support (minsup) is 0.5, i.e. 50%.
There are 7 transactions, so for an itemset to be "frequent" it must appear in at least 4 of the 7 (since 0.5 × 7 = 3.5, rounded up to 4). This gave me my first frequent set:
F1:
Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
Then I generated the candidates for the second pass (C2) and narrowed them down to:
F2:
{Milk, Beer} = 4
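
To double-check the counts above, here is a minimal Python sketch that recounts support by brute force. It is not a full Apriori implementation. The names `transactions`, `support_count`, `f1` and `f2` are only illustrative, and it assumes "frequent" means a support count of at least ceil(minsup × number of transactions).

```python
from itertools import combinations
from math import ceil

# The seven transactions from the dataset above.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

minsup = 0.5
# Assumption: an itemset is frequent if it occurs in at least
# ceil(minsup * N) transactions (3.5 rounds up to 4 here).
min_count = ceil(minsup * len(transactions))


def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# F1: frequent 1-itemsets.
all_items = sorted(set().union(*transactions))
f1 = {}
for item in all_items:
    count = support_count({item})
    if count >= min_count:
        f1[frozenset([item])] = count

# C2: candidate pairs built only from frequent single items,
# then filtered by support to obtain F2.
frequent_items = sorted(item for itemset in f1 for item in itemset)
f2 = {}
for pair in combinations(frequent_items, 2):
    count = support_count(set(pair))
    if count >= min_count:
        f2[frozenset(pair)] = count

print("min_count:", min_count)
print("F1:", {tuple(k): v for k, v in f1.items()})
print("F2:", {tuple(sorted(k)): v for k, v in f2.items()})
```

Under those assumptions, `f1` holds the four single items listed in F1 and `f2` holds only {Milk, Beer}.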
This is where I got confused: if I'm asked to list all the frequent itemsets, do I write down both F1 and F2, or just F2? The entries in F1 don't look like "sets" to me.
Next I'm asked to generate association rules from the frequent itemsets I just found and to calculate their confidence. I get the following:
Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
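
For reference, a small Python sketch of how these two numbers come out, assuming the usual definition confidence(A -> B) = support(A and B together) / support(A). The helper names `support_count` and `confidence` are only illustrative.

```python
# The seven transactions from the dataset above.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]


def support_count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent):
    """confidence(A -> B) = count(A and B together) / count(A)."""
    return support_count(antecedent | consequent) / support_count(antecedent)


conf_milk_beer = confidence({"Milk"}, {"Beer"})  # 4 joint occurrences / 4 with Milk
conf_beer_milk = confidence({"Beer"}, {"Milk"})  # 4 joint occurrences / 5 with Beer

print(f"Milk -> Beer: {conf_milk_beer:.0%}")
print(f"Beer -> Milk: {conf_beer_milk:.0%}")
```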
It seems pointless to include the F1 itemsets here: a rule built from a single item would always have 100% confidence, and it doesn't actually "associate" anything. That is why I'm asking whether the F1 itemsets really count as frequent itemsets.