How to speed up Apriori when generating only association rules whose consequent (right-hand side) is a single element of the data set?

I have a CSV file with 600,000 rows and 15 columns, "Col1, Col2 ... Col15". I want to create association rules whose right-hand side contains only values from Col15. I use the Apriori implementation from here.

It checks each itemset against minSupport as follows:

from collections import defaultdict

# Excerpt from the main loop; itemSet, transactionList, minSupport, freqSet,
# largeSet and joinSet are defined elsewhere in the implementation.
oneCSet = returnItemsWithMinSupport(itemSet,
                                    transactionList,
                                    minSupport,
                                    freqSet)
print("reached line 80")
currentLSet = oneCSet
k = 2
while currentLSet != set([]):
    print(k)
    largeSet[k-1] = currentLSet
    # Join the frequent (k-1)-itemsets into candidate k-itemsets.
    currentLSet = joinSet(currentLSet, k)
    currentCSet = returnItemsWithMinSupport(currentLSet,
                                            transactionList,
                                            minSupport,
                                            freqSet)
    currentLSet = currentCSet
    k = k + 1

def returnItemsWithMinSupport(itemSet, transactionList, minSupport, freqSet):
    """Calculate the support of each item in itemSet and return the subset
    of itemSet whose elements satisfy the minimum support."""
    _itemSet = set()
    localSet = defaultdict(int)

    # Count, for each candidate itemset, the transactions that contain it.
    for item in itemSet:
        for transaction in transactionList:
            if item.issubset(transaction):
                freqSet[item] += 1
                localSet[item] += 1
    print("Done half")

    # Keep only the candidates whose support reaches minSupport.
    for item, count in localSet.items():
        support = float(count) / len(transactionList)
        if support >= minSupport:
            _itemSet.add(item)

    return _itemSet

Since I only need rules whose RHS is a value from Col15, is there a way to speed this up?

+6
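  • Split the data set by the value in column 15, which is your RHS. With 5 distinct values in column 15 you get 5 data sets.

  • Mine frequent itemsets on each subset with a better Apriori implementation (!) than the one from github. All you need here is FIM, not rule generation!

  • Turn each frequent itemset into a rule (FIS → RHS).

Then every rule has a col15 value on its right-hand side, and Apriori is only used for FIM.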
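A minimal Python sketch of the split-then-mine idea, under stated assumptions. The function names (`split_by_rhs`, `frequent_itemsets`, `rules_for_rhs`) and the toy rows are hypothetical, and the brute-force miner only stands in for whatever frequent itemset mining (FIM) implementation you actually use. The sketch also assumes the minimum count is applied within each subset and that confidence is computed against the whole data set.

```python
from collections import defaultdict
from itertools import combinations

def split_by_rhs(rows, rhs_index):
    """Group rows by the value in the RHS column and drop that column."""
    groups = defaultdict(list)
    for row in rows:
        # Tag each value with its column index so equal values in
        # different columns stay distinct items.
        items = frozenset((i, v) for i, v in enumerate(row) if i != rhs_index)
        groups[row[rhs_index]].append(items)
    return groups

def frequent_itemsets(transactions, min_count, max_len=3):
    """Brute-force stand-in for a real FIM implementation (toy data only)."""
    counts = defaultdict(int)
    for t in transactions:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_count}

def rules_for_rhs(rows, rhs_index, min_count, min_conf):
    """Build rules (itemset -> RHS value) from per-subset frequent itemsets."""
    groups = split_by_rhs(rows, rhs_index)
    all_tx = [t for txs in groups.values() for t in txs]
    rules = []
    for rhs, txs in groups.items():
        for itemset, count in frequent_itemsets(txs, min_count).items():
            # Confidence needs the antecedent count over the whole data set.
            total = sum(1 for t in all_tx if itemset <= t)
            conf = count / total
            if conf >= min_conf:
                rules.append((itemset, rhs, count, conf))
    return rules

# Hypothetical toy data: the last column plays the role of Col15.
rows = [
    ("a", "x", "yes"),
    ("a", "x", "yes"),
    ("a", "y", "no"),
    ("b", "y", "no"),
]
rules = rules_for_rhs(rows, rhs_index=2, min_count=2, min_conf=0.9)
for lhs, rhs, count, conf in rules:
    print(sorted(lhs), "->", rhs, count, conf)
```

Because the RHS column is removed before mining, no candidate itemset ever mixes Col15 values with the other columns, so the miner never spends time on rules you would discard anyway.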

+1

. , . .

. . . , . .

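The Apriori steps are:

  • Find the frequent itemsets of size 1.
  • Combine the frequent itemsets of size 1 into candidates of size 2.
  • Find the frequent itemsets of size 2.
  • Use the frequent itemsets of size n to generate candidates of size n + 1.
  • Find the frequent itemsets of size n + 1 among the candidates of size n + 1.
  • Repeat the previous two steps until no frequent itemsets of size n + 1 are found.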
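The level-wise procedure (frequent itemsets of size 1, then candidates of size 2, and so on from size n to size n + 1) can be sketched in Python as follows. This is an illustration under assumptions, not the implementation from the question: the name `apriori_frequent_itemsets`, the use of an absolute `min_count` instead of a relative support, and the toy transactions are all hypothetical.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_count):
    """Level-wise search: frequent n-itemsets generate (n + 1)-candidates."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # Keep candidates contained in at least min_count transactions.
        result = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                result[c] = n
        return result

    # Frequent itemsets of size 1.
    items = {frozenset([i]) for t in transactions for i in t}
    current = count_frequent(items)
    frequent = dict(current)
    n = 1
    while current:
        # Join frequent n-itemsets into candidates of size n + 1 ...
        candidates = {a | b for a in current for b in current
                      if len(a | b) == n + 1}
        # ... and prune any candidate that has an infrequent n-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, n))
        }
        current = count_frequent(candidates)
        frequent.update(current)
        n += 1
    return frequent

# Hypothetical toy transactions.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori_frequent_itemsets(toy, min_count=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no frequent itemsets, which is the termination condition of the last step in the list.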

n + 1, . :

  • , , ,

, , , , : x.

, , , 1 . 1.2 1 {(x)}, 1 . 1.2, , . . n.

1 , , , {(x)}. , {(x)}. , 1, - n, , .

, , , 2 . 2.1 , {(x)}. association , {(x)} . ({(n )} → {(x)}).

, , frequent. fp-growth . frequent - PrePost+.

python - .

tl;dr: . . Apriori , , fp.

+2
