Mining association rules in the FOAF social network dataset

I am working on a project called "Discovering association rules from social network data: introducing data mining into a semantic network." Can anyone suggest a good source for an algorithm (and its code; I have heard that it can be implemented using Perl packages as well as R) to find association rules in a social network database?

A snapshot of the database can be obtained at the following link: https://docs.google.com/uc?id=0B0mXGRdRowo1MDZlY2Q0NDYtYjlhMi00MmNjLWFiMWEtOGQ0MjA3NjUyZTE5&export=download&hl=en_US

The dataset is available at the following link: http://ebiquity.umbc.edu/get/a/resource/82.zip

I have searched a lot for this project but, unfortunately, have not found anything useful yet. The following link is somewhat related:

Criminal data: http://www.computer.org/portal/web/csdl/doi/10.1109/CSE.2009.435

Your help would be greatly appreciated.

Thanks,

+4
3 answers

This is a bit broader than association rule learning ( http://en.wikipedia.org/wiki/Association_rule_learning ), but hopefully useful.

Some early FOAF work that might be interesting (SVD / PCA, etc.):

http://stderr.org/~elw/foaf/
http://www.scribd.com/doc/353326/The-Social-Semantics-of-LiveJournal-FOAF-Structure-and-Change-from-2004-to-2005
http://datamining.sztaki.hu/files/snakdd.pdf

Also, Ch. 4 of the book at http://www.amazon.com/Understanding-Complex-Datasets-Decompositions-Knowledge/dp/1584888326 is devoted to applying matrix decomposition methods to graph data structures; highly recommended.

Finally, Apache Mahout is the natural choice for large-scale data mining, machine learning, etc. https://cwiki.apache.org/MAHOUT/dimensional-reduction.html

+2

Well, the most widely used implementations of the original association rules algorithm (originally developed at IBM Almaden Research Center) are Apriori and Eclat, in particular Christian Borgelt's C implementations.

(A brief description for those who are not familiar with association rules, also known as frequent itemsets or market basket analysis: the prototypical application of association rules analyzes consumer transactions, such as supermarket data. Among buyers who buy Polish sausage, what percentage also buy brown bread?)
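To make the sausage-and-bread question concrete, here is a minimal sketch in plain Python (not taken from any of the packages mentioned in this thread). The baskets are made-up toy data and the function names `support` and `confidence` are my own; the "percentage who also buy" figure is what the association rules literature calls the confidence of the rule {polish sausage} -> {brown bread}.

```python
# Hypothetical toy data: each set is one shopper's basket.
transactions = [
    {"polish sausage", "brown bread", "mustard"},
    {"polish sausage", "brown bread"},
    {"polish sausage", "beer"},
    {"brown bread", "butter"},
    {"beer", "chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is undefined
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


rule_support = support({"polish sausage", "brown bread"}, transactions)
rule_confidence = confidence({"polish sausage"}, {"brown bread"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data two of the three sausage buyers also buy brown bread, so the rule's confidence is 2/3, while its support (both items together, over all five baskets) is 2/5.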

I would recommend the statistical platform R. It is free and open source, and its package repository contains (at least) four libraries devoted exclusively to association rules, all with excellent documentation: three of the four packages include both a manual and a separate vignette (an informal prose document with code examples). Both the manuals and the vignettes contain numerous examples in R code.

I have used three of the four packages below, and I can recommend those three. Among them are bindings for Eclat and Apriori. These libraries are distributed as R "packages", which are available from CRAN, R's primary package repository. The basic installation and configuration of R is trivial; there are binaries for Mac, Linux, and Windows. Similarly, installing and integrating a package is as simple as you would expect from an integrated platform (although not every one of the four packages below has binaries for each OS).

So, on CRAN you will find four packages, all aimed exclusively at association rules. This set consists of R bindings for four different implementations of association rules, as well as a visualization library.

The first package, arules, includes R bindings for Eclat and Apriori. The second, arulesNBMiner, is a binding for Michael Hahsler's NB-frequent itemsets algorithm. The third, arulesSequences, is a binding for Mohammed Zaki's cSPADE.
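For readers who want to see what an Apriori-style search does before reaching for a package, here is a minimal level-wise sketch in plain Python. It is not the arules API and not Borgelt's implementation; the function name `apriori`, the absolute `min_support` count, and the toy "interest" baskets are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: count} for itemsets seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical baskets: the interests declared by five users.
baskets = [
    {"music", "film", "books"},
    {"music", "film"},
    {"music", "books"},
    {"film", "books"},
    {"music", "film", "books"},
]
result = apriori(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single interest and every pair is frequent, while the full triple appears in only two baskets and is dropped. A real implementation would use far more efficient candidate counting; this sketch only shows the join, prune, and count steps.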

The fourth is especially useful because it is a visualization library for plotting the output of any of the three previous packages. For your research on the social network, I suspect that you will find graph visualization useful, that is, an explicit visualization of nodes (users in the data set) and edges (connections between them).
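Since the data here are nodes (users) and edges (connections), one possible way to hand them to an itemset miner is to treat each user's set of contacts as one "basket". The sketch below is plain Python and rests on assumptions: the `edges` list is made-up data standing in for foaf:knows statements, edges are treated as directed, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (person, contact) pairs standing in for foaf:knows statements.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "bob"),
    ("dave", "carol"),
]


def edges_to_transactions(edges):
    """Group each person's contacts into one set (one 'basket' per person)."""
    baskets = defaultdict(set)
    for person, contact in edges:
        baskets[person].add(contact)  # directed: only the source gets a basket
    return dict(baskets)


def co_occurrence(baskets, a, b):
    """Count the people whose contact set contains both a and b."""
    return sum(1 for contacts in baskets.values() if {a, b} <= contacts)


baskets = edges_to_transactions(edges)
both = co_occurrence(baskets, "bob", "carol")
print(baskets)
print("people who know both bob and carol:", both)
```

A rule mined from such baskets would read "people who know bob also tend to know carol". Whether that is the right encoding for your FOAF data (as opposed to, say, baskets of interests or group memberships) is a modelling decision this sketch does not settle.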

+4

If you need Java code, you can check out the SPMF software website. It provides source code for more than 45 algorithms for frequent itemset mining, association rule mining, sequential pattern mining, etc.

In addition, it does not only provide the most popular algorithms. It also offers many variations, such as mining rare itemsets, high-utility itemsets, uncertain itemsets, non-redundant association rules, closed association rules, indirect association rules, top-k association rules, and much more.

0
