Arules Sequence Mining in R

I want to use the arulesSequences package in R. However, I don't know how to coerce my data frame into an object that the package can use.

Here is a toy dataset that replicates my data structure:

 ids <- c(rep("X", 5), rep("Y", 5), rep("Z", 5))
 seq <- rep(1:5, 3)
 val <- sample(LETTERS, 15, replace = TRUE)
 df <- data.frame(ids, seq, val)
 df
    ids seq val
 1    X   1   T
 2    X   2   H
 3    X   3   V
 4    X   4   A
 5    X   5   X
 6    Y   1   D
 7    Y   2   B
 8    Y   3   A
 9    Y   4   D
 10   Y   5   P
 11   Z   1   Q
 12   Z   2   R
 13   Z   3   W
 14   Z   4   W
 15   Z   5   P

Any help would be greatly appreciated.

5 answers

Convert all columns to factors:

 df_fact = data.frame(lapply(df,as.factor)) 

Create transaction data:

 df_trans = as(df_fact, 'transactions') 

Check the result:

 itemFrequencyPlot(df_trans, support = 0.1, cex.names=0.8) 
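A minimal sketch of what this coercion produces, using the question's toy data rebuilt with a fixed seed (the `set.seed()` call is an addition for reproducibility). Each row becomes one transaction, and every column, including `ids` and `seq`, turns into items of the form `column=value`, so the sequence structure is not kept as sequence information:

```r
library(arules)

# Rebuild the question's toy data; set.seed() is added only for reproducibility
set.seed(1)
df <- data.frame(ids = rep(c("X", "Y", "Z"), each = 5),
                 seq = rep(1:5, 3),
                 val = sample(LETTERS, 15, replace = TRUE))

# Convert every column to a factor, then coerce the data frame to transactions
df_fact <- data.frame(lapply(df, as.factor))
df_trans <- as(df_fact, "transactions")

# Each row is one transaction holding three items: ids=..., seq=..., val=...
summary(df_trans)
head(itemLabels(df_trans))
```

Because `ids` and `seq` end up as ordinary items, this object works with `itemFrequencyPlot()` or `apriori()`, but it does not carry the `sequenceID`/`eventID` information that sequence mining needs.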

Using read_baskets:

  read_baskets(con = filePath.txt, sep = " ", info = c("sequenceID","eventID","SIZE")) 

In practice, this means exporting the data to a text file and re-importing it with read_baskets. The info argument names the leading columns of each line: the sequenceID, the eventID, and the optional event size (SIZE) column.
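A minimal sketch of that round trip for the toy data. It assumes one item per event (so `SIZE` is always 1), maps the `ids` letters to integer codes for the sequenceID (an assumption, not a requirement stated above), and writes to a temporary file instead of a real path:

```r
library(arulesSequences)

set.seed(1)
df <- data.frame(ids = rep(c("X", "Y", "Z"), each = 5),
                 seq = rep(1:5, 3),
                 val = sample(LETTERS, 15, replace = TRUE))

# One line per event: sequenceID eventID SIZE item
# (assumption: ids become integer codes, and every event holds exactly one item)
basket_lines <- paste(as.integer(factor(df$ids)), df$seq, 1, df$val)

path <- tempfile(fileext = ".txt")
writeLines(basket_lines, path)   # no header line is written

# The column roles are supplied through `info` instead of a header
trans <- read_baskets(con = path, sep = " ",
                      info = c("sequenceID", "eventID", "SIZE"))
transactionInfo(trans)
```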


Instead of using a data frame, what worked best for me was to split the values into a list (one element per id) and coerce that list to transactions:

 eh$cost <- split(eh$cost$val, eh$cost$id)
 eh$cost1 <- as(eh$cost, "transactions")
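Applied to the question's toy data (rebuilt with a fixed seed), the same idea might look like the sketch below. Each id becomes one unordered basket, so the event order in `seq` is not kept. `unique()` is applied to each basket as a precaution, since arules can refuse to coerce a list whose transactions contain duplicated items:

```r
library(arules)

set.seed(1)
df <- data.frame(ids = rep(c("X", "Y", "Z"), each = 5),
                 seq = rep(1:5, 3),
                 val = sample(LETTERS, 15, replace = TRUE))

# One list element per id, holding that id's values
baskets <- split(df$val, df$ids)
baskets <- lapply(baskets, unique)   # drop repeated items within a basket

# Each list element becomes one transaction (a set, so order is lost)
trans <- as(baskets, "transactions")
inspect(trans)
```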

What worked for me was essentially adding an "order" column that holds the rank of each event within its group, rather than a time value. You just have to be very specific about the naming convention: name the grouping variable (the group or order/basket number) sequenceID, and name the rank or order identifier eventID.
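A minimal base-R sketch of adding such a rank column, assuming the rows within each group are already in the desired order. The `seq` column is left out of the rebuilt toy data here so that the rank is computed from row order alone:

```r
set.seed(1)
df <- data.frame(ids = rep(c("X", "Y", "Z"), each = 5),
                 val = sample(LETTERS, 15, replace = TRUE))

# Rank of each row within its group: 1, 2, 3, ... restarting for every id
df$eventID <- ave(seq_len(nrow(df)), df$ids, FUN = seq_along)

# Rename the grouping column to the name arulesSequences looks for
names(df)[names(df) == "ids"] <- "sequenceID"
df
```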

Another thing that helped me (and had me scratching my head for a long time) was that read_baskets() seemed to require the column names in the call itself:

 read_baskets(con = filePath.txt, sep = " ", info = c("sequenceID","eventID","SIZE")) 

Although the help page makes the info = c(...) argument look like an optional header, this is not the case. I had to remove the header line from my file and give the column names in the read_baskets() call instead; otherwise I ran into problems.
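One way to strip the header from R, sketched under the assumption that the exported file has a single header line followed by space-separated lines of `sequenceID eventID SIZE item`. The sample file and its header names are hypothetical and generated only for illustration:

```r
library(arulesSequences)

# Hypothetical export that still contains a header line
with_header <- tempfile(fileext = ".txt")
writeLines(c("sequenceID eventID SIZE items",
             "1 1 1 A",
             "1 2 1 C",
             "2 1 1 B",
             "2 2 1 A"),
           with_header)

# Drop the header line and write the remainder to a new file
no_header <- tempfile(fileext = ".txt")
writeLines(readLines(with_header)[-1], no_header)

# Name the leading columns in the read_baskets() call instead
trans <- read_baskets(con = no_header, sep = " ",
                      info = c("sequenceID", "eventID", "SIZE"))
transactionInfo(trans)
```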


You first need to coerce just the item column to transactions (keeping it as a one-column data frame of factors, so each row becomes one transaction):
 trans <- as(data.frame(val = factor(df$val)), "transactions")

Then add the sequence information to the transaction object:

 trans@itemsetInfo$transactionID <- NULL
 trans@itemsetInfo$sequenceID <- df$ids
 trans@itemsetInfo$eventID <- df$seq
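Putting these steps together with a mining call, a minimal sketch might look like this. It rebuilds the toy data with a fixed seed, sorts rows by id and position (as far as I know, cspade expects the data ordered by sequence and event IDs), converts `ids` to integer codes for `sequenceID`, and uses an arbitrary `support = 0.5`. It sets the whole `transactionInfo()` at once instead of editing the slot column by column:

```r
library(arulesSequences)

set.seed(1)
df <- data.frame(ids = rep(c("X", "Y", "Z"), each = 5),
                 seq = rep(1:5, 3),
                 val = sample(LETTERS, 15, replace = TRUE))

# Order rows by sequence, then by position within the sequence
df <- df[order(df$ids, df$seq), ]

# One transaction per row, containing only the item column
trans <- as(data.frame(item = factor(df$val)), "transactions")

# Replace the default transaction info with the IDs cspade reads
transactionInfo(trans) <- data.frame(
  sequenceID = as.integer(factor(df$ids)),  # X, Y, Z -> 1, 2, 3
  eventID    = as.integer(df$seq)
)

# Mine frequent sequences (support threshold chosen arbitrarily)
seqs <- cspade(trans, parameter = list(support = 0.5),
               control = list(verbose = FALSE))
inspect(seqs)
```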

