How to prepare transaction data in basket format for arules

OK, so I have searched a lot and want to run arules on sales data. I just need to get the data into the correct format, set up with the right "factors" or "variables", and in the form of a basket.

Right now I have sales data with an order # and the items in each order. Each order is unique (every new order gets a new #, and each row includes a part #), but the same items can obviously appear in many orders.

My data is currently configured as follows:

  Order #   Part #   PartDescription
  1         A        PartA
  1         B        PartB
  1         G        PartG
  2         R        PartR
  3         A        PartA
  3         B        PartB
  4         E        PartE
  5         Y        PartY
  6         A        PartA
  6         B        PartB
  6         F        PartF
  6         V        PartV

R does not like the data in this form, so I have to get it into a form that arules will accept for analysis.

I have saved it as a text file and also tried a .csv file, but if I could get step-by-step instructions on how to prepare or manage it in RStudio, that would be great.

I read that it should be in basket form, such as:

1 (A, B, G)
2 (R)
3 (A, B)
4 (E)
5 (Y)
6 (A, B, F, V)
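
As a minimal sketch of that target shape, assuming the table above has been loaded into a data frame (here called `orders`, with the illustrative column names `Order` and `Part`; none of these names come from the question), base R's `split()` groups the part numbers by order:

```r
# Example data from the table above. `orders`, `Order` and `Part` are
# illustrative names, not something arules requires.
orders <- data.frame(
  Order = c(1, 1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 6),
  Part  = c("A", "B", "G", "R", "A", "B", "E", "Y", "A", "B", "F", "V"),
  stringsAsFactors = FALSE
)

# One list element per order, each holding that order's parts
baskets <- split(orders$Part, orders$Order)

# Print each basket in the "order (items)" form shown above
for (id in names(baskets)) {
  cat(id, " (", paste(baskets[[id]], collapse = ", "), ")\n", sep = "")
}
```

The result is a plain named list, one element per order, which is the kind of input that can then be coerced to arules transactions.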

If that is wrong, please correct me. I get the idea, but I need step-by-step instructions, which I cannot find anywhere else. I tried using dplyr and tidyr. I am good at data analysis but need more direct help with RStudio, so if I could do it step by step once, I would understand it better.

r arules market-basket-analysis
2 answers

Take a look at the help page for the transactions data type for examples of how to get data in:

 library(arules)
 ?transactions

For your data, you want to split the parts by order, then use as() to coerce the resulting list into transactions:

 trans <- as(split(data[,"Part"], data[,"Order"]), "transactions")
 inspect(trans)
   items     transactionID
 1 {A,B,G}   1
 2 {R}       2
 3 {A,B}     3
 4 {E}       4
 5 {Y}       5
 6 {A,B,F,V} 6

I had a lot of coercion problems (with calls like as(dataname, "transactions")...).

I believe this was because I had duplicate records (that is, the same item purchased more than once in the same transaction when the data is in "single" format).
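
One way to check for this kind of duplicate before coercing is sketched below in base R. The data frame `orders` and its columns `Order` and `Part` are hypothetical, made up for illustration with one deliberately repeated row:

```r
# Hypothetical data with one repeated row: part A appears twice in order 1
orders <- data.frame(
  Order = c(1, 1, 1, 2),
  Part  = c("A", "A", "B", "R"),
  stringsAsFactors = FALSE
)

# TRUE for rows that repeat an earlier (Order, Part) pair
dup_rows <- duplicated(orders[, c("Order", "Part")])

# Show the offending rows
print(orders[dup_rows, ])

# Keep one row per (Order, Part) pair before building baskets
orders_unique <- orders[!dup_rows, ]
baskets <- split(orders_unique$Part, orders_unique$Order)
```

After this step each basket lists every item at most once.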

Here is what finally helped me:

 Transactions <- read.transactions("Data with tx ids, item names, in single format.csv", rm.duplicates = TRUE, sep = ",", format = "single", cols = c(7, 9))

(transaction IDs in column 7, item names in column 9)
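
For a self-contained version of the same call, here is a sketch that writes the question's example data to a temporary file and reads it back. The file layout (order ID in column 1, part in column 2, no header row) and the extra repeated row are assumptions made for illustration:

```r
library(arules)

# Write the example data to a temporary two-column CSV (order ID, part),
# with one deliberately repeated row (order 1, part A). No header row is
# written, so `cols` can refer to column positions directly.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("1,A", "1,A", "1,B", "1,G", "2,R", "3,A", "3,B",
             "4,E", "5,Y", "6,A", "6,B", "6,F", "6,V"), csv_path)

trans <- read.transactions(
  csv_path,
  format = "single",     # one (transaction ID, item) pair per line
  sep = ",",
  cols = c(1, 2),        # transaction ID column, then item column
  rm.duplicates = TRUE   # drop repeated items within a transaction
)

inspect(trans)
```

With a real export, `cols` would point at whichever columns hold the order number and the part number, as in the call above that uses columns 7 and 9.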
