Convert list of vectors to graph data frame

I have several character vectors stored in a list, for example:

    basket1 <- c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape")
    basket2 <- c("Grape", "Grape", "Grape", "Grape")
    basket3 <- c("Kiwi", "Apple", "Cantaloupe", "Banana")
    basket4 <- c("Strawberry")
    basket5 <- c("Grape", "Grape", "Grape")
    FruitBasketList <- list(basket1, basket2, basket3, basket4, basket5)

I would like to turn FruitBasketList into a data frame in which each row corresponds to a basket and holds the count of each fruit in that basket. The main problem is that there can be thousands of different "fruits" in each vector, and many of them will appear more than once.

This is the desired data frame that I would like to receive as a result:

     Basket Apple Orange Banana Grape Kiwi Cantaloupe Strawberry
    basket1     3      1      1     1    0          0          0
    basket2     0      0      0     4    0          0          0
    basket3     1      0      1     0    1          1          0
    basket4     0      0      0     0    0          0          1
    basket5     0      0      0     3    0          0          0

Obviously, this is not my real data; I simplified it so that it is easier to follow. No, this is not homework. In reality, a basket can contain a thousand different fruits, and the fruit vectors will not all have the same length. There may be tens of thousands of baskets (vectors), and some fruits can be repeated many times within the same basket.

I am working on a solution, but I am sure it is terribly complex and very inefficient. So far, it involves combining all the vectors into one and then identifying all the unique fruit names. That part works. The part I'm struggling with is creating an empty data frame with all of these unique names as columns, then counting each unique fruit in each vector and putting that count in the right column of a new row, with zeros for the fruits that do not occur in that particular basket.

The code I use to count individual vectors is as follows:

    GetUniqueItemCount <- function(rle, value) {
      value <- rle$lengths[rle$values == value]
      if (identical(value, integer(0))) {
        value <- 0
      }
      value
    }

And the code for the call is as follows:

 Apple <- GetUniqueItemCount(rle, "Apple") 

As you can see, with my current code I need to know all the possible fruits in advance and hard-code a count for each one, then assign it to a specific column that is known beforehand in the data frame. I realize I am going about this the wrong way, so I would appreciate any advice on getting to the desired data frame shown above. Please feel free to suggest a completely different approach rather than trying to fix mine, if that is the best way to solve the problem.
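The steps described above (collect every unique name, then count each basket against that set) do not require hard-coding any fruit names. A minimal base R sketch, assuming the baskets are held in a named list; the names `all_fruits`, `counts`, and `result` are illustrative and not part of the original code:

```r
# Example data from the question, stored as a named list (an assumption;
# the original list is unnamed).
FruitBasketList <- list(
  basket1 = c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape"),
  basket2 = c("Grape", "Grape", "Grape", "Grape"),
  basket3 = c("Kiwi", "Apple", "Cantaloupe", "Banana"),
  basket4 = c("Strawberry"),
  basket5 = c("Grape", "Grape", "Grape")
)

# Every fruit name that occurs anywhere, in order of first appearance.
all_fruits <- unique(unlist(FruitBasketList))

# For each basket, count how often each known fruit occurs.
# match() maps names to positions in all_fruits; tabulate() counts them
# and returns 0 for fruits that are absent from the basket.
counts <- t(vapply(
  FruitBasketList,
  function(basket) tabulate(match(basket, all_fruits), nbins = length(all_fruits)),
  integer(length(all_fruits))
))
colnames(counts) <- all_fruits

# One row per basket, one column per fruit.
result <- data.frame(Basket = rownames(counts), counts,
                     check.names = FALSE, row.names = NULL)
print(result)
```

Because `all_fruits` is built with `unique()`, the columns appear in order of first occurrence. With tens of thousands of baskets and thousands of fruits this builds a dense matrix, so memory use may become a concern.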

+5
4 answers

I suggest mtabulate from the qdapTools package.

    library(qdapTools)
    mtabulate(FruitBasketList)
    #   Apple Banana Cantaloupe Grape Kiwi Orange Strawberry
    # 1     3      1          0     1    0      1          0
    # 2     0      0          0     4    0      0          0
    # 3     1      1          1     0    1      0          0
    # 4     0      0          0     0    0      0          1
    # 5     0      0          0     3    0      0          0

The package's author even shares your avatar. What a coincidence.

+9

Using dplyr, I can do something like

    library(dplyr)
    # Note: rbind_all() is deprecated in later dplyr releases;
    # bind_rows() is its replacement.
    m <- FruitBasketList %>%
      lapply(table) %>%
      lapply(as.list) %>%
      lapply(data.frame) %>%
      rbind_all()
    m
    # Source: local data frame [5 x 7]
    #
    #   Apple Banana Grape Orange Cantaloupe Kiwi Strawberry
    # 1     3      1     1      1         NA   NA         NA
    # 2    NA     NA     4     NA         NA   NA         NA
    # 3     1      1    NA     NA          1    1         NA
    # 4    NA     NA    NA     NA         NA   NA          1
    # 5    NA     NA     3     NA         NA   NA         NA

which leaves the missing values as NA. If you want to set them to 0, you can do

    m[is.na(m)] <- 0
    m
    # Source: local data frame [5 x 7]
    #
    #   Apple Banana Grape Orange Cantaloupe Kiwi Strawberry
    # 1     3      1     1      1          0    0          0
    # 2     0      0     4      0          0    0          0
    # 3     1      1     0      0          1    1          0
    # 4     0      0     0      0          0    0          1
    # 5     0      0     3      0          0    0          0
+5

You can melt the list and reshape it from "long" to "wide" format using dcast:

    library(reshape2)
    dcast(melt(setNames(FruitBasketList, ls(pattern='^basket'))), L1~value)
    #       L1 Apple Banana Grape Orange Cantaloupe Kiwi Strawberry
    #1 basket1     3      1     1      1          0    0          0
    #2 basket2     0      0     4      0          0    0          0
    #3 basket3     1      1     0      0          1    1          0
    #4 basket4     0      0     0      0          0    0          1
    #5 basket5     0      0     3      0          0    0          0

Or using the base R functions stack and table:

    df <- stack(setNames(FruitBasketList, ls(pattern='^basket')))
    table(df[2:1])
    #         values
    #ind       Apple Banana Cantaloupe Grape Kiwi Orange Strawberry
    #  basket1     3      1          0     1    0      1          0
    #  basket2     0      0          0     4    0      0          0
    #  basket3     1      1          1     0    1      0          0
    #  basket4     0      0          0     0    0      0          1
    #  basket5     0      0          0     3    0      0          0
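Note that `table()` returns a contingency table rather than a data frame. If a data frame is needed, one possible way is `as.data.frame.matrix()`. The sketch below assumes the list names are built with `paste0()` instead of being looked up in the workspace with `ls()`; `wide` is an illustrative name:

```r
# Example data from the question.
FruitBasketList <- list(
  c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape"),
  c("Grape", "Grape", "Grape", "Grape"),
  c("Kiwi", "Apple", "Cantaloupe", "Banana"),
  c("Strawberry"),
  c("Grape", "Grape", "Grape")
)

# Name the list elements basket1, basket2, ... without relying on
# variables that happen to exist in the workspace.
named <- setNames(FruitBasketList, paste0("basket", seq_along(FruitBasketList)))

# Long format: one row per fruit occurrence, with its basket in `ind`.
df <- stack(named)

# Cross-tabulate basket by fruit, then convert the table to a data frame
# that keeps the wide layout (baskets as row names, fruits as columns).
wide <- as.data.frame.matrix(table(df[2:1]))
print(wide)
```

Calling plain `as.data.frame()` on the table would instead give the long format with one row per basket and fruit combination.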
+1

You can apply the table function to each vector, then combine the results with gtools::smartbind.
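The answer gives no code, so the following is only a sketch of what it might look like. It assumes the gtools package is installed and that `smartbind()` accepts a `fill` argument for columns missing from a row; `per_basket` and `counts` are illustrative names:

```r
# Example data from the question.
FruitBasketList <- list(
  c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape"),
  c("Grape", "Grape", "Grape", "Grape"),
  c("Kiwi", "Apple", "Cantaloupe", "Banana"),
  c("Strawberry"),
  c("Grape", "Grape", "Grape")
)

# Only run if gtools is available (assumption: it may not be installed).
if (requireNamespace("gtools", quietly = TRUE)) {
  # One single-row data frame per basket, with one column per fruit
  # that actually occurs in that basket.
  per_basket <- lapply(FruitBasketList, function(basket) {
    data.frame(as.list(table(basket)))
  })

  # Row-bind the data frames, matching columns by name and using 0
  # where a basket does not contain a given fruit.
  counts <- do.call(gtools::smartbind, c(per_basket, fill = 0))
  print(counts)
}
```

Binding one small data frame per basket is likely to be slow for tens of thousands of baskets, so this may suit smaller inputs better.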

0

Source: https://habr.com/ru/post/1212412/

