Here are a few approaches, using this sample data:
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))
Update, years later:
You can use the mtabulate function from "qdapTools", or the unexported charMat function from my "splitstackshape" package.
Syntax:
library(qdapTools)
mtabulate(movies$genre_list)
or
splitstackshape:::charMat(movies$genre_list, fill = 0)
Update: Some More Direct Approaches
Improved option 1: use table somewhat directly:
table(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),  # record index, once per genre
      unlist(movies$genre_list, use.names = FALSE))            # the genres themselves
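Option 1 returns a two-way "table" object: records as rows, genres (sorted alphabetically) as columns. If you need an ordinary data.frame instead, one way (my suggestion, not part of the approaches above) is as.data.frame.matrix(), because plain as.data.frame() on a table gives the long Var1/Var2/Freq format. A minimal sketch; genres_wide is just an illustrative name:

```r
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# Cross-tabulate record index against genre
tab <- table(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),
             unlist(movies$genre_list, use.names = FALSE))

# as.data.frame() on a table would give long format (Var1, Var2, Freq);
# as.data.frame.matrix() keeps the wide record-by-genre layout
genres_wide <- as.data.frame.matrix(tab)
genres_wide
```

The row names of genres_wide are the record numbers ("1" to "6"), taken from the table's dimnames.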
Improved option 2: use a for loop.
x <- unique(unlist(movies$genre_list, use.names = FALSE))  # all distinct genres
m <- matrix(0, ncol = length(x), nrow = nrow(movies),
            dimnames = list(NULL, x))
for (i in seq_len(nrow(m))) {
  m[i, movies$genre_list[[i]]] <- 1  # mark the genres present in record i
}
m
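The loop in option 2 can also be replaced by a single assignment, because R accepts a two-column matrix of (row, column) positions as a matrix index. A minimal sketch on the same sample data; rows and cols are just illustrative helper names:

```r
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

x <- unique(unlist(movies$genre_list, use.names = FALSE))
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))

# Row index of every genre entry (record i repeated once per genre it has)
rows <- rep(seq_len(nrow(movies)), lengths(movies$genre_list))
# Column index of every genre entry within x
cols <- match(unlist(movies$genre_list, use.names = FALSE), x)

# Indexing with a two-column (row, col) matrix sets all the cells at once
m[cbind(rows, cols)] <- 1
m
```

Like the loop, this records presence (1) rather than counts, so a genre repeated within one record still gets a 1.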
Below is the OLD answer
Convert each element of the list to a table (each in turn converted to a data.frame):
tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})
Use Reduce to merge the resulting list. If I understand your final goal correctly, this will give a transposed form of the result you are interested in.
merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
#      Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1   action        1       NA       NA       NA       NA       NA
# 2    drama        1        1        1       NA       NA        1
# 3  romance        1       NA       NA       NA       NA       NA
# 4    crime       NA        1        1       NA       NA       NA
# 5  mystery       NA       NA        1       NA       NA       NA
# 6    indie       NA       NA       NA        1       NA       NA
# 7 thriller       NA       NA       NA        1        1       NA
# 8   family       NA       NA       NA       NA       NA        1
Transposing and converting NA to 0 is then quite simple: drop the first column and reuse it as the column names of the new data.frame:
movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0
movie_genres
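If you want to confirm that the old answer and improved option 1 give the same values, one quick check (my own sanity test, not something either approach requires) is to sort the old result's columns alphabetically so they match the table's column order, then compare cell by cell while ignoring row and column names. A minimal sketch; old, new and same are illustrative names:

```r
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# Old answer: one table per record, merged, then transposed
tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})
merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0

# Improved option 1: a single cross-tabulation
tab <- table(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),
             unlist(movies$genre_list, use.names = FALSE))

# Reorder the old result's columns to the table's sorted genre order,
# then compare values only (row/column names differ between the two)
old <- as.matrix(movie_genres[, sort(names(movie_genres))])
new <- unclass(tab)
same <- all(unname(old) == unname(new))
same
```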