Discard all list items in the data frame

Question

Discard all list items in the data frame

I have a data frame with the following classes of variables for each column:

"date" "numeric" "numeric" "list" "list" "numeric"

The data in each row is as follows:

  1978-01-01, 12.5, 6.3, c(0,0,0.25,0.45,0.3), c(0,0,0,0.1,0.9), 72

I would like to convert it to a matrix or data frame with one value for each column, so the result should look like this:

 1978-01-01, 12.5, 6.3, 0, 0, 0.25, 0.45, 0.3, 0, 0, 0, 0.1, 0.9, 72

I tried using:

 j<-unlist(input) output<-matrix(j,nrow=nrow(input),ncol=length(j)/nrow(input))

But he messed up the order of the lines in the output.

Any idea?

Additional Information:

The above example is a bit simplified and dput(head(input)) returns the following example:

 structure(list(DATE = structure(c(2924, 2925, 2926, 2927, 2928, 2929), class = "Date"), TEMP_MEAN_M0 = c(-7.625, -7.375, -6, -5.5, -7.625, -9.625), SLP_MEAN_M0 = c(1012.125, 991.975, 989.825, 986.675, 988.95, 993.075), WIND_DIR_RF_M0 = structure(list(`2.counts` = c(0, 0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0, 0.125), `3.counts` = c(0.75, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.375, 0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0, 0, 0), `5.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0.125, 0.375, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `6.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0.25, 0.125, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.5, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts", "6.counts", "7.counts")), CEIL_HGT_RF_M0 = structure(list(`2.counts` = c(0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0.375), `3.counts` = c(0.75, 0.125, 0, 0.125, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.25, 0.125, 0, 0.125, 0, 0, 0, 0, 0.25, 0.25), `5.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.875), `6.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts", "6.counts", "7.counts")), WIND_SPD_MEAN_M0 = c(12.8125, 18.7375, 6.175, 8.175, 10.5375, 16.5375)), .Names = c("DATE", "TEMP_MEAN_M0", "SLP_MEAN_M0", "WIND_DIR_RF_M0", "CEIL_HGT_RF_M0", "WIND_SPD_MEAN_M0" ), row.names = c(NA, 6L), class = "data.frame")

+8

r dataframe

jatobat Apr 10 '13 at 16:02

source share

1 answer

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2013-04-10T17:03:40+0000

This is a bit dirty and probably pretty inefficient, but should help you get started:

Here are some sample data:

 mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")), V1 = c(10, 10), V2 = c(11, 11)) mydf$V3 <- list(c(1:10), c(11:20)) mydf$V4 <- list(c(21:25), c(26:30)) mydf # Date V1 V2 V3 V4 # 1 1978-01-01 10 11 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 21, 22, 23, 24, 25 # 2 1978-01-02 10 11 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 26, 27, 28, 29, 30

And a small function that checks which columns are lists, and for these columns, rbind them together, and end up cbind them with columns that are not lists.

 myFun <- function(data) { temp1 <- sapply(data, is.list) temp2 <- do.call( cbind, lapply(data[temp1], function(x) data.frame(do.call(rbind, x), check.names=FALSE))) cbind(data[!temp1], temp2) } myFun(mydf) # Date V1 V2 V3.1 V3.2 V3.3 V3.4 V3.5 V3.6 V3.7 V3.8 V3.9 V3.10 V4.1 # 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 # 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 # V4.2 V4.3 V4.4 V4.5 # 1 22 23 24 25 # 2 27 28 29 30

This will only work if each column contains vectors of the same length (otherwise the R rbind base will not work).

Update

Repeating this question after half a day, I see that another user ( @ user1981275 ) posted a solution that was simpler, but then deleted their answer. They may have been deleted because their method converted dates to integers, because, as DWin pointed out, the elements in the matrices must be in the same mode.

Here is their solution:

 t(apply(mydf, 1, unlist)) # Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45 # [1,] 2922 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25 # [2,] 2923 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30

Here it can easily be changed to obtain the desired result. This will definitely be faster than the previous approach:

 cbind(mydf[!sapply(mydf, is.list)], (t(apply(mydf[sapply(mydf, is.list)], 1, unlist)))) # Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45 # 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25 # 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30

Or as a custom function:

 myFun <- function(data) { ListCols <- sapply(data, is.list) cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist))) } myFun(mydf)

Update 2

I also wrote a more efficient function called col_flatten part of my SOfun package.

Install the package using:

 source("http://news.mrdwab.com/install_github.R") install_github("mrdwab/SOfun")

Then you can do:

 library(SOfun) col_flatten(mydf, names(which(sapply(mydf, is.list))), drop = TRUE) ## Date V1 V2 V3_1 V3_2 V3_3 V3_4 V3_5 V3_6 V3_7 V3_8 V3_9 V3_10 V4_1 V4_2 V4_3 V4_4 V4_5 ## 1: 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25 ## 2: 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30

It is based on the transpose function in "data.table", so be sure to get "data.table".

Discard all list items in the data frame

Update

Update 2

More articles: