Mutating columns of a data frame based on a predicate function (dplyr::mutate_if)

I would like to use the dplyr mutate_if() function to convert list-columns to data-frame-columns, but when I tried to do this, I ran into a cryptic error. I am using dplyr 0.5.0, purrr 0.2.2, and R 3.3.0.

The basic setup looks like this: I have a data frame d, some of whose columns are lists:

    d <- dplyr::data_frame(
      A = list(
        list(list(x = "a", y = 1), list(x = "b", y = 2)),
        list(list(x = "c", y = 3), list(x = "d", y = 4))
      ),
      B = LETTERS[1:2]
    )

I would like to convert a column of lists (in this case, d$A) to a column of data frames using the following function:

    tblfy <- function(x) {
      x %>%
        purrr::transpose() %>%
        purrr::simplify_all() %>%
        dplyr::as_data_frame()
    }
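
tblfy() is meant to act on a single cell: it turns a list of records (each a named list) into a list of fields, simplifies each field to an atomic vector, and builds a data frame. As a dependency-free sketch of that idea, here is a base-R analogue; records_to_df() is a hypothetical name, and unlist() stands in for purrr's transpose/simplify steps rather than reproducing them exactly:

```r
# Hypothetical base-R analogue of tblfy(): one list of records -> data frame.
records_to_df <- function(records) {
  fields <- names(records[[1]])  # field names taken from the first record
  cols <- lapply(fields, function(f) {
    # collect field f from every record and simplify to an atomic vector
    unlist(lapply(records, `[[`, f))
  })
  as.data.frame(setNames(cols, fields), stringsAsFactors = FALSE)
}

df1 <- records_to_df(list(list(x = "a", y = 1), list(x = "b", y = 2)))
str(df1)
```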

That is, I would like the list-column d$A to be replaced by lapply(d$A, tblfy), which is

    [[1]]
    # A tibble: 2 x 2
          x     y
      <chr> <dbl>
    1     a     1
    2     b     2

    [[2]]
    # A tibble: 2 x 2
          x     y
      <chr> <dbl>
    1     c     3
    2     d     4

Of course, in this simple case, I could just do a simple reassignment. The thing is, I would like to do this programmatically, ideally with dplyr, in a generally accepted way that could handle any number of list-columns.

Here's where I stumble: when I try to convert list-columns to data-frame-columns using the following call

 d %>% dplyr::mutate_if(is.list, funs(tblfy)) 

I get an error that I don’t know how to interpret:

 Error: Each variable must be named. Problem variables: 1, 2 

Why doesn't mutate_if() work? How can I apply it to get the desired result?
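
A likely reading of the error: mutate_if() applies the function to each matching column as a whole, much like mutate(A = tblfy(A)), so tblfy() receives the entire list-column rather than one cell. The column's elements are unnamed lists of records, so transposing the whole column produces unnamed pieces, which is consistent with "Each variable must be named. Problem variables: 1, 2". A base-R sketch of that, where transpose_base() is a hypothetical, simplified stand-in for purrr::transpose() (not its actual implementation):

```r
# The list-column d$A from the question, as a plain list.
col <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# Hypothetical stand-in for purrr::transpose(): positions or names are
# taken from the first element, as a template for the output.
transpose_base <- function(l) {
  idx <- if (is.null(names(l[[1]]))) seq_along(l[[1]]) else names(l[[1]])
  out <- lapply(idx, function(i) lapply(l, `[[`, i))
  if (is.character(idx)) names(out) <- idx
  out
}

# Whole column (what the function sees under mutate_if): two unnamed pieces.
whole <- transpose_base(col)
# A single cell: the named fields x and y, as tblfy() expects.
cell <- transpose_base(col[[1]])
```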

Note

A commenter noted that tblfy() should be vectorized. That is a reasonable suggestion, but, unless I vectorized it incorrectly, it does not seem to get at the root of the problem. With a vectorized version of tblfy(),

 tblfy_vec <- Vectorize(tblfy) 

mutate_if() fails with the error

 Error: wrong result size (4), expected 2 or 1 
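
The "wrong result size (4)" is consistent with how Vectorize() simplifies results: it calls mapply() with SIMPLIFY = TRUE, and when every result has the same length greater than 1 (a 2-column data frame has length 2), the results are flattened into a list-matrix. Two 2-column results therefore become a 2 x 2 list-matrix of length 4 instead of a list of 2 data frames. Wrapping each result in list() gives every result length 1, which is presumably why the answer below ends its tblfy with list(). A small base-R demonstration, where f() is a hypothetical per-element function:

```r
# Hypothetical per-element function returning a 2-column data frame.
f <- function(i) data.frame(x = i, y = i * 2)

# Each result has length 2 (two columns), so mapply's simplification
# builds a 2 x 2 list-matrix out of the four columns.
res <- Vectorize(f)(1:2)

# Each result now has length 1, so simplification only unlists one level,
# leaving one data frame per input element.
f_wrapped <- function(i) list(f(i))
res_wrapped <- Vectorize(f_wrapped)(1:2)
```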

Update

Having gained some experience with purrr, I now find the following approach natural, if somewhat long:

 d %>% map_if(is.list, ~ map(., ~ map_df(., identity))) %>% as_data_frame() 

This is more or less identical to @alistaire's solution below, but uses map_if() and map() instead of mutate_if() and Vectorize(), respectively.
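
The map_if()/map()/map_df() pipeline works at three levels: select the list-columns, visit each cell, and turn each cell's records into rows. The same structure can be sketched in base R without purrr or dplyr; cell_to_df() is a hypothetical helper, and rbind() of one-row data frames stands in for the row binding done by map_df():

```r
d <- data.frame(B = LETTERS[1:2], stringsAsFactors = FALSE)
d$A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# One record -> one-row data frame; the rows of a cell are then stacked
# (roughly the role of map_df(identity)).
cell_to_df <- function(records) {
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

# Only list-columns are touched (the role of map_if(is.list, ...)), and each
# cell is converted separately (the role of the inner map()).
for (nm in names(d)) {
  if (is.list(d[[nm]])) d[[nm]] <- lapply(d[[nm]], cell_to_df)
}
```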

2 answers

Your original tblfy function errors for me (even when its steps are chained directly), so let's rebuild it a bit, adding vectorization, which lets us avoid an otherwise-necessary rowwise() call:

 tblfy <- Vectorize(function(x){x %>% purrr::map_df(identity) %>% list()}) 

Now we can use mutate_if nicely:

    d %>% mutate_if(purrr::is_list, tblfy)
    ## Source: local data frame [2 x 2]
    ##
    ##                A     B
    ##           <list> <chr>
    ## 1 <tbl_df [2,2]>     A
    ## 2 <tbl_df [2,2]>     B

... and if we unnest what's there,

    d %>% mutate_if(purrr::is_list, tblfy) %>% tidyr::unnest()
    ## Source: local data frame [4 x 3]
    ##
    ##       B     x     y
    ##   <chr> <chr> <dbl>
    ## 1     A     a     1
    ## 2     A     b     2
    ## 3     B     c     3
    ## 4     B     d     4
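
For a single nested column, the unnested result pairs each outer row with every row of its nested data frame. A base-R sketch of that reshaping, meant only to illustrate the shape of the output (it is not how tidyr implements unnest()):

```r
# A nested data frame shaped like the mutate_if() result above.
nested <- data.frame(B = LETTERS[1:2], stringsAsFactors = FALSE)
nested$A <- list(
  data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = FALSE),
  data.frame(x = c("c", "d"), y = c(3, 4), stringsAsFactors = FALSE)
)

# Repeat each row's B value once per row of its nested data frame,
# then stack the pieces.
pieces <- lapply(seq_len(nrow(nested)), function(i) {
  data.frame(B = nested$B[i], nested$A[[i]], stringsAsFactors = FALSE)
})
flat <- do.call(rbind, pieces)
```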

A few notes:

  • map_df(identity) seems to be more effective at building a tibble than any of the alternative formulations. I know the identity call seems unnecessary, but most of the alternatives break.
  • I'm not sure how widely useful tblfy will be, since it depends somewhat on the structure of the lists in the list-column, which can vary considerably. If you have a lot of lists with a similar structure, I suppose it's useful.
  • There may be a way to do this with pmap instead of Vectorize, but I couldn't get it to work in a few quick attempts.

In-place conversion without copying:

    library(data.table)
    for (col in d) if (is.list(col)) lapply(col, setDF)
    d
    #Source: local data frame [2 x 2]
    #
    #                A     B
    #1 <S3:data.frame>     A
    #2 <S3:data.frame>     B