Mutating columns of a data frame based on a predicate function (dplyr :: mutate_if)

Question


I would like to use the dplyr mutate_if() function to convert list-columns to data-frame-columns, but when I tried to do this, I ran into a cryptic error. I am using dplyr 0.5.0, purrr 0.2.2, and R 3.3.0.

The basic setup looks like this: I have a data frame d, some of whose columns are lists:

 library(dplyr)  # provides %>% and funs(), used unqualified below
 d <- dplyr::data_frame(
   A = list(
     list(list(x = "a", y = 1), list(x = "b", y = 2)),
     list(list(x = "c", y = 3), list(x = "d", y = 4))
   ),
   B = LETTERS[1:2]
 )

I would like to convert a column of lists (in this case, d$A) to a column of data frames using the following function:

 tblfy <- function(x) {
   x %>%
     purrr::transpose() %>%      # list of records -> list of fields
     purrr::simplify_all() %>%   # each field -> atomic vector where possible
     dplyr::as_data_frame()
 }

That is, I would like the list-column d$A to be replaced by lapply(d$A, tblfy), which is

 [[1]]
 # A tibble: 2 x 2
       x     y
   <chr> <dbl>
 1     a     1
 2     b     2

 [[2]]
 # A tibble: 2 x 2
       x     y
   <chr> <dbl>
 1     c     3
 2     d     4
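To make explicit what the purrr pipeline in tblfy() does, here is a minimal base-R sketch of the same transpose-then-simplify idea. The name records_to_df() is hypothetical, and the sketch assumes every record in a cell has the same field names, each holding a length-one value.

```r
# Hypothetical base-R analogue of tblfy(): turn one cell, i.e. a list of
# records such as list(x = "a", y = 1), into a data frame.
# Assumes every record has the same field names and length-one values.
records_to_df <- function(records) {
  fields <- names(records[[1]])
  # "Transpose": gather each field across all records, then simplify
  # it to an atomic vector (the job of simplify_all() in tblfy)
  cols <- lapply(fields, function(f) unlist(lapply(records, `[[`, f)))
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

cell <- list(list(x = "a", y = 1), list(x = "b", y = 2))
records_to_df(cell)
```

Applied per cell, lapply(d$A, records_to_df) has the same shape as the lapply(d$A, tblfy) result, with plain data frames instead of tibbles.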

Of course, in this simple case I could just reassign the column. The thing is, I would like to do this programmatically, ideally with dplyr, in an idiomatic way that could handle any number of list-columns.

Here's where I stumble: when I try to convert the list-columns to data-frame-columns with the following call

 d %>% dplyr::mutate_if(is.list, funs(tblfy))

I get an error that I don’t know how to interpret:

 Error: Each variable must be named. Problem variables: 1, 2

Why doesn't mutate_if() work? How should I apply it to get the desired result?
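One point that likely explains the first error: mutate_if() passes each selected column to the function as a whole, so tblfy() receives the entire list-column d$A (a list of two cells) rather than one cell at a time, and as_data_frame() then complains about the unnamed pieces produced by transposing the whole column. The base-R sketch below imitates that column-at-a-time contract with a hypothetical helper, mutate_if_base(), and a hypothetical per-cell converter, cell_to_df(), to show that the per-cell conversion must be mapped over the column explicitly. It is an illustration, not dplyr's implementation.

```r
# Hypothetical helper imitating mutate_if()'s contract: f receives a WHOLE
# column and must return something of the same length.
mutate_if_base <- function(df, predicate, f) {
  for (nm in names(df)) {
    if (predicate(df[[nm]])) df[[nm]] <- f(df[[nm]])
  }
  df
}

# Hypothetical per-cell converter: one cell (a list of records) -> data frame
cell_to_df <- function(records) {
  do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
}

d <- data.frame(B = LETTERS[1:2], stringsAsFactors = FALSE)
d$A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)

# Passing cell_to_df itself would treat the whole column as a single cell;
# wrapping it in lapply() converts each cell separately.
d2 <- mutate_if_base(d, is.list, function(col) lapply(col, cell_to_df))
```

This wrapping is the same job that map() does in the update below and that Vectorize() does in the accepted answer.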

Note

A commenter pointed out that the tblfy() function should be vectorized. That is a reasonable suggestion, but (unless I vectorized it incorrectly) it does not seem to get at the root of the problem. With a vectorized version of tblfy(),

 tblfy_vec <- Vectorize(tblfy)

mutate_if() fails with the error

 Error: wrong result size (4), expected 2 or 1
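The size 4 is most likely a side effect of Vectorize(): it wraps mapply() with SIMPLIFY = TRUE, so when every call returns a two-column data frame (a list of length 2), the two results are simplified into a 2 x 2 list-matrix with four elements instead of a list of two data frames. A small base-R reproduction, using a hypothetical two-column function in place of tblfy():

```r
# Hypothetical stand-in for tblfy(): each call returns a 2-column data
# frame, i.e. a list of length 2.
two_col <- function(i) data.frame(x = letters[i], y = i, stringsAsFactors = FALSE)

v <- Vectorize(two_col)   # calls mapply(..., SIMPLIFY = TRUE) internally
res <- v(1:2)
length(res)               # the two data frames flattened into a list-matrix
dim(res)

# With simplification turned off, there is one data frame per input element
res_list <- Vectorize(two_col, SIMPLIFY = FALSE)(1:2)
length(res_list)
```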

Update

Having gained some experience with purrr, I now find the following approach natural, if somewhat long:

 d %>% map_if(is.list, ~ map(., ~ map_df(., identity))) %>% as_data_frame()

This is more or less identical to @alistaire's solution below, but uses map_if() and map() instead of mutate_if() and Vectorize(), respectively.


r dplyr purrr

egnha Jul 07 '16 at 18:08


2 answers

In-place conversion without copying:

 library(data.table)

 for (col in d) if (is.list(col)) lapply(col, setDF)
 d
 #Source: local data frame [2 x 2]
 #
 #                A B
 #1 <S3:data.frame> A
 #2 <S3:data.frame> B


eddi Jul 07 '16 at 20:00


alistaire · Accepted Answer · 2016-07-07T20:03:30+0000

The original tblfy function throws an error for me (even when called on the elements directly), so let's rebuild it a bit, vectorizing it as well, which lets us avoid an otherwise necessary rowwise() call beforehand:

 tblfy <- Vectorize(function(x) {
   x %>%
     purrr::map_df(identity) %>%  # row-bind the records into one tibble
     list()                       # wrap so Vectorize() returns one tibble per cell
 })
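Wrapping each result in list() sidesteps mapply()'s simplification: every call now returns a length-one list, so simplification only unwraps one level and leaves exactly one data frame per row. A base-R check of that behaviour, with a hypothetical cell_to_df_list() that uses do.call(rbind, ...) in place of purrr::map_df(identity):

```r
# Hypothetical stand-in for the answer's tblfy(): convert one cell (a list
# of records) to a data frame, then wrap it in list() so that Vectorize()
# keeps each data frame intact instead of flattening its columns.
cell_to_df_list <- Vectorize(function(records) {
  df <- do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
  list(df)
})

A <- list(
  list(list(x = "a", y = 1), list(x = "b", y = 2)),
  list(list(x = "c", y = 3), list(x = "d", y = 4))
)
out <- cell_to_df_list(A)
length(out)   # one data frame per cell, matching the number of rows
```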

Now we can use mutate_if nicely:

 d %>% mutate_if(purrr::is_list, tblfy)
 ## Source: local data frame [2 x 2]
 ##
 ##                A     B
 ##           <list> <chr>
 ## 1 <tbl_df [2,2]>     A
 ## 2 <tbl_df [2,2]>     B

... and if we then unnest what is there,

 d %>% mutate_if(purrr::is_list, tblfy) %>% tidyr::unnest()
 ## Source: local data frame [4 x 3]
 ##
 ##       B     x     y
 ##   <chr> <chr> <dbl>
 ## 1     A     a     1
 ## 2     A     b     2
 ## 3     B     c     3
 ## 4     B     d     4
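For comparison, the result of the unnesting step can be sketched in base R: repeat each row's B value alongside the rows of its nested data frame, then row-bind the pieces. This only illustrates what unnest() produces here, assuming the list-column already holds one data frame per row; it is not how tidyr implements it.

```r
# Data frame whose list-column A already holds one data frame per row
d <- data.frame(B = LETTERS[1:2], stringsAsFactors = FALSE)
d$A <- list(
  data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = FALSE),
  data.frame(x = c("c", "d"), y = c(3, 4), stringsAsFactors = FALSE)
)

# For each row, recycle the scalar B across every row of its nested data frame
pieces <- Map(function(b, inner) data.frame(B = b, inner, stringsAsFactors = FALSE),
              d$B, d$A)
flat <- do.call(rbind, pieces)
rownames(flat) <- NULL   # drop the "A.1", "A.2", ... row names from rbind
flat
```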

A few notes:

map_df(identity) seems to be more effective at building a data frame than any alternative formulation. I know that calling identity seems unnecessary, but most of the alternatives break.
I'm not sure how broadly useful tblfy will be, since it depends somewhat on the structure of the lists in the list-column, which can vary greatly. If you have many lists with a similar structure, I expect it will be useful.
There may be a way to do this using pmap instead of Vectorize, but I couldn't get it to work in a few quick attempts.
