Using dplyr rename(), including variable names not in the dataset

I am trying to translate some plyr code into dplyr and am stuck on the new rename() function in dplyr. I would like to reuse a single rename() expression across a set of datasets whose original names overlap but are not identical. For example,

  sample1 <- data.frame(A=1:10, B=letters[1:10])
  sample2 <- data.frame(B=11:20, C=letters[11:20])

And then,

  rename(sample1, var1 = A, var2 = B, var3 = C) 

I would like variable A to be renamed to var1 and B to var2, with the var3 entry simply skipped because sample1 has no column C. Instead I get

Error: Unknown variables: C.

In contrast, plyr syntax will allow me to use

  rename(sample1, c("A" = "var1", "B" = "var2", "C" = "var3"))
  rename(sample2, c("A" = "var1", "B" = "var2", "C" = "var3"))

and does not give an error. Is there a way to get the same result in dplyr without the Unknown variables error?

+7
r dplyr plyr
4 answers

Completely ignoring your actual question about how to do this with dplyr, I would suggest a different approach using a lookup table:

  sample1 <- data.frame(A=1:10, B=letters[1:10])
  sample2 <- data.frame(B=11:20, C=letters[11:20])

  rename_map <- c("A"="var1", "B"="var2", "C"="var3")

  names(sample1) <- rename_map[names(sample1)]
  str(sample1)

  names(sample2) <- rename_map[names(sample2)]
  str(sample2)

Basically the algorithm is simple:

  • Create a lookup table that maps the current variable names to the desired names
  • Using the names() function, index the lookup table by the current column names and assign the mapped names back to the columns.
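As a small illustration of the lookup step, the sketch below indexes a named character vector by name. The variable names found and not_found are made up for this example; it also shows that a name absent from the map comes back as NA instead of raising an error.

```r
# Named-vector lookup: names are the old column names, values are the new ones.
rename_map <- c(A = "var1", B = "var2", C = "var3")

# Indexing by name returns the mapped values in the order requested.
found <- unname(rename_map[c("B", "C")])
print(found)

# A name that is not in the map yields NA rather than an error.
not_found <- unname(rename_map[c("B", "Z")])
print(not_found)
```

Because the lookup never fails, every column name in the data frame needs an entry in the map for a plain names(df) <- rename_map[names(df)] assignment to be safe.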

EDIT: Following Hadley's suggestion, I now use a named vector instead of a list, which makes life much easier. I always forget about named vectors :(

+4
  # no need to use rename
  oldnames <- unique(c(names(sample1), names(sample2)))
  newnames <- c("var1", "var2", "var3")
  name_df <- data.frame(oldnames, newnames)
  mydata <- list(sample1, sample2)  # combine the two datasets into a list

  # one-liner
  finaldata <- lapply(mydata, function(i) {
    colnames(i) <- name_df[name_df[,1] %in% colnames(i), 2]
    return(i)
  })

  > finaldata
  [[1]]
     var1 var2
  1     1    a
  2     2    b
  3     3    c
  4     4    d
  5     5    e
  6     6    f
  7     7    g
  8     8    h
  9     9    i
  10   10    j

  [[2]]
     var2 var3
  1    11    k
  2    12    l
  3    13    m
  4    14    n
  5    15    o
  6    16    p
  7    17    q
  8    18    r
  9    19    s
  10   20    t
+1

I used @earino's answer myself, but found that it can be unsafe. If the names of the renaming vector do not include every column name of the data frame, the missing column names are silently replaced by NA, which is certainly not what you want.

  d1 <- data.frame(A = 1:10, B = letters[1:10], stringsAsFactors = FALSE)
  rename_vec <- c("B" = "var2", "C" = "var3")
  names(d1) <- rename_vec[names(d1)]
  str(d1)
  #> 'data.frame': 10 obs. of 2 variables:
  #> $ NA  : int 1 2 3 4 5 6 7 8 9 10
  #> $ var2: chr "a" "b" "c" "d" ...

The same thing happens if you run names(d1) <- rename_vec[names(d1)] twice, because on the second run none of colnames(d1) is in names(rename_vec).

  names(d1) <- rename_vec[names(d1)]
  str(d1)
  #> 'data.frame': 10 obs. of 2 variables:
  #> $ NA: int 1 2 3 4 5 6 7 8 9 10
  #> $ NA: chr "a" "b" "c" "d" ...

One option would be string replacement on the column names, for example with str_replace_all() from the {stringr} package, but that allows partial matches (see the update at the end of this answer).

We just need to select the columns that are in the data frame and in the renaming vector.

  d2 <- data.frame(B1 = 1:10, B = letters[1:10], stringsAsFactors = FALSE)
  sel <- is.element(colnames(d2), names(rename_vec))
  names(d2)[sel] <- rename_vec[names(d2)][sel]
  str(d2)
  #> 'data.frame': 10 obs. of 2 variables:
  #> $ B1  : int 1 2 3 4 5 6 7 8 9 10
  #> $ var2: chr "a" "b" "c" "d" ...

UPDATE: I initially had a solution based on string replacement, which also turned out to be unsafe because it allowed partial matches. I think this is better.
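The selection idea above can be wrapped in a small reusable function. This is only a sketch: rename_known is a hypothetical helper name, not part of any package, and it uses match() so that only exact name matches are renamed and unmatched columns keep their names.

```r
# Hypothetical helper: rename only the columns that appear in names(rename_vec).
rename_known <- function(df, rename_vec) {
  idx <- match(names(df), names(rename_vec))  # exact matches only, NA if absent
  hit <- !is.na(idx)
  names(df)[hit] <- unname(rename_vec[idx[hit]])
  df
}

rename_vec <- c(A = "var1", B = "var2", C = "var3")
sample1 <- data.frame(A = 1:3, B = letters[1:3])

once <- rename_known(sample1, rename_vec)
twice <- rename_known(once, rename_vec)  # a second call leaves the names alone

print(names(once))
print(names(twice))
```

Running the helper twice is harmless here because columns already called var1 or var2 are not among the names of rename_vec and are therefore left untouched.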

0

With dplyr we can use a named vector with the old names as values and the new names as names, and then unquote only those values of name_vec that match names in the dataset. rename() supports unquoting character vectors with !!, so there is no need to convert them to symbols with sym() first:

  library(dplyr)
  name_vec <- c(var1 = "A", var2 = "B", var3 = "C")
  sample1 %>% rename(!!name_vec[name_vec %in% names(.)])
  sample2 %>% rename(!!name_vec[name_vec %in% names(.)])

Alternatively, using setNames():

  name_vec <- c(A = "var1", B = "var2", C = "var3")
  sample1 %>% setNames(name_vec[names(.)])
  sample2 %>% setNames(name_vec[names(.)])

Output:

     var1 var2
  1     1    a
  2     2    b
  3     3    c
  4     4    d
  5     5    e
  6     6    f
  7     7    g
  8     8    h
  9     9    i
  10   10    j

     var2 var3
  1    11    k
  2    12    l
  3    13    m
  4    14    n
  5    15    o
  6    16    p
  7    17    q
  8    18    r
  9    19    s
  10   20    t
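For newer dplyr versions, a shorter variant may be possible. The sketch below assumes dplyr 1.0.0 or later, where the tidyselect helper any_of() is available; it takes the same new = "old" named vector and skips entries whose old name is not in the data. The names renamed1 and renamed2 are made up for this example.

```r
library(dplyr)

# New names as names, old names as values (same layout as name_vec above).
name_vec <- c(var1 = "A", var2 = "B", var3 = "C")

sample1 <- data.frame(A = 1:10, B = letters[1:10])
sample2 <- data.frame(B = 11:20, C = letters[11:20])

# any_of() ignores old names that are missing from the data frame.
renamed1 <- sample1 %>% rename(any_of(name_vec))
renamed2 <- sample2 %>% rename(any_of(name_vec))

print(names(renamed1))
print(names(renamed2))
```

This avoids the manual name_vec %in% names(.) filter, at the cost of requiring a recent dplyr.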
0