Dynamically select columns of a data frame using $ and a column name vector

I want to order a data frame by different columns, one at a time. I have a character vector with the names of the columns that the ordering should be based on:

 parameter <- c("market_value_LOCAL", "ep", "book_price", "sales_price", "dividend_yield", "beta", "TOTAL_RATING_SCORE", "ENVIRONMENT", "SOCIAL", "GOVERNANCE") 

I want to iterate over the names in parameter and dynamically select the column that will be used to order my data:

 Q1_R1000_parameter <- Q1_R1000[order(Q1_R1000$parameter[X]), ] 

where X is 1:10 (because I have 10 elements in parameter ).




To make my example reproducible, consider the mtcars dataset and some variable names stored in the character vector cols . When I try to select a variable from mtcars using a dynamic subset of cols , in the same way as above ( Q1_R1000$parameter[X] ), the column is not selected:

 cols <- c("cyl", "am")
 mtcars$cols[1]
 # NULL
+88
r r-faq dataframe
Aug 14 '13 at 2:27
8 answers

You cannot do this kind of subsetting with $ . The source code ( R/src/main/subset.c ) states:

/* The $ subset operator.
   We need to be sure to only evaluate the first argument.
   The second will be a symbol that needs to be matched, not evaluated.
*/

Second argument? What?! You have to realise that $ , like everything else in R (including, for instance, ( , + , ^ , etc.), is a function that takes arguments and is evaluated. df$V1 could be rewritten as

 `$`(df , V1) 

or really

 `$`(df , "V1") 

But...

 `$`(df , paste0("V1") ) 

... for instance, will never work, nor will anything else that must first be evaluated in the second argument. You may only pass a string, which is never evaluated.

Instead, use [ (or [[ if you want to extract only one column as a vector).

For example,

 var <- "mpg"
 # Doesn't work
 mtcars$var
 # These both work, but note that what they return is different:
 # the first is a vector, the second is a data.frame
 mtcars[[var]]
 mtcars[var]
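Applied to the loop in the question, a minimal sketch might look like the following. It uses mtcars and the cols vector from the question's reproducible example; the names sorted_by and col are only illustrative and do not come from the original post.

```r
cols <- c("cyl", "am")

# `$` looks for a column literally named "cols", so this yields NULL
is.null(mtcars$cols)

# `[[` evaluates its argument, so each name in cols selects a real column.
# sorted_by (illustrative name) holds one sorted copy of mtcars per column.
sorted_by <- lapply(cols, function(col) mtcars[order(mtcars[[col]]), ])
names(sorted_by) <- cols

head(sorted_by[["cyl"]][, cols])
```

Keeping the results in a named list avoids creating one separately named object per column.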

You can do the ordering without loops using do.call to build an order call. The following is a reproducible example:

 # set seed for reproducibility
 set.seed(123)
 df <- data.frame( col1 = sample(5,10,replace=TRUE) , col2 = sample(5,10,replace=TRUE) , col3 = sample(5,10,replace=TRUE) )

 # We want to sort by 'col3' then by 'col1'
 sort_list <- c("col3","col1")

 # Use 'do.call' to call order. Second argument in do.call is a list of arguments
 # to pass to the first argument, in this case 'order'.
 # Since a data.frame is really a list, we just subset the data.frame
 # according to the columns we want to sort in, in that order
 df[ do.call( order , df[ , match( sort_list , names(df) ) ] ) , ]
 #    col1 col2 col3
 # 10    3    5    1
 # 9     3    2    2
 # 7     3    2    3
 # 8     5    1    3
 # 6     1    5    4
 # 3     3    4    4
 # 2     4    3    4
 # 5     5    1    4
 # 1     2    5    5
 # 4     5    3    5
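The same idea can be wrapped in a small helper. This is only a sketch: order_by_cols is a hypothetical name, not a base R function. It uses list-style subsetting ( data[cols] ) so that a single column name still produces a list for do.call, and it passes method = "radix" so that decreasing may be given per column.

```r
# Hypothetical helper: sort `data` by the columns named in `cols`.
# `decreasing` is either a single logical or one logical per column.
order_by_cols <- function(data, cols, decreasing = FALSE) {
  stopifnot(all(cols %in% names(data)))
  # unname() so that a column called e.g. "decreasing" is not
  # mistaken for one of order()'s own arguments
  keys <- unname(as.list(data[cols]))
  # Note: the radix method sorts character keys in C-locale order
  idx <- do.call(order, c(keys, list(decreasing = decreasing, method = "radix")))
  data[idx, , drop = FALSE]
}

# cyl ascending, ties broken by mpg descending
res <- order_by_cols(mtcars, c("cyl", "mpg"), decreasing = c(FALSE, TRUE))
head(res[, c("cyl", "mpg")])
```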
+146
Aug 14 '13 at 9:57

Using dplyr provides simple syntax for sorting data frames.

 library(dplyr)
 mtcars %>% arrange(gear, desc(mpg))

It may be useful to use the standard-evaluation version, arrange_() , to build the sort list dynamically. Note that arrange_() is deprecated in current dplyr releases.

 sort_list <- c("gear", "desc(mpg)")
 mtcars %>% arrange_(.dots = sort_list)
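With a recent dplyr, one way to sort by column names held in a character vector is sketched below. It assumes dplyr 1.0 or later is installed; sort_cols, by_all and by_mixed are illustrative names, not part of the original answer.

```r
library(dplyr)

sort_cols <- c("gear", "mpg")

# Ascending on every column named in sort_cols
by_all <- mtcars %>% arrange(across(all_of(sort_cols)))

# Mixed directions: look up each column by name through the .data pronoun
by_mixed <- mtcars %>%
  arrange(.data[[sort_cols[1]]], desc(.data[[sort_cols[2]]]))
```

Unlike the string "desc(mpg)" passed to arrange_() , the column names here are plain data, so nothing has to be parsed as code.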
+3
Nov 15 '16 at 17:48

If I understand correctly, you have a vector containing variable names and would like to loop through each name and sort your data frame by it. If so, this example should illustrate a solution for you. The primary issue in yours (the full example isn't complete, so I'm not sure what else you may be missing) is that it should be order(Q1_R1000[,parameter[X]]) instead of order(Q1_R1000$parameter[X]) , since parameter is an external object that contains a variable name, as opposed to a direct column of your data frame (which is when $ would be appropriate).

 set.seed(1)
 dat <- data.frame(var1=round(rnorm(10)),
                   var2=round(rnorm(10)),
                   var3=round(rnorm(10)))
 param <- paste0("var",1:3)
 dat
 #   var1 var2 var3
 #1    -1    2    1
 #2     0    0    1
 #3    -1   -1    0
 #4     2   -2   -2
 #5     0    1    1
 #6    -1    0    0
 #7     0    0    0
 #8     1    1   -1
 #9     1    1    0
 #10    0    1    0

 for(p in rev(param)){
    dat <- dat[order(dat[,p]),]
 }
 dat
 #   var1 var2 var3
 #3    -1   -1    0
 #6    -1    0    0
 #1    -1    2    1
 #7     0    0    0
 #2     0    0    1
 #10    0    1    0
 #5     0    1    1
 #8     1    1   -1
 #9     1    1    0
 #4     2   -2   -2
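The loop runs over rev(param) because order() leaves ties in their existing order: sorting by the least significant column first and the most significant last gives the same result as a single multi-key sort. The sketch below checks this on the same data; looped and single are illustrative names.

```r
set.seed(1)
dat <- data.frame(var1 = round(rnorm(10)),
                  var2 = round(rnorm(10)),
                  var3 = round(rnorm(10)))
param <- paste0("var", 1:3)

# Repeated stable sorts, least significant key first
looped <- dat
for (p in rev(param)) {
  looped <- looped[order(looped[[p]]), ]
}

# One multi-key sort: var1 first, ties broken by var2, then var3
single <- dat[do.call(order, unname(as.list(dat[param]))), ]

# Compare the resulting row order of the two approaches
identical(rownames(looped), rownames(single))
```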
+2
Aug 14 '13 at 3:32
 Q1_R1000[do.call(order, Q1_R1000[parameter]), ] 
0
Aug 14 '13 at 10:20

I had a similar problem because some CSV files used different names for the same column.
This was the solution:

I wrote a function that returns the first valid column name in a list, and then used that name ...

 # Return the first name in names that is a column name in tbl,
 # else NULL
 ChooseCorrectColumnName <- function(tbl, names) {
   for(n in names) {
     if (n %in% colnames(tbl)) {
       return(n)
     }
   }
   return(NULL)
 }

then...

 cptcodefieldname = ChooseCorrectColumnName(file, c("CPT", "CPT.Code"))
 icdcodefieldname = ChooseCorrectColumnName(file, c("ICD.10.CM.Code", "ICD10.Code"))
 if (is.null(cptcodefieldname) || is.null(icdcodefieldname)) {
   print("Bad file column name")
 }

 # Here we use the hash table implementation where
 # we have a string key and list value so we need actual strings,
 # not Factors
 file[[cptcodefieldname]] = as.character(file[[cptcodefieldname]])
 file[[icdcodefieldname]] = as.character(file[[icdcodefieldname]])
 # cpt_valid_icds is assumed to be a list defined elsewhere
 for (i in seq_len(nrow(file))) {
   cpt = file[[cptcodefieldname]][i]
   cpt_valid_icds[[cpt]] <<- unique(c(cpt_valid_icds[[cpt]], file[[icdcodefieldname]][i]))
 }
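A loop-free variant of the same lookup can be sketched with intersect() , which keeps the order of its first argument. The helper name first_matching_column and the claims data frame are made up for illustration.

```r
# Hypothetical helper: first candidate that is a column of tbl, else NULL
first_matching_column <- function(tbl, candidates) {
  hits <- intersect(candidates, colnames(tbl))
  if (length(hits) == 0) NULL else hits[1]
}

# Made-up example data
claims <- data.frame(CPT.Code = c("99213", "99214"),
                     ICD10.Code = c("E11.9", "I10"),
                     stringsAsFactors = FALSE)

cpt_col <- first_matching_column(claims, c("CPT", "CPT.Code"))
# The name is held in a variable, so `[[` is used rather than `$`
codes <- claims[[cpt_col]]
```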
0
Dec 13 '17 at 17:32

If you want to select a column with a specific name, just do:

 idx=which(colnames(mtcars)==cols[1])
 A=mtcars[,idx]
 #and then
 colnames(mtcars)[idx]=cols[1]

You can run this in a loop. Conversely, to add a column with a dynamic name, for example if A is a data frame and xyz is a column that should be named by the value of x, I do this:

 A$tmp=xyz
 colnames(A)[colnames(A)=="tmp"]=x

Again, this can also be put in a loop.

0
Jul 13 '18 at 8:15

Another solution is to use get() :

 > cols <- c("cyl", "am")
 > get(cols[1], mtcars)
  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
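One practical difference from `[[` is how a missing name is handled. The following sketch compares the two on mtcars; not_a_column and get_failed are illustrative names.

```r
cols <- c("cyl", "am")

# For an existing column the two approaches agree
identical(get(cols[1], mtcars), mtcars[[cols[1]]])

# For a missing name, `[[` returns NULL ...
is.null(mtcars[["not_a_column"]])

# ... while get() signals an error, captured here as a logical flag
get_failed <- inherits(try(get("not_a_column", mtcars), silent = TRUE),
                       "try-error")
get_failed
```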
-1
Dec 19 '18 at 2:48

A bit late, but I think I have an answer.

Here is my example study.df data frame:

 > study.df
    study   sample       collection_dt other_column
 1 DS-111 ES768098 2019-01-21:04:00:30         <NA>
 2 DS-111 ES768099 2018-12-20:08:00:30   some_value
 3 DS-111 ES768100                <NA>   some_value

And then:

 > ## Selecting columns in a given order
 > ## Create a column-name vector as per your preference
 >
 > selectCols <- c('study','collection_dt','sample')
 >
 > ## Select data from study.df with the help of the selection vector
 > selectCols %>% select(.data=study.df,.)
    study       collection_dt   sample
 1 DS-111 2019-01-21:04:00:30 ES768098
 2 DS-111 2018-12-20:08:00:30 ES768099
 3 DS-111                <NA> ES768100
 >
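The same column selection can be sketched in base R with single-bracket indexing, which accepts a character vector and returns the columns in the order given. Here study_df is a hand-built stand-in for the data frame printed above. With dplyr loaded, select(study.df, all_of(selectCols)) should be the tidyselect way of writing the same thing.

```r
# study_df is a hand-built stand-in for the study.df shown above
study_df <- data.frame(
  study = "DS-111",
  sample = c("ES768098", "ES768099", "ES768100"),
  collection_dt = c("2019-01-21:04:00:30", "2018-12-20:08:00:30", NA),
  other_column = c(NA, "some_value", "some_value"),
  stringsAsFactors = FALSE
)
selectCols <- c("study", "collection_dt", "sample")

# Character-vector indexing keeps a data frame and follows the
# order of selectCols, not the original column order
selected <- study_df[selectCols]
selected
```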
-1
Jan 22 '19 at 6:37


