Subset variables in a data frame based on column type

I need to subset a data frame based on column type. For example, from a data frame with 100 columns, I need to keep only the columns of type factor or integer. I wrote a short function for this, but is there a simpler solution, a built-in function, or a package on CRAN?

My current solution for getting variable names with requested types:

    varlist <- function(df = NULL, vartypes = NULL) {
      type_function <- c("is.factor", "is.integer", "is.numeric",
                         "is.character", "is.double", "is.logical")
      names(type_function) <- c("factor", "integer", "numeric",
                                "character", "double", "logical")
      names(df)[as.logical(sapply(
        lapply(names(df), function(y)
          sapply(type_function[names(type_function) %in% vartypes],
                 function(x) do.call(x, list(df[[y]])))),
        sum))]
    }

The varlist function works as follows:

  • For each requested type and each column in the data frame, the corresponding "is.TYPE" function is called
  • The test results are summed for each variable (logical values are coerced to integers automatically)
  • The sums are converted to a logical vector
  • That logical vector is used to subset the names of the data frame
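The four steps above can be traced on a small made-up data frame. This is only a sketch: `toy` and its columns are hypothetical, and the steps are unrolled for readability rather than copied verbatim from `varlist`.

```r
# Hypothetical toy data; not part of the original question
toy <- data.frame(
  id    = 1:3,                       # integer
  score = c(1.5, 2.5, 3.5),          # double
  grp   = factor(c("a", "b", "a")),  # factor
  flag  = c(TRUE, FALSE, TRUE)       # logical
)

# Step 1: call each requested is.TYPE function on each column
tests <- c(integer = "is.integer", logical = "is.logical")
hits <- sapply(toy, function(col) {
  sapply(tests, function(f) do.call(f, list(col)))
})

# Step 2: sum the test results per column (TRUE counts as 1)
counts <- colSums(hits)

# Step 3: convert the sums to a logical vector
keep <- as.logical(counts)

# Step 4: subset the column names
selected <- names(toy)[keep]
print(selected)
```

Here `hits` is a matrix with one row per requested type and one column per variable, so a column is kept as soon as any one of the requested tests is TRUE.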

And some data to verify it:

    df <- read.table(
      file = "http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
      sep = " ", header = FALSE, stringsAsFactors = TRUE
    )
    names(df) <- c('ca_status', 'duration', 'credit_history', 'purpose',
                   'credit_amount', 'savings', 'present_employment_since',
                   'installment_rate_income', 'status_sex', 'other_debtors',
                   'present_residence_since', 'property', 'age',
                   'other_installment', 'housing', 'existing_credits', 'job',
                   'liable_maintenance_people', 'telephone', 'foreign_worker', 'gb')
    df$gb <- ifelse(df$gb == 2, FALSE, TRUE)
    df$property <- as.character(df$property)
    varlist(df, c("integer", "logical"))

I ask because my code looks very mysterious and hard to understand (even for me, and I finished the function 10 minutes ago).

+7
r
2 answers
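    subset_colclasses <- function(DF, colclasses = "numeric") {
      # any() keeps the test scalar for multi-class columns (e.g. ordered factors)
      keep <- sapply(DF, function(vec, test) any(class(vec) %in% test), test = colclasses)
      DF[, keep, drop = FALSE]  # drop = FALSE returns a data frame even for one match
    }
    str(subset_colclasses(df, c("factor", "integer")))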
+2

Just do this:

 df[,sapply(df,is.factor) | sapply(df,is.integer)] 
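A quick check of this approach on made-up data may help. This is a sketch, not part of the original answer: `toy` is hypothetical, `drop = FALSE` is an addition that keeps the result a data frame when only one column matches, and the `Filter()` form is an alternative from base R.

```r
# Hypothetical toy data; not from the original thread
toy <- data.frame(
  id    = 1:3,                      # integer
  score = c(1.5, 2.5, 3.5),         # double
  grp   = factor(c("a", "b", "a"))  # factor
)

# Same logical mask as in the one-liner above
keep <- sapply(toy, is.factor) | sapply(toy, is.integer)

# drop = FALSE (an addition) keeps a data frame even for a single match
picked <- toy[, keep, drop = FALSE]

# An alternative base R form using Filter()
picked2 <- Filter(function(col) is.factor(col) || is.integer(col), toy)

print(names(picked))
```

Both forms select the `id` and `grp` columns here and leave `score` out, because a double column is neither a factor nor an integer.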
+13