Subset variables in a data frame based on column type

Question


I need to subset a data frame based on column type. For example, from a data frame with 100 columns, I need to keep only those columns of type factor or integer. I wrote a short function for this, but is there a simpler solution, a built-in function, or a package on CRAN?

My current solution for getting variable names with requested types:

    varlist <- function(df=NULL, vartypes=NULL) {
      type_function <- c("is.factor","is.integer","is.numeric","is.character","is.double","is.logical")
      names(type_function) <- c("factor","integer","numeric","character","double","logical")
      names(df)[as.logical(sapply(lapply(names(df), function(y)
        sapply(type_function[names(type_function) %in% vartypes], function(x)
          do.call(x,list(df[[y]])))),sum))]
    }

The varlist function works as follows:

1. For each requested type and for each column in the data frame, the corresponding "is.TYPE" function is called.
2. The test results are summed for each variable (logical values are coerced to integers automatically).
3. The sums are converted to a logical vector.
4. That logical vector is used to subset the names of the data frame.
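The steps above can also be written in a more explicit form. The following is only a sketch: the name `varlist2` and the `toy` data frame are made up for illustration and are not part of the original function. It assumes that a column should be kept when any of the requested `is.*` predicates returns TRUE, which matches the sum-then-coerce logic described above.

```r
# Hypothetical rewrite of varlist(); the name varlist2 is an assumption.
varlist2 <- function(df, vartypes) {
  # Map each supported type name to its is.* predicate.
  type_function <- list(
    factor = is.factor, integer = is.integer, numeric = is.numeric,
    character = is.character, double = is.double, logical = is.logical
  )
  # Keep only the predicates for the requested types.
  tests <- type_function[intersect(vartypes, names(type_function))]
  # For each column, check whether any requested predicate returns TRUE.
  keep <- vapply(
    df,
    function(col) any(vapply(tests, function(f) f(col), logical(1))),
    logical(1)
  )
  names(df)[keep]
}

# Toy data, made up for illustration.
toy <- data.frame(
  a = 1:3,
  b = c(1.5, 2.5, 3.5),
  c = factor(c("x", "y", "x")),
  d = c(TRUE, FALSE, TRUE),
  e = c("p", "q", "r"),
  stringsAsFactors = FALSE
)

varlist2(toy, c("integer", "logical"))
```

Using `vapply()` with `logical(1)` instead of `sapply()` makes the expected return type explicit, so an unexpected result raises an error instead of silently changing shape.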

And some data to verify it:

    df <- read.table(file="http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
                     sep=" ", header=FALSE, stringsAsFactors=TRUE)
    names(df) <- c('ca_status','duration','credit_history','purpose','credit_amount','savings',
                   'present_employment_since','installment_rate_income','status_sex','other_debtors',
                   'present_residence_since','property','age','other_installment','housing','existing_credits',
                   'job','liable_maintenance_people','telephone','foreign_worker','gb')
    df$gb <- ifelse(df$gb == 2, FALSE, TRUE)
    df$property <- as.character(df$property)
    varlist(df, c("integer","logical"))

I ask because my code looks very cryptic and is hard to understand (even for me, and I finished the function 10 minutes ago).

+7

r

Tomas Greif Jul 31 '13 at 7:46


2 answers

Just do the following:

 df[,sapply(df,is.factor) | sapply(df,is.integer)]
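One caveat with this kind of indexing: when only a single column matches, `df[, mask]` returns a plain vector instead of a data frame. A possible variation, shown here on a small made-up `toy` data frame (not the German credit data from the question), builds the mask once and passes `drop = FALSE`. Base R's `Filter()` is another option when a single predicate is enough.

```r
# Toy data, made up for illustration.
toy <- data.frame(
  a = 1:3,
  b = c(1.5, 2.5, 3.5),
  c = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

# Build the logical column mask once; vapply() guarantees a logical vector.
mask <- vapply(toy, function(col) is.factor(col) || is.integer(col), logical(1))

# drop = FALSE keeps the result a data frame even if only one column matches.
sub <- toy[, mask, drop = FALSE]
names(sub)

# Filter() applies a predicate to each column and keeps the matching ones.
only_factors <- Filter(is.factor, toy)
names(only_factors)
```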

+13

Thomas Jul 31 '13 at 7:54


Rolling · Accepted Answer · 2013-07-31T07:58:22+0000

    subset_colclasses <- function(DF, colclasses="numeric") {
      DF[,sapply(DF, function(vec, test) class(vec) %in% test, test=colclasses)]
    }

    str(subset_colclasses(df, c("factor", "integer")))
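Note that `class()` can return more than one value for a column (an ordered factor has class `c("ordered", "factor")`), in which case the `sapply()` call above may not reduce to a simple logical vector. A minimal sketch of one way around this uses `inherits()`, which returns a single TRUE or FALSE per column. The name `subset_by_class` and the `toy` data frame are hypothetical and not part of the original answer.

```r
# Hypothetical variant; the name subset_by_class is an assumption.
subset_by_class <- function(DF, colclasses = "numeric") {
  # inherits() yields one TRUE/FALSE per column, even when class()
  # has several elements, e.g. c("ordered", "factor").
  keep <- vapply(DF, inherits, logical(1), what = colclasses)
  # drop = FALSE keeps a data frame even when only one column matches.
  DF[, keep, drop = FALSE]
}

# Toy data, made up for illustration; column c is an ordered factor.
toy <- data.frame(
  a = 1:3,
  b = c(1.5, 2.5, 3.5),
  c = factor(c("lo", "mid", "hi"), levels = c("lo", "mid", "hi"), ordered = TRUE)
)

names(subset_by_class(toy, c("factor", "integer")))
```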
