What is the best way to collapse two factors with NA into one variable

Question

I have many sets of variables like this:

 Var1    Var2
 "Asian" NA
 NA      "Black"
 "White" NA

I would like to conveniently get them in this form:

 Race
 "Asian"
 "Black"
 "White"

I am trying something like:

 Race <- ifelse(is.na(Var1), Var2, Var1)

But this converts the values to the factors' integer level codes, and the codes from the two variables do not correspond (the result is, for example, 1, 1, 2). Is there a convenient way to do this, ideally with short, clear code? (You can work around the problem with as.character, but there should be a better way.)
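A minimal sketch of why this happens, assuming Var1 and Var2 are factors: ifelse() drops the factor attributes, so only the integer codes survive, and each factor numbers its own levels independently. The vectors below are illustrative stand-ins, not the actual data.

```r
# Illustrative data: two factors with complementary NAs
Var1 <- factor(c("Asian", NA, "White"))   # levels: Asian = 1, White = 2
Var2 <- factor(c(NA, "Black", NA))        # levels: Black = 1

# ifelse() discards the factor class and returns the underlying integer codes
codes <- ifelse(is.na(Var1), Var2, Var1)

# Converting to character first keeps the labels (the as.character workaround)
Race <- ifelse(is.na(Var1), as.character(Var2), as.character(Var1))
```

Here codes holds 1 for both "Asian" and "Black", which is the mismatch described above, while Race keeps the labels.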

+5

r

gung Jan 12 '15 at 1:22

3 answers

How about this solution?

 ind <- apply(df, 1, function(x) which(!is.na(x)))
 df[cbind(seq_along(ind), ind)]
 #[1] "Asian" "Black" "White"

+2

DatamineR Jan 12 '15 at 1:30

Another solution (a rather odd one, I admit, but quite short; your columns should be character, as they appear to be in your example):

 > library(tidyr)
 > unite(replace(df, is.na(df), ""), V, c(Var1, Var2), sep='')$V
 #[1] "Asian" "Black" "White"

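Or use gsub, which may be risky: here unite pastes each NA into the string as the literal text "NA", and gsub then strips it, so it would also strip "NA" from any real value that contains it: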

 > gsub("NA", "", unite(df, V, c(Var1, Var2), sep='')$V)
 #[1] "Asian" "Black" "White"

+1

Colonel beauvel Jan 12 '15 at 21:57

thelatemail · Accepted Answer · 2015-01-12T01:35:27+0000

With an intermediate conversion via as.character, assuming this is your data:

 dat <- data.frame(Var1=c("Asian",NA,"White"),Var2=c(NA,"Black",NA))
 do.call(pmax,c(lapply(dat,as.character),na.rm=TRUE))
 #[1] "Asian" "Black" "White"

If you need to work with a specific subset, you can do:

 do.call(pmax,c(lapply(dat[c("Var1","Var2")],as.character),na.rm=TRUE))
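The do.call() form should be equivalent to calling pmax() on each column by hand. A sketch with the two columns written out, assuming the same dat (built with explicit factors here so the as.character step matters regardless of R version); the names by_hand and general are only illustrative:

```r
dat <- data.frame(Var1 = factor(c("Asian", NA, "White")),
                  Var2 = factor(c(NA, "Black", NA)))

# do.call() passes each list element as a separate argument to pmax();
# written out for two columns, that is:
by_hand <- pmax(as.character(dat$Var1), as.character(dat$Var2), na.rm = TRUE)

# The general form handles any number of columns
general <- do.call(pmax, c(lapply(dat, as.character), na.rm = TRUE))
```

Note that pmax() compares strings, so if a row had two non-NA values it would return whichever sorts later rather than the first column's value. That is harmless here only because each row has a single non-NA entry.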

An alternative not requiring as.character would be:

 dat[cbind(1:nrow(dat),max.col(!is.na(dat)))]
 #[1] "Asian" "Black" "White"
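To unpack that one-liner, here is a step-by-step sketch assuming the same dat with factor columns. The names present and col_idx are introduced only for illustration, and ties.method = "first" is an addition: max.col() breaks ties at random by default, which would matter if a row had more than one non-NA value.

```r
dat <- data.frame(Var1 = factor(c("Asian", NA, "White")),
                  Var2 = factor(c(NA, "Black", NA)))

# Logical matrix: TRUE where a value is present
present <- !is.na(dat)

# For each row, the first column holding the row maximum (TRUE beats FALSE)
col_idx <- max.col(present, ties.method = "first")

# Matrix indexing with (row, column) pairs picks one cell per row;
# the data frame is converted to a character matrix, so labels are kept
Race <- dat[cbind(seq_len(nrow(dat)), col_idx)]

# Wrap in factor() if a factor is wanted back
Race_f <- factor(Race)
```

seq_len(nrow(dat)) is used instead of 1:nrow(dat) only so that a zero-row data frame does not produce the index c(1, 0).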
