Removing columns based on row value

Question

Given two data frames:

    C1 <- c(3,4,4,4,5)
    C2 <- c(3,7,3,4,5)
    C3 <- c(5,6,3,7,4)
    DF <- data.frame(C1=C1,C2=C2,C3=C3)
    DF
      C1 C2 C3
    1  3  3  5
    2  4  7  6
    3  4  3  3
    4  4  4  7
    5  5  5  4

and

    V1 <- c(3,2,2,4,5)
    V2 <- c(3,7,3,5,2)
    V3 <- c(5,2,5,7,5)
    V4 <- c(1,1,2,3,4)
    V5 <- c(1,2,6,7,5)
    DF2 <- data.frame(V1=V1,V2=V2,V3=V3,V4=V4,V5=V5)
    DF2
      V1 V2 V3 V4 V5
    1  3  3  5  1  1
    2  2  7  2  1  2
    3  2  3  5  2  6
    4  4  5  7  3  7
    5  5  2  5  4  5

Looking at each corresponding row in both data frames, there is a relationship between the value in C3 and the number of columns that I want to delete in the same row of DF2.

The relationship between the value in C3 and the columns of DF2 to delete is as follows:

    If C3 ≥ 7        drop V5
    If C3 = 6.0:6.9  drop V4 and up (so basically V5, V4)
    If C3 = 5.0:5.9  drop V3 and up (so basically V5, V4, V3)
    If C3 = 4.0:4.9  drop V2 and up (so basically V5, V4, V3, V2)
    If C3 ≤ 3.9      drop entire row

In this example, based on the values of C3, I would like DF2 to look like this:

      V1 V2 V3 V4 V5
    1  3  3
    2  2  7  2
    4  4  5  7  3
    5  5

I tried writing a simple script to do this (I'm pretty new, so I like to keep everything in order so that I can see what is happening), but I keep getting errors left and right, so I would appreciate some tips on how to proceed.

+7

r dataframe row

Vinterwoo Jun 15 '12 at 3:01

3 answers

Perhaps the easiest way:

    DF3 <- DF2
    for (i in seq_len(nrow(DF3))) {
      DF3[i, seq_len(ncol(DF3)) >= DF[i, ]$C3 - 2] <- NA
    }
    DF3

then

    > DF3
      V1 V2 V3 V4 V5
    1  3  3 NA NA NA
    2  2  7  2 NA NA
    3 NA NA NA NA NA
    4  4  5  7  3 NA
    5  5 NA NA NA NA
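
The same masking can also be written without an explicit loop. What follows is only a sketch and not part of the answer above. It assumes the DF and DF2 from the question and whole-number C3 values between 3 and 7, as in the example data. It relies on col() returning the column index of every cell and on DF$C3 being recycled down the rows of that matrix.

```r
# Data from the question
DF  <- data.frame(C1 = c(3,4,4,4,5), C2 = c(3,7,3,4,5), C3 = c(5,6,3,7,4))
DF2 <- data.frame(V1 = c(3,2,2,4,5), V2 = c(3,7,3,5,2), V3 = c(5,2,5,7,5),
                  V4 = c(1,1,2,3,4), V5 = c(1,2,6,7,5))

DF3 <- DF2
# col(DF3) is a 5 x 5 matrix of column indices. DF$C3 - 2 is recycled
# column by column, so cell [i, j] is compared against DF$C3[i] - 2.
mask <- col(DF3) >= DF$C3 - 2
DF3[mask] <- NA   # logical-matrix assignment blanks all masked cells at once
DF3
```

For data frames with many rows this avoids one assignment per row, at the cost of being less obvious to read than the loop.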

+4

kohske Jun 15 '12 at 3:46

A slight variation on kohske's answer, using explicit cut points:

    breaksx <- cut(DF$C3,c(0,3,4,5,6,7,Inf),labels=FALSE)
    for (i in seq_len(nrow(DF2))) {
      DF2[i,breaksx[i]:ncol(DF2)] <- NA
    }

Result:

    > DF2
      V1 V2 V3 V4 V5
    1  3  3 NA NA NA
    2  2  7  2 NA NA
    3 NA NA NA NA NA
    4  4  5  7  3 NA
    5  5 NA NA NA NA

To delete rows that are all NA:

    DF2[apply(DF2,1,function(x) !all(is.na(x))),]

Result:

      V1 V2 V3 V4 V5
    1  3  3 NA NA NA
    2  2  7  2 NA NA
    4  4  5  7  3 NA
    5  5 NA NA NA NA
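
The cut points above use cut()'s default right-closed intervals, which match the question's rules for the whole-number C3 values in the example. If C3 can be fractional (say 5.5) or larger than 7, left-closed bins follow the stated ranges more closely. The sketch below is one way to do that, not the answer's own code. The fractional C3 values are hypothetical, and rowSums() is swapped in for apply() when dropping the all-NA rows.

```r
# Hypothetical fractional C3 values, for illustration only
DF  <- data.frame(C3 = c(5.5, 6, 3.9, 7.2, 4))
DF2 <- data.frame(V1 = c(3,2,2,4,5), V2 = c(3,7,3,5,2), V3 = c(5,2,5,7,5),
                  V4 = c(1,1,2,3,4), V5 = c(1,2,6,7,5))

# right = FALSE gives left-closed bins: [-Inf,4) [4,5) [5,6) [6,7) [7,Inf)
# The bin number is the index of the first column to blank out.
first_dropped <- cut(DF$C3, c(-Inf, 4, 5, 6, 7, Inf),
                     right = FALSE, labels = FALSE)

for (i in seq_len(nrow(DF2))) {
  DF2[i, first_dropped[i]:ncol(DF2)] <- NA
}

# Keep only rows that still hold at least one non-NA value
result <- DF2[rowSums(!is.na(DF2)) > 0, ]
result
```

With these bins the column index never exceeds ncol(DF2), so a C3 value above 7 still only blanks V5.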

+2

thelatemail Jun 15 '12 at 4:26

Chase · Accepted Answer · 2012-06-15T03:54:34+0000

I like kohske's answer, but if your rules for setting values to NA do not follow a neat mathematical pattern, or you need to define them arbitrarily, this approach should give you that flexibility. First, define a function that returns the columns to drop based on your rules:

    f <- function(x) {
      if (x >= 7) {
        out <- 5
      } else if (x >= 6.0) {
        out <- 4:5
      } else if (x >= 5.0) {
        out <- 3:5
      } else if (x >= 4.0) {
        out <- 2:5
      } else {
        out <- 1:5
      }
      return(out)
    }

Then build a list of column indices, one element per row:

 z <- lapply(DF$C3, f)

Finally, loop over each row, setting the corresponding columns to NA:

    for (j in seq_along(z)) {
      DF2[j, z[[j]]] <- NA
    }
    #-----
      V1 V2 V3 V4 V5
    1  3  3 NA NA NA
    2  2  7  2 NA NA
    3 NA NA NA NA NA
    4  4  5  7  3 NA
    5  5 NA NA NA NA
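
To get from this result to the layout the question asks for, the rows that fall under the last rule (C3 below 4) still have to be removed. A minimal end-to-end sketch follows, assuming the DF and DF2 from the question. The row filter on DF$C3 is an addition for illustration and not part of the answer above.

```r
# Data from the question
DF  <- data.frame(C1 = c(3,4,4,4,5), C2 = c(3,7,3,4,5), C3 = c(5,6,3,7,4))
DF2 <- data.frame(V1 = c(3,2,2,4,5), V2 = c(3,7,3,5,2), V3 = c(5,2,5,7,5),
                  V4 = c(1,1,2,3,4), V5 = c(1,2,6,7,5))

# Rule function: which columns of DF2 to blank for a single C3 value
f <- function(x) {
  if (x >= 7) 5
  else if (x >= 6) 4:5
  else if (x >= 5) 3:5
  else if (x >= 4) 2:5
  else 1:5
}

# One vector of column indices per row, then blank those cells row by row
z <- lapply(DF$C3, f)
for (j in seq_along(z)) {
  DF2[j, z[[j]]] <- NA
}

# Rows with C3 below 4 are dropped entirely, per the question's last rule
result <- DF2[DF$C3 >= 4, ]
result
```

Filtering on DF$C3 states the rule directly, while testing for all-NA rows would also remove any row that happened to be entirely NA in the original data.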
