Executing an if statement on every line in R

I read in a CSV file in R that looks like this:

    3,3
    3,2
    3,3
    3,3
    3,3
    3,3
    2,3
    1,2
    2,2
    3,3

I want to assign a number to each of the 9 unique combinations that can occur in my data (3 and 3 -> 9, 3 and 2 -> 8, 2 and 3 -> 6, etc.). I am trying to create a nested if statement that will evaluate each row and put the assigned number in a third column, for every row in the dataset. I believe this can be done with the apply function, but I am having trouble getting the if statement to work inside apply. Both columns have possible values of 1, 2 or 3. This is my code so far, which just tries to assign 9 to rows where both columns are 3 and 0 to everything else:

    #RScript for haplotype analysis
    #remove(list=ls())
    options(stringsAsFactors=FALSE)
    setwd("C:/Documents and Settings/ColumbiaPC/Desktop")
    #read in comma-delimited, ID-matched genotype data
    OXT <- read.csv("OXTRhaplotype.csv")
    colnames(OXT)<- c("OXT1","OXT2")
    OXT$HAP <- apply(OXT, 1, function(x) if(x[1]=="3"&&x[2]=="3")x[3]=="9" else 0))

Thanks for any help in advance.

4 answers

You can solve the problem you describe using a lookup matrix and standard R subsetting, without any if statements:

    m <- matrix(1:9, nrow=3, byrow=TRUE)
    m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9

This means that you can look up a value in m by indexing it with a row and a column:

    m[3, 2]
    [1] 8
    m[3,3]
    [1] 9
    m[2,3]
    [1] 6

And now you can apply this to your data:

    df <- structure(list(
      V1 = c(3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 2L, 3L),
      V2 = c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L)),
      .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -10L))

    #df$m <- sapply(seq_len(nrow(df)), function(i)m[df$V1[i], df$V2[i]])
    df$m <- m[as.matrix(df)] # Use matrix subsetting, suggested by @Aaron
    df
       V1 V2 m
    1   3  3 9
    2   3  2 8
    3   3  3 9
    4   3  3 9
    5   3  3 9
    6   3  3 9
    7   2  3 6
    8   1  2 2
    9   2  2 5
    10  3  3 9
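To see why indexing with `as.matrix(df)` works: when a matrix is indexed by a two-column numeric matrix, R treats each row of the index as a (row, column) pair and returns one element per pair. Here is a minimal sketch of that, assuming the question's column names `OXT1` and `OXT2` and a small made-up subset of the data (the names `idx` and `HAP` are only illustrative):

```r
# Lookup table: row = value in the first column, column = value in the second
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Small illustrative data frame using the question's column names (assumed data)
OXT <- data.frame(OXT1 = c(3L, 3L, 2L, 1L), OXT2 = c(3L, 2L, 3L, 2L))

# Each row of the two-column index matrix is one (row, column) pair
idx <- cbind(OXT$OXT1, OXT$OXT2)

# One lookup per data row, with no if statement and no explicit loop
OXT$HAP <- m[idx]
OXT$HAP
```

Building the index with `cbind()` on the two genotype columns, rather than `as.matrix()` on the whole data frame, keeps the lookup working even if the data frame later gains extra columns.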

Unfortunately, I arrived late, and with a solution similar to @Andrie's:

    dat <- matrix(c(3,3,3,2,3,3,3,3,3,3,3,3,2,3,1,2,2,2,3,3), nrow=10, byrow=TRUE)
    # here is our lookup table for genotypes
    pat <- matrix(1:9, nrow=3, byrow=TRUE, dimnames=list(1:3,1:3))

Then

    > pat[dat]
     [1] 9 8 9 9 9 9 6 2 5 9

gives you what you want.

However, I would add that it might be easier for you to use a dedicated package for genetic studies, such as those found on CRAN (e.g. genetics, gap or SNPassoc, to name a few) or on Bioconductor, since they include tools for converting and recoding genotype data and for working with haplotypes.

Here is an example of what I mean with the above remark:

    > library(genetics)
    > geno1 <- as.genotype.allele.count(dat[,1]-1)
    > geno2 <- as.genotype.allele.count(dat[,2]-1)
    > table(geno1, geno2)
         geno2
    geno1 A/A A/B
      A/A   6   1
      A/B   1   1
      B/B   0   1

Andrie has already answered your question and shown the best approach to your problem. But there are a few bugs in your original code that I want to mention.

First, & is not the same as && . The & operator compares element by element, while && evaluates a single logical value. See ?'&' for more details. I believe that you wanted to use & in your example.
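The difference is easiest to see on small vectors. A minimal sketch with made-up values (the variable names are only illustrative):

```r
x <- c(3, 3, 2)
y <- c(3, 2, 3)

# `&` compares element by element and returns a logical vector
both_three <- x == 3 & y == 3
both_three

# `&&` is meant for single logical values, such as one if() condition
single <- (x[1] == 3) && (y[1] == 3)
single
```

Recent versions of R raise an error when `&&` is given operands longer than one element, so it should only be used on single values.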

Secondly, == is used for equality tests, which is how you correctly use it in your if condition. It is not used for assignment, which is how you incorrectly use it when trying to assign "9" to x[3] . Assignment is handled by <- , whether inside or outside a function. See ?'==' and ?'<-' for more details.
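As a small illustration of the two operators, using a made-up vector:

```r
x <- c(3, 3, 0)

# Comparison: returns FALSE and leaves x unchanged
test_result <- x[3] == 9
test_result

# Assignment: the third element of x now holds 9
x[3] <- 9
x
```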

Third, assigning a value to x[3] inside the function passed to apply() does not make sense. apply() just returns an array; it does not modify the OXT object. The following is an example of what your original approach might look like. However, Andrie's method is probably best for you.

    OXT <- read.table(textConnection(
    "3 3
    3 2
    3 3
    3 3
    3 3
    3 3
    2 3
    1 2
    2 2
    3 3"))
    colnames(OXT)<- c("OXT1","OXT2")
    OXT$HAP <- apply(OXT, 1, function(x) {
      if(x[1] == 3 & x[2] == 3) result <- 9
      else if(x[1] == 3 & x[2] == 2) result <- 8
      else if(x[1] == 3 & x[2] == 1) result <- 7
      else result <- 0
      return(result)
    })
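To illustrate the third point, the sketch below (with a small made-up data frame, not the original file) changes x inside the function and then checks that the data frame passed to apply() is untouched. Only the returned values come back:

```r
# Small illustrative data frame using the question's column names (assumed data)
OXT <- data.frame(OXT1 = c(3, 3, 1), OXT2 = c(3, 2, 2))

# x is a local copy of one row; changing it does not change OXT
res <- apply(OXT, 1, function(x) {
  x[1] <- 99
  x[1] + x[2]
})

res  # values computed from the modified copies
OXT  # still has its original two columns and values
```

This is why the result of apply() has to be assigned to a new column explicitly, as `OXT$HAP <- apply(...)` does.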

Another approach is to paste the two columns together and make a factor.

    df <- structure(list(
      V1 = c(3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 2L, 3L),
      V2 = c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 3L)),
      .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, -10L))
    df$hap <- factor(paste(df$V1, df$V2, sep=""))

Or, equivalently:

 df$hap2 <- factor(apply(df[1:2], 1, paste, collapse="")) 
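If numeric codes are needed rather than labels, one possible extension (an assumption on my part, not part of the answer above) is to give the factor an explicit set of levels covering all nine combinations and then take its integer codes. The names `all_levels` and `hap_code` are hypothetical:

```r
# Small illustrative subset of the data
df <- data.frame(V1 = c(3L, 3L, 2L, 1L, 2L), V2 = c(3L, 2L, 3L, 2L, 2L))

# All nine possible combinations, ordered "11", "12", "13", "21", ..., "33"
all_levels <- paste0(rep(1:3, each = 3), rep(1:3, times = 3))

hap <- factor(paste0(df$V1, df$V2), levels = all_levels)

# The integer code of each value is the position of its level in all_levels
hap_code <- as.integer(hap)
hap_code
```

With the levels in this order, the codes match the numbering in the question (3 and 3 gives 9, 3 and 2 gives 8, 2 and 3 gives 6).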