Combining 2 columns into 1 column many times in a very large dataset in R
The clunky solutions I'm working on won't be very fast even if I can get them to work, and the real data set is ~1500 x 45000, so they need to be fast. I'm definitely at a loss for 1) at the moment, although I have code for 2) and 3).
Here is an example of a toy data structure:
n = 10  # number of individuals in the toy example
pop = data.frame(status = rbinom(n, 1, .42), sex = rbinom(n, 1, .5),
  age = round(rnorm(n, mean=40, 10)), disType = rbinom(n, 1, .2),
  rs123=c(1,3,1,3,3,1,1,1,3,1), rs123.1=rep(1, n), rs157=c(2,4,2,2,2,4,4,4,2,2),
  rs157.1=c(4,4,4,2,4,4,4,4,2,2), rs132=c(4,4,4,4,4,4,4,4,2,2),
  rs132.1=c(4,4,4,4,4,4,4,4,4,4))
So there are several columns of basic demographic information, and the remaining columns are biallelic SNP information. For example, rs123 is the first allele of rs123 and rs123.1 is the second allele of rs123.
1) I need to combine all the biallelic SNP data that are currently in 2 columns into 1 column, so for example rs123 and rs123.1 become one column (but within the dataset):
11
31
11
31
31
11
11
11
31
11
2) I need to determine the least frequent SNP value (in the example above it is 31).
3) I need to replace the least frequent SNP value with 1, and the other(s) with 0.
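To make the three steps concrete, here is a minimal sketch of one way they could fit together on the toy data. It is not a tuned solution. It assumes the second allele of every SNP is always named "<snp>.1". The names second, first, geno, recode_rare and result are made up for illustration, and set.seed is added only so the toy data is reproducible.

```r
# Toy data mirroring the example above
set.seed(1)
n <- 10
pop <- data.frame(
  status = rbinom(n, 1, .42), sex = rbinom(n, 1, .5),
  age = round(rnorm(n, mean = 40, sd = 10)), disType = rbinom(n, 1, .2),
  rs123 = c(1,3,1,3,3,1,1,1,3,1), rs123.1 = rep(1, n),
  rs157 = c(2,4,2,2,2,4,4,4,2,2), rs157.1 = c(4,4,4,2,4,4,4,4,2,2),
  rs132 = c(4,4,4,4,4,4,4,4,2,2), rs132.1 = rep(4, n)
)

# Assumption: second-allele columns end in ".1" and the matching
# first-allele column has the same name without that suffix.
second <- grep("\\.1$", names(pop), value = TRUE)
first  <- sub("\\.1$", "", second)

# 1) Paste each allele pair into a single genotype column ("11", "31", ...).
#    paste0 is vectorized over rows, so the only loop is over SNP pairs.
geno <- as.data.frame(
  Map(function(a, b) paste0(pop[[a]], pop[[b]]), first, second),
  stringsAsFactors = FALSE
)
names(geno) <- first

# 2) + 3) Find the least frequent genotype in a column and recode it as 1,
#    everything else as 0. Ties are broken by which.min, which takes the
#    first minimum in table()'s sorted order.
recode_rare <- function(g) {
  counts <- table(g)
  rare <- names(counts)[which.min(counts)]
  as.integer(g == rare)
}
coded <- as.data.frame(lapply(geno, recode_rare))

# Put the recoded SNP columns back next to the demographic columns.
result <- cbind(pop[setdiff(names(pop), c(first, second))], coded)
```

One thing to watch: a SNP with only one observed genotype would be coded as all 1s by this sketch, since its single value is trivially the least frequent. On the full ~1500 x 45000 data it may also be worth holding the alleles in a matrix rather than a data frame, but that is a guess and has not been timed here.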