I have a question about using the split function to group data by a factor.
I have a data frame with two columns, snps and gene. snps is a factor and gene is a character vector. I want to group the genes by SNP so that I can see the list of genes that map to each SNP. Some SNPs map to more than one gene (for example, rs10000226 maps to genes 345274 and 5783), and genes occur several times.
To do this, I used the split function to build a list in which each SNP has its genes attached.
snps <- c("rs10000185", "rs1000022", "rs10000226", "rs10000226")
gene <- c("5783", "171425", "345274", "5783")
df <- data.frame(snps, gene)
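For reference, here is a minimal sketch of the grouping step described above, run on the toy data. The name genes_by_snp, the stringsAsFactors = FALSE argument, and the unique() clean-up are assumptions added for illustration and may not match the exact call used on the full data.

```r
# Toy data from the post; stringsAsFactors = FALSE (an assumption here)
# keeps gene as a character vector on every R version.
snps <- c("rs10000185", "rs1000022", "rs10000226", "rs10000226")
gene <- c("5783", "171425", "345274", "5783")
df <- data.frame(snps, gene, stringsAsFactors = FALSE)

# Group the gene IDs by SNP; split() treats the grouping vector as a factor
# and returns a named list with one element per SNP.
genes_by_snp <- split(df$gene, df$snps)

# Hypothetical clean-up step: keep each gene only once per SNP.
genes_by_snp <- lapply(genes_by_snp, unique)

# Genes attached to one SNP
genes_by_snp[["rs10000226"]]
```

On the toy data this gives a list with one element per SNP, and the last line returns the two genes attached to rs10000226.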
However, this is not efficient for my complete data frame (probably because of its size: 363422 rows, 281370 unique snps, 20888 unique genes), and R crashes when trying to load df.2.rda later.
Any suggestions on alternative ways to do this would be much appreciated!
avari