I have a large dataset with genotypic information from several populations. I would like to sort (that is, subset) the data by population, but I do not know how to do it.
I want to select rows based on the column "pedigree_dhl". I tried the following code, but I keep getting error messages.
newdata <- project[pedigree_dhl == CCB133$*1, ]
Another problem is that the "pedigree_dhl" column contains the full names of the individual genotypes. Only the first 6 characters of each entry are the name of the population, in this example CCB133. How can I tell R that I want to extract all rows whose pedigree_dhl starts with CCB133?
  Allele1 Allele2      SNP_name gs_entry pedigree_dhl
1       T       T ZM011407_0151      656    CCB133$*1
2       T       T ZM009374_0354      656    CCB133$*1
3       C       C ZM003499_0591      656    CCB133$*1
4       A       A ZM003898_0594      656    CCB133$*1
5       C       C ZM004887_0313      656    CCB133$*1
6       G       G ZM000583_1096      656    CCB133$*1
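To make the goal concrete, here is a minimal sketch of the kind of subsetting described above. It assumes the data frame is called `project`, as in the attempt, and that the population name is always the first 6 characters of `pedigree_dhl`. The `CCB200$*3` rows and the `population` column are invented for illustration only and are not part of the real data.

```r
# Small stand-in for the real data, using the column names shown above.
# The "CCB200$*3" rows are made up purely to have a second population.
project <- data.frame(
  Allele1      = c("T", "T", "C", "A"),
  Allele2      = c("T", "T", "C", "A"),
  SNP_name     = c("ZM011407_0151", "ZM009374_0354",
                   "ZM011407_0151", "ZM009374_0354"),
  gs_entry     = c(656, 656, 700, 700),
  pedigree_dhl = c("CCB133$*1", "CCB133$*1", "CCB200$*3", "CCB200$*3"),
  stringsAsFactors = FALSE
)

# Population name = first 6 characters of pedigree_dhl
# (assumption: the prefix always has this fixed width).
project$population <- substr(as.character(project$pedigree_dhl), 1, 6)

# Keep only the rows that belong to population CCB133.
newdata <- project[project$population == "CCB133", ]

# Alternatively, split the whole data set into one data frame per population.
by_population <- split(project, project$population)
```

Two things in the original attempt are likely behind the error messages: the value `CCB133$*1` is not quoted, so R tries to parse it as code, and `pedigree_dhl` is referenced without `project$`, so R looks for a separate object of that name. Comparing a derived prefix rather than the full string also avoids having to list every individual genotype.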
marie