Calculate a group mean when the same grouping value occurs in several separate runs

I have a fairly large genetic data set, with about 17,000 genetic markers (SNPs) and 700 people. Each SNP can be assigned to a founder. I want to calculate the average probability for each "founder segment", where a segment is a stretch of a chromosome that is assigned to a single founder without interruption.

The example below contains 3 segments. In the end, I want the average probability over all SNPs in each segment.

    Chromosome SNP Founder Probability
             1   1       7         0.6
             1   2       7         0.5
             1   3       7         0.7
             1   4       2         0.5
             1   5       2         0.8
             1   6       7         0.6
             1   7       7         0.5

Grouping by founder is easy with dplyr, but I don't want the first segment of founder 7 to be pooled with the other founder 7 segment.

So what I want:

    Chromosome SNP Founder Probability Average
             1   1       7         0.6    0.6
             1   2       7         0.5    0.6
             1   3       7         0.7    0.6
             1   4       2         0.5    0.65
             1   5       2         0.8    0.65
             1   6       7         0.6    0.55
             1   7       7         0.5    0.55

How can I calculate the group mean when the same value of a grouping factor occurs in several separate runs?

1 answer

With dplyr, we can compare adjacent elements of Founder to create a run-based grouping variable, group by it together with Chromosome, and then take the mean of Probability.

    library(dplyr)

    df1 %>%
      # grp1 increases by 1 each time Founder differs from the previous row;
      # default = first(Founder) keeps the first row from counting as a change
      group_by(Chromosome,
               grp1 = cumsum(Founder != lag(Founder, default = first(Founder)))) %>%
      mutate(Average = mean(Probability))
    #  Chromosome   SNP Founder Probability  grp1 Average
    #       <int> <int>   <int>       <dbl> <int>   <dbl>
    #1          1     1       7         0.6     0    0.60
    #2          1     2       7         0.5     0    0.60
    #3          1     3       7         0.7     0    0.60
    #4          1     4       2         0.5     1    0.65
    #5          1     5       2         0.8     1    0.65
    #6          1     6       7         0.6     2    0.55
    #7          1     7       7         0.5     2    0.55
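
To see what the grouping variable looks like on its own, here is a minimal base R sketch. It rebuilds the example table (the integer column types are an assumption read off the printed output) and computes the same kind of run counter without dplyr. The helper name `changed` is only illustrative.

```r
# Example data from the question; df1 is the name the answer assumes.
df1 <- data.frame(
  Chromosome  = rep(1L, 7),
  SNP         = 1:7,
  Founder     = c(7L, 7L, 7L, 2L, 2L, 7L, 7L),
  Probability = c(0.6, 0.5, 0.7, 0.5, 0.8, 0.6, 0.5)
)

# TRUE wherever Founder differs from the previous row;
# the first row is never treated as a change point.
changed <- c(FALSE, df1$Founder[-1] != df1$Founder[-nrow(df1)])

# The running total of change points labels each uninterrupted run.
grp1 <- cumsum(changed)
print(grp1)
# [1] 0 0 0 1 1 2 2
```

Because the counter only changes where Founder changes, the two separate founder 7 segments receive different labels (0 and 2) and are therefore averaged separately.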

Or, using data.table, we convert the data.frame to a data.table with setDT(df1), group by Chromosome and the run-length id (rleid) of Founder, and assign (:=) the mean of Probability to a new Average column.

    library(data.table)

    setDT(df1)[, Average := mean(Probability),
               by = .(Chromosome, grp1 = rleid(Founder))]
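
If neither package is available, the same idea can be sketched in base R with rle() and ave(). This is an alternative to the answer's code rather than part of it, and it assumes the rows are already sorted by Chromosome and SNP. The name `run_id` is illustrative.

```r
# Example data from the question; df1 is the name the answer assumes.
df1 <- data.frame(
  Chromosome  = rep(1L, 7),
  SNP         = 1:7,
  Founder     = c(7L, 7L, 7L, 2L, 2L, 7L, 7L),
  Probability = c(0.6, 0.5, 0.7, 0.5, 0.8, 0.6, 0.5)
)

# Run-length encode Founder, then expand each run's index back to one
# value per row. This plays the role of data.table::rleid().
runs   <- rle(df1$Founder)
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# ave() applies mean() within each Chromosome x run combination and
# returns a vector aligned with the original rows.
df1$Average <- ave(df1$Probability, df1$Chromosome, run_id, FUN = mean)
print(df1)
```

Grouping by Chromosome as well as `run_id` means a founder run that happens to continue across a chromosome boundary is still split into one segment per chromosome.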