I am trying to collapse rows whose ranges, given by the "start" and "stop" columns, overlap, and to write the collapsed values to new columns. For example, I have this data frame:

    my.df <- data.frame(chrom = c(1, 1, 1, 1, 14, 16, 16),
                        name  = c("a", "b", "c", "d", "e", "f", "g"),
                        start = as.numeric(c(0, 70001, 70203, 70060, 40004, 50000872, 50000872)),
                        stop  = as.numeric(c(71200, 71200, 80001, 71051, 42004, 50000890, 51000952)))

    chrom name    start     stop
        1    a        0    71200
        1    b    70001    71200
        1    c    70203    80001
        1    d    70060    71051
       14    e    40004    42004
       16    f 50000872 50000890
       16    g 50000872 51000952

I want to find the overlapping ranges, write the full extent covered by each group of collapsed rows into "start" and "stop", and list the names of the collapsed rows, so that I get the following:

    chrom    start     stop    name
        1        0    80001 a,b,c,d
       14    40004    42004       e
       16 50000872 51000952     f,g

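For reference, here is a minimal base-R sketch of the collapsing step itself, with no extra packages. It assumes that rows are merged only within the same chromosome and that intervals are closed, so two rows sharing an endpoint count as overlapping. The helper name `collapse_overlaps` is hypothetical. It visits rows sorted by chromosome and start, opens a new group whenever a row starts after the largest stop seen so far in the current group, and then takes `min(start)`, `max(stop)` and the pasted names per group. Because row a starts at 0, the merged chromosome 1 interval computed this way starts at 0.

```r
# Example data from the question
my.df <- data.frame(chrom = c(1, 1, 1, 1, 14, 16, 16),
                    name  = c("a", "b", "c", "d", "e", "f", "g"),
                    start = c(0, 70001, 70203, 70060, 40004, 50000872, 50000872),
                    stop  = c(71200, 71200, 80001, 71051, 42004, 50000890, 51000952),
                    stringsAsFactors = FALSE)

# collapse_overlaps() is a hypothetical helper written for this sketch.
# Assumption: intervals are closed, so rows sharing an endpoint are merged.
collapse_overlaps <- function(df) {
  o <- order(df$chrom, df$start)   # visit rows by chromosome, then by start
  group <- integer(nrow(df))       # group id for each original row
  g <- 0L
  run_stop <- -Inf                 # largest stop seen in the current group
  prev_chrom <- NULL
  for (k in o) {
    new_group <- is.null(prev_chrom) ||
      df$chrom[k] != prev_chrom ||   # first row of a new chromosome
      df$start[k] > run_stop         # gap after the current group
    if (new_group) {
      g <- g + 1L
      run_stop <- df$stop[k]
    } else {
      run_stop <- max(run_stop, df$stop[k])
    }
    group[k] <- g
    prev_chrom <- df$chrom[k]
  }
  # One output row per group; names are pasted in their original row order
  members <- split(seq_len(nrow(df)), group)
  out <- do.call(rbind, lapply(members, function(i) data.frame(
    chrom = df$chrom[i[1]],
    start = min(df$start[i]),
    stop  = max(df$stop[i]),
    name  = paste(df$name[i], collapse = ","),
    stringsAsFactors = FALSE
  )))
  rownames(out) <- NULL
  out
}

res <- collapse_overlaps(my.df)
print(res)
```

With the example data this gives three rows: chromosome 1 from 0 to 80001 (a,b,c,d), chromosome 14 from 40004 to 42004 (e), and chromosome 16 from 50000872 to 51000952 (f,g). Sorting by start first is what makes a single pass enough: once a row starts beyond the running maximum stop, no later row on that chromosome can overlap the current group.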
I think I could use the IRanges package as follows:

    library(IRanges)
    # one IRanges object per chromosome
    ranges <- split(IRanges(my.df$start, my.df$stop), my.df$chrom)

But then I have trouble getting the collapsed columns. I tried findOverlaps():

    ov <- findOverlaps(ranges, ranges, type = "any")

but I don't think it is right.
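Staying within IRanges, a possibly simpler route is `reduce()` with `with.revmap = TRUE` instead of `findOverlaps()`: `findOverlaps()` only reports pairs of overlapping ranges, so those pairs would still have to be chained into groups, whereas `reduce()` merges the ranges directly and, with `with.revmap = TRUE`, records in `mcols(...)$revmap` which input ranges went into each merged range. The sketch below assumes a Bioconductor release of IRanges whose `reduce()` supports `with.revmap`; `collapse_chrom` is a hypothetical helper. One difference from a strict overlap test: by default `reduce()` also merges ranges that are merely adjacent (no gap between them), which is governed by its `min.gapwidth` argument.

```r
library(IRanges)   # Bioconductor package; also attaches S4Vectors (mcols)

# Example data from the question
my.df <- data.frame(chrom = c(1, 1, 1, 1, 14, 16, 16),
                    name  = c("a", "b", "c", "d", "e", "f", "g"),
                    start = c(0, 70001, 70203, 70060, 40004, 50000872, 50000872),
                    stop  = c(71200, 71200, 80001, 71051, 42004, 50000890, 51000952),
                    stringsAsFactors = FALSE)

# collapse_chrom() is a hypothetical helper that collapses the rows of a
# single chromosome. reduce() merges overlapping (and, by default, adjacent)
# ranges; with.revmap = TRUE keeps the indices of the merged input rows.
collapse_chrom <- function(d) {
  r <- reduce(IRanges(start = d$start, end = d$stop), with.revmap = TRUE)
  members <- as.list(mcols(r)$revmap)   # one integer vector per merged range
  data.frame(
    chrom = rep(d$chrom[1], length(r)),
    start = start(r),
    stop  = end(r),
    # sort() lists names in original row order, regardless of revmap order
    name  = vapply(members, function(i) paste(d$name[sort(i)], collapse = ","),
                   character(1)),
    stringsAsFactors = FALSE
  )
}

# Work chromosome by chromosome, as with split() in the question
res <- do.call(rbind, lapply(split(my.df, my.df$chrom), collapse_chrom))
rownames(res) <- NULL
print(res)
```

For a whole-genome version, the same idea should carry over to GenomicRanges, where `reduce()` on a GRanges object handles the per-chromosome grouping through its seqnames.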
Any help would be greatly appreciated.
Thanks! -fra