Find all ranges outside a specific range of ranges

I am wondering what would be the best way to find all ranges that are not covered by a given set of ranges. For example, suppose I have a set of genes with known coordinates:

library(data.table)

dtGenes <- fread(
  "id,start,end
 1,1000,1300
 2,1200,1500
 3,1600,2600
 4,3000,4000
")

Say I know that the total length of the chromosome is 10,000 (for simplicity, suppose all genes are on the same chromosome). I then expect the following list of intergenic regions:

"startR,endR
    0,1000
 1500,1600
 2600,3000
 4000,10000
"

Can Bioconductor's IRanges be useful here? Or is there another good way to solve this?
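Without Bioconductor, the expected table can also be reproduced in base R by sorting the genes by start and tracking the running maximum of their ends, so that overlapping genes such as 1 and 2 merge into one covered block. This is a minimal sketch, assuming the convention of the expected output above (gene ends are treated as boundary points and the chromosome runs from 0 to 10,000); the function name `intergenic_regions` is hypothetical:

```r
# Hypothetical helper: complement of (possibly overlapping) ranges on
# [chrom_start, chrom_len], treating range ends as boundary points, as in
# the expected output (gene 1000-1300 leaves a gap ending at 1000).
intergenic_regions <- function(start, end, chrom_len, chrom_start = 0) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  # Running maximum of the ends seen so far: everything up to this point
  # is covered by at least one gene.
  covered_to <- cummax(end)
  # Candidate gap i runs from the covered edge on the left to the next
  # gene start (or the chromosome end for the last one).
  left_edges <- c(chrom_start, covered_to)
  right_edges <- c(start, chrom_len)
  # Keep only real gaps; overlapping or touching genes produce none.
  keep <- right_edges > left_edges
  data.frame(startR = left_edges[keep], endR = right_edges[keep])
}

genes <- data.frame(id = 1:4,
                    start = c(1000, 1200, 1600, 3000),
                    end   = c(1300, 1500, 2600, 4000))
res <- intergenic_regions(genes$start, genes$end, chrom_len = 10000)
print(res)
```

This handles a single chromosome only; with several chromosomes it would have to be applied to each chromosome separately.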


Use the Bioconductor GenomicRanges package, representing the genes as a GRanges object:

library(GenomicRanges)
gr <- with(dtGenes, GRanges("chr1", IRanges(start, end, names=id),
                            seqlengths=c(chr1=10000)))

gaps <- gaps(gr)

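GRanges are strand-aware. The ranges in gr were created without a strand, so their strand is "*". gaps() computes gaps separately for the "+", "-" and "*" strands, and since there are no genes on "+" or "-", those gaps cover the whole chromosome. Keep only the "*" gaps: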

> gaps[strand(gaps) == "*"]
GRanges with 4 ranges and 0 metadata columns:
      seqnames        ranges strand
         <Rle>     <IRanges>  <Rle>
  [1]     chr1 [   1,   999]      *
  [2]     chr1 [1501,  1599]      *
  [3]     chr1 [2601,  2999]      *
  [4]     chr1 [4001, 10000]      *
  ---
  seqlengths:
    chr1
   10000

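Note that Bioconductor uses 1-based, closed intervals: both the start and the end coordinate belong to the range. So the first gap is 1-999 rather than 0-1000, because position 1000 is part of gene 1. Use shift() or narrow() on gr if your coordinates follow a different convention.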
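To get the result back into the startR/endR layout the question used, the "*" gaps can be turned into a plain data frame. A minimal self-contained sketch, assuming GenomicRanges is installed; the gene coordinates are typed in as vectors instead of being read with fread(), and the names `genes` and `intergenic` are illustrative:

```r
library(GenomicRanges)

# Gene coordinates from the question, as 1-based closed intervals
genes <- data.frame(id = 1:4,
                    start = c(1000L, 1200L, 1600L, 3000L),
                    end   = c(1300L, 1500L, 2600L, 4000L))
gr <- GRanges("chr1", IRanges(genes$start, genes$end),
              seqlengths = c(chr1 = 10000))

# gaps() works strand by strand; all genes are on strand "*", so the "+"
# and "-" gaps are the whole chromosome and are dropped here.
g <- gaps(gr)
g <- g[strand(g) == "*"]

# Plain table of intergenic regions (still 1-based, closed coordinates)
intergenic <- data.frame(startR = start(g), endR = end(g))
print(intergenic)
```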


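Use reduce() from the IRanges package. reduce first orders the ranges in x from left to right, then merges the overlapping or adjacent ones.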

library(IRanges)
dat <- read.csv(text="id,start,end
 1,1000,1300
 2,1200,1500
 3,1600,2600
 4,3000,4000
")

ir <- IRanges(dat$start,dat$end)
rir <- reduce(ir)
rir
IRanges of length 3
    start  end width
[1]  1000 1500   501
[2]  1600 2600  1001
[3]  3000 4000  1001
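The reduced ranges are the blocks covered by at least one gene; the intergenic regions are their complement on the chromosome. A minimal sketch, assuming the chromosome spans positions 1 to 10000, passes explicit bounds to IRanges' gaps() (the name `intergenic` is illustrative):

```r
library(IRanges)

ir <- IRanges(start = c(1000, 1200, 1600, 3000),
              end   = c(1300, 1500, 2600, 4000))
rir <- reduce(ir)

# Complement of the merged gene blocks within positions 1..10000
# (1-based, closed intervals, as elsewhere in Bioconductor)
intergenic <- gaps(rir, start = 1, end = 10000)
intergenic
```

This gives the same four ranges as the GRanges answer (1-999, 1501-1599, 2601-2999, 4001-10000), without having to filter by strand.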