I would like to check a bunch of genomic locations of this form:
chr4:154723876-154724615
chr6:139580853-139581090
chr18:30440532-30441569
I want to see whether each one falls in a UTR, intron, exon or intergenic sequence. I do not need to know which gene the intron (etc.) belongs to.
I assume that every known genetic element (an exon, for example) has a defined genomic location (a position on a given chromosome). I know this is true for exons and introns; for example, Ensembl has identifiers for each exon in the genome: see the exons and introns of the Amy1 gene in Mus musculus. I want to query a database of such locations with my list above, and if there is an overlap (ideally I could specify a minimum overlap of, say, 10 bp, but if not, that's fine), I should get a hit (yes, this region is in an exon / intron / ...).
The catch is that I have several thousand of these locations, and ideally I would like to query them all at once, so that the output is a table in which each location is assigned "intron / exon / utr / intergenic". The organism is Mus musculus, and the locations are spread throughout the genome.
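To make the desired result concrete, here is a minimal sketch of the logic in plain Python (not R, and not a solution to the real task). The annotation intervals are made up, the `PRIORITY` order for regions that overlap several feature types is my own assumption, and the function names are hypothetical.

```python
# Hypothetical annotation: (chromosome, start, end, feature type),
# 1-based inclusive coordinates. These intervals are invented for
# illustration and are NOT real mouse annotation.
ANNOTATION = [
    ("chr4", 154723000, 154723900, "exon"),
    ("chr4", 154723901, 154725000, "intron"),
    ("chr6", 139580000, 139580860, "utr"),
    ("chr6", 139580861, 139582000, "intron"),
]

# Assumption: when a region overlaps several feature types,
# report the first one found in this order.
PRIORITY = ["exon", "utr", "intron"]


def parse_region(text):
    """Split 'chr4:100-200' into ('chr4', 100, 200)."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def overlap_bp(s1, e1, s2, e2):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(e1, e2) - max(s1, s2) + 1)


def classify(region, min_overlap=10):
    """Label a region by the annotated features it overlaps by >= min_overlap bp."""
    chrom, start, end = parse_region(region)
    hits = {
        ftype
        for (c, s, e, ftype) in ANNOTATION
        if c == chrom and overlap_bp(start, end, s, e) >= min_overlap
    }
    for ftype in PRIORITY:
        if ftype in hits:
            return ftype
    return "intergenic"  # nothing annotated overlaps this region


regions = [
    "chr4:154723876-154724615",
    "chr6:139580853-139581090",
    "chr18:30440532-30441569",
]

# One row per region: the table I would like to end up with.
table = [(r, classify(r)) for r in regions]
for region, label in table:
    print(f"{region}\t{label}")
```

With this toy annotation the chr6 region overlaps the made-up UTR by only 8 bp, so it is labelled "intron" at the 10 bp threshold but "utr" with `min_overlap=1`. That is the kind of control over the overlap I would like to have.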
I can't provide sample code of an actual attempt, because I don't know where to start. If there were a package or something else I could build on, that would help me find a solution.
It would be ideal if I could do this in R, but AFAIK I can't do it with biomaRt, and I couldn't find a package for it. I considered Galaxy, but given its non-trivial way of doing this and its strange output format, I prefer to stick with R. Better the devil you know, etc.
Help would be greatly appreciated.