A way to check a set of genomic locations for exon / intron / UTR?

I would like to check a set of genomic locations of the form:

chr4:154723876-154724615
chr6:139580853-139581090
chr18:30440532-30441569

I want to see whether they fall in a UTR, an intron, an exon, or intergenic sequence. I do not need to know which gene's introns (etc.) these coordinates belong to.

I assume that every known genetic element (an exon, for example) has a defined genomic location (a start position on its chromosome). I know this is true for exons and introns; for example, Ensembl has identifiers for every exon in the genome: see the exons and introns of the Amy1 gene in Mus musculus. I want to query a database of such locations with my list above, and if there is an overlap (ideally I could specify a minimum overlap of, say, at least 10 bp, but if not, that's fine), I should get a hit (yes, this region is in an exon / intron / ...).

The catch is that I have several thousand of these locations, and ideally I would like to query them all at once and get, as output, a table in which each location is assigned "intron / exon / utr / intergenic". The organism is Mus musculus, and the locations are spread across the whole genome.

I can't provide sample code for what I'm trying to do, because I don't know where to start. If I knew of a package or something to build on, that would help me find a solution.

It would be ideal if I could do this in R, but AFAIK I can't do this in biomaRt, and I couldn't find a package for it. I considered Galaxy, but given its non-trivial way of doing this and the odd output it produces, I would rather stick with R. The devil you know, etc.

Any help would be greatly appreciated.

+4
4

1) Download annotations for exons, introns, 3'-UTRs and 5'-UTRs from UCSC or Ensembl.

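2) Use BEDtools (Quinlan and Hall 2010, https://www.ncbi.nlm.nih.gov/pubmed/20110278); the documentation is at http://bedtools.readthedocs.org/en/latest/. Use intersect with the -f option to set the minimum overlap (as a fraction of each region's length) between your locations and the annotation from UCSC.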
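The overlap test that step 2 hands to BEDtools can be sketched in plain Python to show the logic. This is only an illustration, not BEDtools itself: the feature coordinates are made up, the names parse_region, overlap_bp and classify are hypothetical, and the priority order used when a region hits several feature types is an assumption. The minimum overlap is given in bp, as the question asks, rather than as the fraction that -f takes.

```python
import re

# Hypothetical annotation rows: (chrom, start, end, feature_type).
# Real coordinates would come from a UCSC or Ensembl annotation file.
FEATURES = [
    ("chr4", 154723000, 154723900, "exon"),
    ("chr4", 154723900, 154726000, "intron"),
    ("chr6", 139580000, 139580860, "utr"),
]

# Assumed priority when a region overlaps more than one feature type.
PRIORITY = ["exon", "utr", "intron"]


def parse_region(text):
    """Parse 'chr4:154723876-154724615' into (chrom, start, end)."""
    m = re.fullmatch(r"(\w+):(\d+)-(\d+)", text.strip())
    if m is None:
        raise ValueError(f"not a region: {text!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def overlap_bp(start_a, end_a, start_b, end_b):
    """Shared length of two intervals, treated as half-open for simplicity."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def classify(region, features=FEATURES, min_overlap=10):
    """Return one label per region; 'intergenic' if nothing overlaps enough."""
    chrom, start, end = parse_region(region)
    hits = {
        ftype
        for fchrom, fstart, fend, ftype in features
        if fchrom == chrom
        and overlap_bp(start, end, fstart, fend) >= min_overlap
    }
    for ftype in PRIORITY:
        if ftype in hits:
            return ftype
    return "intergenic"


regions = [
    "chr4:154723876-154724615",
    "chr6:139580853-139581090",
    "chr18:30440532-30441569",
]
table = {r: classify(r) for r in regions}
```

With these made-up features the chr6 region shares only 7 bp with the UTR, so it is reported as intergenic under the 10 bp threshold. The linear scan is fine for a toy table; for a whole-genome annotation, BEDtools or an interval index is the better choice.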

+1

NCBI

http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?TAXID=9606&CHR=4&MAPS=ideogr,morbid [11164,00% 3A11170.00] & QSTR = %20OR %20HD %20OR %20FGFR3 %20OR %20SNCA %20OR %20NRCLP %20OR %20FOP & QUERY = UID (1968,2105,2886,6280,13348,20241,9026199,9026201,9026283,9026440,9027752,9027884) & = 100

0

BSgenome.Mmusculus.UCSC.mm10 ( ) . (1 2) , . , bioconductor GenomicFeatures, UCSC.

0

You can use the annotatePeaks.pl script from HOMER. Run it as:

annotatePeaks.pl your_bed_file genome > your_output_file

For your input (the "genomic locations" file), HOMER writes a table that includes an "Annotation" column and a more specific "Detailed Annotation" column.

(Among other things: 5'UTR, 3'UTR, GC content...)
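To get from HOMER's output to the one-label-per-location table the question asks for, the "Annotation" column can be reduced to its leading class name. The sketch below is a hedged illustration only: the sample rows and accession numbers are made up, only a few columns are shown, the function names are hypothetical, and the assumption that each annotation puts its detail in parentheses after the class name should be checked against your own output file.

```python
import csv
import io

# Made-up excerpt imitating a tab-separated annotation table.
# A real HOMER output file has more columns than shown here.
SAMPLE = (
    "PeakID\tChr\tStart\tEnd\tAnnotation\n"
    "site1\tchr4\t154723876\t154724615\tintron (NM_000001, intron 1 of 5)\n"
    "site2\tchr6\t139580853\t139581090\tIntergenic\n"
    "site3\tchr18\t30440532\t30441569\t3' UTR (NM_000002, exon 7 of 7)\n"
)


def simple_class(annotation):
    """Keep the text before the first ' (' and lower-case it."""
    return annotation.split(" (", 1)[0].strip().lower()


def label_regions(handle):
    """Map 'chr:start-end' to a simplified annotation class."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        f"{row['Chr']}:{row['Start']}-{row['End']}": simple_class(row["Annotation"])
        for row in reader
    }


labels = label_regions(io.StringIO(SAMPLE))
```

For a real file, open it with open(path, newline="") and pass the handle to label_regions in place of the io.StringIO wrapper.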

, , Bedtools, , HOMER, . , , R biomaRt, , HOMER, TSS, 5'UTR, , , r "", .

0
