Validating DNA sequences in C/C++

I am iterating over DNA sequences, pulling fragments of 5-15 bases at a time into C++ std::string objects. Occasionally a string will contain a base other than A, T, C, or G, and I want to take action when this happens. For example, I might see:

CTACGGTACGRCTA 

Since there is an "R", I want to detect this case. I am familiar with regex, but people seem to recommend several different libraries. I have seen Boost, TR1 and others. Can someone suggest another way to catch this case, or tell me which library I should use and why?

Thanks.

+6
c++ c regex bioinformatics
5 answers

A regular expression is overkill for this. You can use std::string::find_first_not_of().
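A minimal sketch of that approach, assuming the input is uppercase and that only A, C, G, and T count as valid. The helper name `first_invalid_base` is hypothetical and not part of the question:

```cpp
#include <string>

// Hypothetical helper: returns the index of the first character that is not
// A, C, G, or T, or std::string::npos if every character is valid.
std::string::size_type first_invalid_base(const std::string& seq) {
    return seq.find_first_not_of("ACGT");
}
```

For the fragment in the question, `first_invalid_base("CTACGGTACGRCTA")` returns 10, the zero-based position of the "R". Because the position is returned rather than just a yes/no answer, the caller can decide whether to skip, report, or replace the offending base.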

+19

C's strspn() comes to mind.

 if (strspn(dnasequence, "ATCG") < strlen(dnasequence)) {
     /* bad character found */
 }
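Since the question stores fragments in std::string rather than C strings, the same check might be wrapped as follows. This is only a sketch; the helper name `only_acgt` is hypothetical:

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true if seq consists only of A, C, G, and T.
// std::strspn returns the length of the leading run of allowed characters,
// so the sequence is valid exactly when that run covers the whole string.
bool only_acgt(const std::string& seq) {
    return std::strspn(seq.c_str(), "ACGT") == seq.size();
}
```

Note that an empty string counts as valid here, and a string with an embedded null character is reported as invalid because strspn stops at the first null.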
+8

Of course you can use regular expressions. But why not keep it simple?

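 #include <cctype>
 #include <string>

 bool is_valid_base(char base) {
     // Cast to unsigned char: passing a negative char to std::toupper is undefined.
     switch (std::toupper(static_cast<unsigned char>(base))) {
         case 'A': case 'C': case 'G': case 'T':
             return true;
         default:
             return false;
     }
 }

 // Takes the sequence by const reference to avoid copying it on every call.
 bool is_valid_dna(const std::string& sequence) {
     for (std::string::const_iterator i = sequence.begin(), end = sequence.end(); i != end; ++i)
         if (!is_valid_base(*i))
             return false;
     return true;
 }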
+5

If you want to use regex to solve this problem, here is a pattern that matches a single invalid character:

 [^CGAT] 

Or here is a regular expression that tests the whole sequence:

 ^[CGAT]+$ 

Pretty simple.
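As a sketch of how these two patterns might be used, assuming a C++11 compiler so that the standard `<regex>` header is available (Boost.Regex offers a very similar interface). The function names `matches_dna` and `has_invalid_base` are hypothetical:

```cpp
#include <regex>
#include <string>

// True if seq is one or more characters drawn from C, G, A, T.
// std::regex_match requires the whole string to match, so the ^ and $
// anchors from the pattern above are not needed here.
bool matches_dna(const std::string& seq) {
    static const std::regex valid("[CGAT]+");
    return std::regex_match(seq, valid);
}

// True if seq contains at least one character outside C, G, A, T.
bool has_invalid_base(const std::string& seq) {
    static const std::regex invalid("[^CGAT]");
    return std::regex_search(seq, invalid);
}
```

The regex objects are declared static so that each pattern is compiled once rather than on every call, which matters when checking many short fragments.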

Edit: Removed irrelevant materials.

+1

Is R a possible DNA base ("letter")? If so, the order of the bases is crucial for correctly displaying or interpreting the sequence as a whole.

Within a codon, determine where the R is located: RAA, ARA, or AAR. Knowing this is very important. Then handle each case according to its attributes.

If it is simply unwanted data, loop over the stored sequence and delete it.

0