DNA Testing in C/C++

Question

I am iterating over DNA sequences, pulling fragments of 5-15 bases at a time into C++ std::string objects. Occasionally a string will contain a base other than A, T, C, or G, and I want to take action when this happens. For example, I might see:

CTACGGTACGRCTA

Since there is an "R", I want to detect this case. I am familiar with regex, but people seem to recommend several different libraries. I have seen Boost, TR1 and others. Can someone suggest another way to catch this case, or tell me which library I should use and why?

thanks

+6

c++ c regex bioinformatics

nedblorf Apr 3 '11 at 17:48

5 answers

C's strspn() comes to mind.

 if (strspn(dnasequence, "ATCG") < strlen(dnasequence)) { /* bad character found */ }
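The check above works because strspn() returns the length of the leading run of characters drawn from the given set, so that length is also the index of the first bad character. A minimal C++ sketch of this idea follows; the helper name first_invalid_base and the -1 return convention are assumptions made for illustration, not part of the answer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the index of the first character that is not
// A, C, G, or T, or -1 if every character is a valid base.
long first_invalid_base(const char* dnasequence) {
    // strspn returns the length of the leading run made up only of "ACGT".
    std::size_t valid_prefix = std::strspn(dnasequence, "ACGT");
    if (dnasequence[valid_prefix] == '\0') {
        return -1;  // the run reached the terminator: no bad character
    }
    return static_cast<long>(valid_prefix);
}
```

With a std::string fragment, the same helper could be called as first_invalid_base(fragment.c_str()).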

+8

pmg Apr 3 '11 at 17:56

Of course you can use regular expressions. But why not keep it simple?

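 #include <cctype>
 #include <string>

 // Returns true for A, C, G, or T in either case.
 bool is_valid_base(char base) {
     // Cast first: passing a negative char to std::toupper is undefined.
     switch (std::toupper(static_cast<unsigned char>(base))) {
         case 'A': case 'C': case 'G': case 'T':
             return true;
         default:
             return false;
     }
 }

 // Returns true if every character in the sequence is a valid base.
 bool is_valid_dna(const std::string& sequence) {
     for (std::string::const_iterator i = sequence.begin(), end = sequence.end(); i != end; ++i)
         if (not is_valid_base(*i))
             return false;
     return true;
 }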

+5

Konrad Rudolph Apr 3 '11 at 17:56

If you want to use regex to solve this problem, here is a pattern that matches a single invalid character:

 [^CGAT]

Or here is a regular expression that validates the whole sequence:

 ^[CGAT]+$

Pretty simple.
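As a rough sketch of how these two patterns might be applied, the following uses std::regex from the C++11 standard library. That choice is an assumption, since the answer names no library and the question mentions Boost and TR1. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// True if the sequence contains at least one character other than C, G, A, T.
bool has_invalid_base(const std::string& sequence) {
    static const std::regex invalid("[^CGAT]");
    return std::regex_search(sequence, invalid);
}

// True if the sequence is non-empty and consists only of C, G, A, T.
bool is_valid_dna_regex(const std::string& sequence) {
    static const std::regex valid("^[CGAT]+$");
    return std::regex_match(sequence, valid);
}
```

Note that the two checks differ on the empty string: it has no invalid character, but it also does not match the whole-sequence pattern because of the `+`.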

Edit: Removed irrelevant materials.

+1

ridgerunner Apr 3 '11 at 17:56

Is R a potential DNA base ("letter")? If so, the order of the bases is crucial for correctly displaying or accurately interpreting the sequence as a whole.

Within a codon, determine where the R is located: RAA, ARA, or AAR. Knowing this is very important. Then process each case according to its attributes.

If it is simply unwanted data that ended up in memory, iterate over the sequence and delete it.
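One possible reading of the two options above, sketched in C++11: either record where each unexpected letter falls within its codon, or remove such letters entirely. Both function names are hypothetical, and the sketch assumes the reading frame starts at index 0 of the string, which the answer does not state.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Offset (0, 1, or 2) within its codon of every non-ACGT character,
// assuming the reading frame starts at index 0 (an assumption of this sketch).
std::vector<std::size_t> codon_offsets_of_invalid(const std::string& seq) {
    std::vector<std::size_t> offsets;
    std::size_t pos = seq.find_first_not_of("ACGT");
    while (pos != std::string::npos) {
        offsets.push_back(pos % 3);
        pos = seq.find_first_not_of("ACGT", pos + 1);
    }
    return offsets;
}

// Returns a copy of seq with every non-ACGT character removed.
std::string strip_invalid(std::string seq) {
    const std::string bases = "ACGT";
    seq.erase(std::remove_if(seq.begin(), seq.end(),
                             [&bases](char c) {
                                 return bases.find(c) == std::string::npos;
                             }),
              seq.end());
    return seq;
}
```

Deleting a character moves every later base one position to the left, so the two functions should not be combined without thinking about how the codon positions change.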

0

Eric Lieber Aug 14 '17 at 13:11

Oliver Charlesworth · Accepted Answer · 2011-04-03T17:56:13+0000

A regular expression is overkill for this. You can use std::string::find_first_not_of().
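A minimal sketch of what that might look like for the fragments in the question; the wrapper name find_invalid_base is hypothetical, and it treats only uppercase A, C, G, and T as valid.

```cpp
#include <string>

// Returns the position of the first character outside "ACGT",
// or std::string::npos if the whole fragment is valid.
std::string::size_type find_invalid_base(const std::string& fragment) {
    return fragment.find_first_not_of("ACGT");
}
```

A caller would compare the result against std::string::npos to decide whether to take action, and can use the returned position to inspect the offending character.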
