I have a long text (about 5 MB in size) and another text, called a template (about 2000 characters).
The challenge is to find the parts of the template, which is a genome pattern, that also occur in the long text and are 15 characters or longer.
Example:
long text: ACGTACGTGTCA AAAACCCCGGGGTTTTA GTACCCGTAGGCGTAT ... (and much longer)
template: ACGGTATTGAC AAAACCCCGGGGTTTTA TGTTCCCAG
I am looking for an efficient algorithm that is also easy to understand and implement.
A bonus would be a way to implement this with only char arrays in C++, if at all possible.
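
To make the task concrete, here is a minimal sketch of one possible approach, not necessarily the best one: index every 15-character window of the template in a small hash table, slide a rolling hash over the long text, and verify each hit with `memcmp` before extending it to the right. All names (`Match`, `findCommon`, `MIN_LEN`, `TABLE_SIZE`, `BASE`) are hypothetical, the table size is an assumption tuned for a template of roughly 2000 characters, and the inputs are assumed to be plain char arrays without the spaces used for highlighting in the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical constants: MIN_LEN is the 15-character threshold from the question,
// TABLE_SIZE is an assumed bucket count suited to ~2000 template windows.
const std::size_t MIN_LEN = 15;
const std::size_t TABLE_SIZE = 8192;
const unsigned long long BASE = 257;

struct Match { std::size_t textPos, tmplPos, length; };

// Polynomial hash of s[0..len); unsigned overflow simply wraps modulo 2^64.
static unsigned long long hashWindow(const char* s, std::size_t len) {
    unsigned long long h = 0;
    for (std::size_t i = 0; i < len; ++i)
        h = h * BASE + (unsigned char)s[i];
    return h;
}

// Stores up to maxOut maximal common substrings of length >= MIN_LEN in out
// and returns how many were stored.
std::size_t findCommon(const char* text, std::size_t n,
                       const char* tmpl, std::size_t m,
                       Match* out, std::size_t maxOut) {
    if (n < MIN_LEN || m < MIN_LEN) return 0;
    const std::size_t windows = m - MIN_LEN + 1;

    // Chained hash table built from plain arrays: head[bucket] -> next[...] -> -1.
    long* head = new long[TABLE_SIZE];
    long* next = new long[windows];
    for (std::size_t b = 0; b < TABLE_SIZE; ++b) head[b] = -1;
    for (std::size_t j = 0; j < windows; ++j) {
        std::size_t b = (std::size_t)(hashWindow(tmpl + j, MIN_LEN) % TABLE_SIZE);
        next[j] = head[b];
        head[b] = (long)j;
    }

    // BASE^(MIN_LEN-1), needed to drop the leftmost character when rolling.
    unsigned long long top = 1;
    for (std::size_t k = 1; k < MIN_LEN; ++k) top *= BASE;

    std::size_t count = 0;
    unsigned long long h = hashWindow(text, MIN_LEN);
    for (std::size_t i = 0; ; ++i) {
        for (long j = head[h % TABLE_SIZE]; j != -1; j = next[j]) {
            // Same bucket does not imply equality, so always verify.
            if (std::memcmp(text + i, tmpl + j, MIN_LEN) != 0) continue;
            // Skip hits that merely continue a match reported one step earlier.
            if (i > 0 && j > 0 && text[i - 1] == tmpl[j - 1]) continue;
            std::size_t len = MIN_LEN;
            while (i + len < n && (std::size_t)j + len < m &&
                   text[i + len] == tmpl[j + len]) ++len;
            if (count < maxOut) {
                out[count].textPos = i;
                out[count].tmplPos = (std::size_t)j;
                out[count].length = len;
                ++count;
            }
        }
        if (i + MIN_LEN >= n) break;  // no further window fits in the text
        // Roll the hash: remove text[i], append text[i + MIN_LEN].
        h = (h - (unsigned char)text[i] * top) * BASE
            + (unsigned char)text[i + MIN_LEN];
    }

    delete[] head;
    delete[] next;
    return count;
}
```

Called on the example above with the highlighting spaces removed, this sketch should report a single match of length 17 (`AAAACCCCGGGGTTTTA`) at offset 12 in the long text and offset 11 in the template. For typical input the scan is roughly linear in the text length, but highly repetitive sequences can lengthen the bucket chains, so treat the performance as an expectation rather than a guarantee.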
c++ c string algorithm
Hedge