Objective: group a large pool of short DNA fragments into classes that share common subsequence patterns, and find the consensus sequence of each class.
[gcta]{5}[gc]{8,}[gcta]{5}
Plan: perform a multiple alignment (e.g. with ClustalW2) to find classes that share common sequences in area 2 (the middle [gc]{8,} part of the pattern above), along with their consensus sequences.
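As a rough illustration of what "area 2" means in practice, here is a minimal Python sketch that applies the pattern above with its three parts captured as named groups, then buckets fragments by the exact G/C core they contain. The group names (left, core, right), the helper names and the sample fragments are all made up for this example, and exact-match bucketing is only a naive stand-in for the alignment step, not a replacement for ClustalW2.

```python
import re
from collections import defaultdict

# The pattern from the post, split into three named areas.
# The group names are hypothetical, chosen only for this sketch.
PATTERN = re.compile(
    r"(?P<left>[gcta]{5})(?P<core>[gc]{8,})(?P<right>[gcta]{5})",
    re.IGNORECASE,
)


def extract_core(fragment):
    """Return the lower-cased area-2 core of the first match, or None."""
    match = PATTERN.search(fragment)
    return match.group("core").lower() if match else None


def group_by_core(fragments):
    """Bucket fragments by identical core (naive; no alignment involved)."""
    groups = defaultdict(list)
    for fragment in fragments:
        core = extract_core(fragment)
        if core is not None:
            groups[core].append(fragment)
    return dict(groups)


# Invented sample fragments, for illustration only.
fragments = [
    "ATTACGCGCGCGCTTAAT",
    "TTTTTGCGCGCGCAAAAA",
    "ATATATATATATATATAT",
]

if __name__ == "__main__":
    for core, members in group_by_core(fragments).items():
        print(core, members)
```

Note that the flanking areas also accept G and C, so the boundary between a flank and the core is not unique when the G/C run is longer than 8; the regex engine simply returns the first match it finds.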
Questions:
, 300 FAR TOO FEW , , 8-. 65 536 8- 3 000 000 000 ( , , ). G/C, 3 000 000 000/65 536 * 2 ^ 8 = ~ 12 000 000 (, , , CpG ). 300?
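The arithmetic in the estimate above can be checked with a few lines of Python. This is only a sketch under my own reading of the numbers: 65 536 is taken as 4^8 (the number of distinct 8-mers), 2^8 as the number of 8-mers made of G/C only, and 3 000 000 000 as a sequence length in bases. It also assumes all four bases are equally likely and independent, which real sequence does not satisfy. The variable names are mine.

```python
# Back-of-the-envelope check of the estimate, assuming uniform,
# independent base composition (a simplification).
SEQUENCE_LENGTH = 3_000_000_000  # bases, as quoted in the post
K = 8                            # k-mer length

total_kmers = 4 ** K             # all distinct 8-mers over {A, C, G, T}
gc_only_kmers = 2 ** K           # 8-mers built from G and C only

# Expected occurrences of any one specific 8-mer.
expected_per_kmer = SEQUENCE_LENGTH / total_kmers

# Expected occurrences summed over all G/C-only 8-mers.
expected_gc_only = expected_per_kmer * gc_only_kmers

if __name__ == "__main__":
    print(total_kmers)              # 65536
    print(round(expected_gc_only))  # 11718750
```

The result, about 11.7 million, agrees with the ~12 000 000 figure quoted above.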
. 1, CG GC , -G--C. , ( ). .
Clustal - , . GC, :
8-mer , . .
, , , (, ).