FASTA Algorithm Explanation

Question

I am trying to understand the basic steps of the FASTA algorithm for searching a database for sequences similar to a query sequence. These are the steps of the algorithm:

1. Identify the common k-words between i and j
2. Score the diagonals containing k-word matches and identify the 10 best diagonals
3. Rescore the initial regions with a substitution score matrix
4. Join the initial regions using gaps, penalising the gaps
5. Perform dynamic programming to find the final alignments

I am confused about the 3rd and 4th steps: how the PAM250 score matrix is used, and how to “join using gaps”.

Can someone explain these two steps to me as specifically as possible? Thanks.

bioinformatics fasta

conmadoi Dec 03 '11 at 8:47

2 answers

reve_etrange · Answer 1 · 2011-12-03T09:57:54+0000

Here's how FASTA works:

1. Find all k-length identities, then find locally similar regions by selecting those that are dense with k-word identities (i.e. many k-words without too many gaps between them). The ten best initial regions are used.
2. The initial regions are rescored along their length by applying a substitution matrix in the usual way. The optimally scoring subregions are identified.
3. Create an alignment of the trimmed initial regions using dynamic programming, with a gap penalty of 20. Regions with too low a score are not included.
4. Optimize the alignment from 3) using banded dynamic programming (Smith-Waterman). This is dynamic programming restricted to a band 32 residues wide around the original alignment, which saves space and time compared with full dynamic programming.
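To make the first two steps concrete, here is a minimal sketch in Python. It is not the FASTA source code: it simply counts k-word hits per diagonal (real FASTA looks for dense runs of hits along each diagonal), and `toy_score` is a hypothetical stand-in for a PAM250 lookup. The function names and the +5/-4 scores are assumptions made for illustration.

```python
from collections import defaultdict


def kword_diagonals(query, subject, k=2):
    """Count shared k-words per diagonal.

    A diagonal is identified by (subject offset - query offset).
    """
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return dict(hits)


def toy_score(a, b):
    # Hypothetical stand-in for a substitution matrix such as PAM250.
    return 5 if a == b else -4


def rescore_diagonal(query, subject, diag, score=toy_score):
    """Find the best-scoring ungapped segment along one diagonal.

    Returns (best_score, query_start, query_end) with an exclusive end.
    This is a maximum-subarray scan over the per-residue scores.
    """
    i0 = max(0, -diag)
    i1 = min(len(query), len(subject) - diag)
    best = (0, i0, i0)
    run, start = 0, i0
    for i in range(i0, i1):
        if run <= 0:
            # A non-positive running score can never help: restart here.
            run, start = 0, i
        run += score(query[i], subject[i + diag])
        if run > best[0]:
            best = (run, start, i + 1)
    return best
```

For example, `kword_diagonals("ACGTACGT", "TTACGTACGTTT")` puts the most hits on diagonal 2, and `rescore_diagonal` on that diagonal then trims the region to its highest-scoring stretch. With a real substitution matrix, conservative replacements would also contribute positive scores, which is why rescoring can extend a region beyond the exact k-word matches.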

If there are not enough initial regions to form an alignment in 3), the best score from 2) can be used to rank sequences by similarity. The scores from 3) and 4) can also be used for this purpose.
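The joining in 3) can be pictured as chaining rescored regions, where every join costs a fixed penalty. The sketch below is one simple way to do that, assuming each region is a tuple `(score, q_start, q_end, s_start, s_end)` with exclusive ends; the tuple layout and the name `join_regions` are hypothetical, and only the penalty of 20 comes from the description above.

```python
def join_regions(regions, join_penalty=20):
    """Chain compatible regions to maximise (sum of scores - penalties).

    Region b may follow region a only if it starts after a ends in
    both the query and the subject. Returns (total_score, chain).
    """
    if not regions:
        return 0, []

    order = sorted(regions, key=lambda r: (r[1], r[3]))
    best = [r[0] for r in order]   # best chain score ending at each region
    prev = [None] * len(order)     # back-pointers for recovering the chain

    for b, rb in enumerate(order):
        for a in range(b):
            ra = order[a]
            if ra[2] <= rb[1] and ra[4] <= rb[3]:
                candidate = best[a] + rb[0] - join_penalty
                if candidate > best[b]:
                    best[b], prev[b] = candidate, a

    end = max(range(len(order)), key=best.__getitem__)
    total = best[end]
    chain = []
    while end is not None:
        chain.append(order[end])
        end = prev[end]
    return total, chain[::-1]
```

In this sketch a region is only worth joining when its score exceeds the joining penalty, which corresponds to low-scoring regions being left out. When nothing can be joined, the result falls back to the single best region, matching the idea of ranking by the best score from 2).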

Unfortunately, my institution does not have access to the original FASTA paper, so I cannot provide the default values of the various parameters mentioned above.

Bill Pearson · Answer 2 · 2012-03-02T13:36:45+0000

The explanation is essentially correct, except that the final banded optimization is centered on the single best ungapped alignment found in step 2. Step 3 is used simply to improve sensitivity when choosing the sequences that go on to step 4.
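As a rough illustration of a banded optimization centered on one diagonal, the sketch below computes a Smith-Waterman score while only filling cells close to that diagonal. It is a simplification under stated assumptions: a linear gap penalty (`gap=12` is an arbitrary choice, not a FASTA default), a toy +5/-4 scoring function instead of PAM250, and a hypothetical function name.

```python
def banded_local_score(query, subject, center_diag, band=32, gap=12,
                       score=lambda a, b: 5 if a == b else -4):
    """Smith-Waterman score restricted to a band around one diagonal.

    Only cells with |(j - i) - center_diag| <= band // 2 are filled,
    so the work is O(len(query) * band) rather than O(len(query) * len(subject)).
    """
    half = band // 2
    prev = {}   # scores of the previous row, keyed by column
    best = 0
    for i in range(1, len(query) + 1):
        cur = {}
        lo = max(1, i + center_diag - half)
        hi = min(len(subject), i + center_diag + half)
        for j in range(lo, hi + 1):
            # Cells outside the band read as 0, i.e. "start a new alignment".
            s = max(
                0,
                prev.get(j - 1, 0) + score(query[i - 1], subject[j - 1]),
                prev.get(j, 0) - gap,      # gap in the subject
                cur.get(j - 1, 0) - gap,   # gap in the query
            )
            cur[j] = s
            best = max(best, s)
        prev = cur
    return best
```

With `band=0` this reduces to rescoring the central diagonal without gaps. Widening the band lets the alignment move to neighbouring diagonals at the cost of a gap penalty.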

The original paper can be seen here: http://faculty.virginia.edu/wrpearson/papers/pearson_lipman_pnas88.pdf
