Here's how FASTA works:
1. Find all identical k-length words (k-tuples) shared by the two sequences, then locate locally similar regions by selecting the diagonals that are densest in k-word identities (i.e. many k-word hits on the same diagonal without large gaps between them). The ten best initial regions are kept.
2. The initial regions are rescored with a substitution matrix in the usual way and trimmed to their maximal-scoring portions. The best of these trimmed regions gives the best initial score.
3. Join the trimmed regions into an approximate alignment with gaps using dynamic programming, with a joining penalty of 20 per gap. Regions whose scores are too low are not included.
4. Optimize the alignment from step 3 using banded dynamic programming (Smith-Waterman). The dynamic programming is restricted to a band 32 residues wide around the original alignment, which saves space and time compared with full dynamic programming.
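Steps 1 to 3 can be sketched in Python as below. This is a simplified illustration, not Pearson and Lipman's implementation: the +5/-4 match/mismatch scores stand in for a real substitution matrix, and the run-splitting threshold `max_gap=8`, scoring a run by its plain hit count, and all function names are hypothetical. Only the "ten best regions" and the joining penalty of 20 come from the description above. The step 2 rescoring is also confined to each region rather than extended past its ends.

```python
from collections import defaultdict

# Hypothetical toy scoring (+5 match / -4 mismatch) standing in for a real
# substitution matrix such as PAM250.
MATCH, MISMATCH = 5, -4


def diagonal_hits(query, target, k):
    """Step 1a: group identical k-word hits by diagonal (target_pos - query_pos)."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = defaultdict(list)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            hits[j - i].append(i)  # query start of a hit on diagonal j - i
    return hits


def dense_regions(hits, k, max_gap=8, top=10):
    """Step 1b: split each diagonal into runs of closely spaced hits and keep
    the `top` runs with the most hits as (hit_count, diag, q_start, q_end)."""
    regions = []
    for diag, starts in hits.items():
        starts.sort()
        run_start = prev = starts[0]
        count = 1
        for s in starts[1:]:
            if s - prev > max_gap:  # gap too large: close the current run
                regions.append((count, diag, run_start, prev + k))
                run_start, count = s, 0
            count += 1
            prev = s
        regions.append((count, diag, run_start, prev + k))
    regions.sort(reverse=True)
    return regions[:top]


def rescore(query, target, region):
    """Step 2: rescore a region and trim it to its maximal-scoring segment.
    Returns (score, diag, (q_start, q_end))."""
    _, diag, qs, qe = region
    best, span = 0, (qs, qs)
    running, start = 0, qs
    for i in range(qs, qe):
        running += MATCH if query[i] == target[i + diag] else MISMATCH
        if running <= 0:
            running, start = 0, i + 1  # segment no longer pays off: restart
        elif running > best:
            best, span = running, (start, i + 1)
    return best, diag, span


def join_regions(trimmed, gap_penalty=20, cutoff=0):
    """Step 3: chain regions that do not overlap in either sequence, paying
    `gap_penalty` per join; return the best chained score."""
    regs = sorted((r for r in trimmed if r[0] > cutoff), key=lambda r: r[2][0])
    best = [r[0] for r in regs]
    for b, (sb, db, (qsb, _)) in enumerate(regs):
        for a in range(b):
            _, da, (_, qea) = regs[a]
            # region a must end before region b starts in query and target
            if qea <= qsb and qea + da <= qsb + db:
                best[b] = max(best[b], best[a] + sb - gap_penalty)
    return max(best, default=0)


query = "GATTACAGATTACA"
target = "CC" + query
regions = dense_regions(diagonal_hits(query, target, k=3), k=3)
trimmed = [rescore(query, target, r) for r in regions]
print(max(trimmed))           # (70, 2, (0, 14)): best single region
print(join_regions(trimmed))  # 70: best joined score
```

Here the query appears intact on diagonal 2 of the target, so the best single region already covers it; the internal "GATTACA" repeat yields two weaker regions on diagonals 9 and -5 that cannot be chained because they cross.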
If the initial regions cannot be joined into an alignment in step 3, the best single-region score from step 2 can be used to rank sequences by similarity. The scores from steps 3 and 4 can also be used for this purpose.
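The banded optimization in step 4, which produces the final score, can be sketched as follows. This is a minimal illustration assuming a linear gap penalty (FASTA itself uses affine gap penalties) and the same hypothetical +5/-4 scoring; the function name, `gap=-8`, and passing the band's center as a diagonal `diag` (target position minus query position, e.g. taken from the best region of step 2) are illustrative assumptions. Only the band width of 32 comes from the description above.

```python
def banded_smith_waterman(query, target, diag, width=32,
                          match=5, mismatch=-4, gap=-8):
    """Smith-Waterman local alignment score restricted to the cells (i, j)
    with |(j - i) - diag| <= width // 2; cells outside the band stay 0."""
    half = width // 2
    m = len(target)
    prev = [0] * (m + 1)  # only two DP rows are kept in memory
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * (m + 1)
        lo = max(1, i + diag - half)
        hi = min(m, i + diag + half)
        for j in range(lo, hi + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(0,                  # start a new local alignment
                         prev[j - 1] + s,    # align query[i-1] with target[j-1]
                         prev[j] + gap,      # gap in the target
                         cur[j - 1] + gap)   # gap in the query
            best = max(best, cur[j])
        prev = cur
    return best


query = "GATTACAGATTACA"
target = "CC" + query
print(banded_smith_waterman(query, target, diag=2))  # 70
```

Only the cells within `width // 2` of the center diagonal are filled, so the work grows with the query length times the band width rather than with the product of the two sequence lengths; a hit lying entirely outside the band (for example `"AAAA"` against `"TTTTTTTTAAAA"` with `diag=0, width=4`) scores 0.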
Unfortunately, my institution does not have access to the original FASTA paper, so I cannot provide the default values of the various parameters mentioned above.
reve_etrange