According to the proposal, Chase bioconductor really fits, and in particular the Biostrings package. To install the latter, I would suggest installing the bioconductor base library as such:
source("http://bioconductor.org/biocLite.R") biocLite()
This way you cover all the dependencies. Now, to align 2 protein sequences or any two sequences in this case, you will need to use pairwiseAlignment from Biostrings . Given the fasta protseq.fasta file of two sequences, which looks like this:
>protein1 MYRALRLLARSRPLVRAPAAALASAPGLGGAAVPSFWPPNAAR MASQNSFRIEYDTFGELKVPNDKYYGAQTVRSTMNFKIGGVTE RMPTPVIKAFGILKRAAAEVNQDYGLDPKIANAIMKAADEVAE GKLNDHFPLVVWQTGSGTQTNMNVNEVISNRAIEMLGGELGSK IPVHPNDHVNKSQ >protein2 MRSRPAGPALLLLLLFLGAAESVRRAQPPRRYTPDWPSLDSRP LPAWFDEAKFGVFIHWGVFSVPAWGSEWFWWHWQGEGRPYQRF MRDNYPPGFSYADFGPQFTARFFHPEEWADLFQAAGAKYVVLT TKHHEGFTNW*
If you want to globally align these 2 sequences using, for example, BLOSUM100 as your replacement matrix, 0 is a fine for opening the gap and -5 for expanding one of them:
require("Biostrings") data(BLOSUM100) seqs <- readFASTA("~/Desktop/protseq.fasta", strip.descs=TRUE) alm <- pairwiseAlignment(seqs[[1]]$seq, seqs[[2]]$seq, substitutionMatrix=BLOSUM100, gapOpening=0, gapExtension=-5)
The result of this is (some alignment removed to save space):
> alm Global PairwiseAlignedFixedSubject (1 of 1) pattern: [1] MYRALRLLARSRPLVRA-PAAALAS.... subject: [1] MR-------SRP---AGPALLLLL.... score: -91
To extract an invoice for each alignment:
> score(alm) [1] -91
With this in mind, you can now easily perform all pairwise alignments using a very simple looping logic. To better change pair alignment with bioconductor , I suggest you read this .
An alternative approach would be to perform multiple sequence alignment rather than pairwise. You can use bio3d , and from there seqaln , to align all the sequences in your fasta file.