BioPython: how to convert the amino acid alphabet to

When discussing how to import sequence data using Bio.SeqIO.parse(), the BioPython cookbook states:

There is an optional argument to specify the alphabet to be used. This is useful for file formats such as FASTA, where otherwise Bio.SeqIO will default to a generic alphabet.

How do I add this optional argument? I have the following code:

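from os.path import abspath
from Bio import SeqIO

# f_path: path to the FASTA file, defined elsewhere
# The "rU" mode is no longer accepted by recent Python 3 releases; plain text mode is enough.
with open(f_path) as handle:
    records = list(SeqIO.parse(handle, "fasta"))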

This imports a large list of FASTA sequences from the UniProt database. The problem is that every record comes back with the generic SingleLetterAlphabet. How do I convert from SingleLetterAlphabet to ExtendedIUPACProtein?

The ultimate goal is to search these sequences for a motif such as GxxxG.

1 answer

Like this:

# Import required alphabet
from Bio.Alphabet import IUPAC

# Pass imported alphabet as an argument for `SeqIO.parse`:
records = list(SeqIO.parse(handle, 'fasta', IUPAC.extended_protein))
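
For the motif search mentioned in the question, the alphabet may not be needed at all: a regular expression over the plain sequence string is usually enough. Below is a minimal sketch using only the standard library. The sequence names and strings are made up for illustration and stand in for str(record.seq) of each parsed record; find_motif is a hypothetical helper, not part of Biopython. Also note that Bio.Alphabet was removed in Biopython 1.78, so the alphabet argument shown above only applies to older releases.

```python
import re

# GxxxG: glycine, any three residues, glycine.
# The lookahead makes the match zero-width, so overlapping
# occurrences (e.g. in GAAAGAAAG) are all reported.
GXXXG = re.compile(r"(?=(G...G))")


def find_motif(sequence, pattern=GXXXG):
    """Return (0-based start, matched text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical sequences standing in for str(record.seq) of each record.
sequences = {
    "example_1": "MKGAAAGLLV",
    "example_2": "MKLLVAAP",
    "example_3": "GAAAGAAAG",
}

hits = {name: find_motif(seq) for name, seq in sequences.items()}

for name, found in hits.items():
    print(name, found)
```

With real data, the dictionary could be built from the parsed records, for example {rec.id: str(rec.seq) for rec in records}.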