When discussing how to import sequence data using Bio.SeqIO.parse(), the Biopython cookbook states:
There is an optional argument to specify the alphabet to be used. This is useful for file formats such as FASTA, where Bio.SeqIO will otherwise default to a generic alphabet.
How do I add this optional argument? I have the following code:
from os.path import abspath
from Bio import SeqIO
# "rU" mode was removed in Python 3.11; plain "r" reads text the same way
with open(f_path, "r") as handle:
    records = list(SeqIO.parse(handle, "fasta"))
This imports a large list of FASTA files from the UniProt database. The problem is that the sequences end up in the generic SingleLetterAlphabet class. How do I convert from SingleLetterAlphabet to ExtendedIUPACProtein?
The ultimate goal is to search these sequences for a motif such as GxxxG.
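For the motif search itself, the alphabet may not matter. A minimal sketch, assuming each record's sequence is first converted to a plain string (for example with str(record.seq)), could use a regular expression from the standard library. The names GXXXG and find_motif and the example sequences below are hypothetical and are not part of Biopython.

```python
import re

# GxxxG: a glycine, any three residues, then another glycine.
# The lookahead makes overlapping occurrences visible as well.
GXXXG = re.compile(r"(?=(G.{3}G))")


def find_motif(sequence, pattern=GXXXG):
    """Return (0-based start, matched text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequences standing in for str(record.seq) of each record.
sequences = {
    "example_1": "MKTGAAAGLLGVVVGW",
    "example_2": "MKTLLLAAAW",
}

hits = {name: find_motif(seq) for name, seq in sequences.items()}
```

With real data, the dictionary would be built from the parsed records instead, for instance keyed by record.id.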
Kevin