It's worth noting (for anyone who stumbles upon this, as I did) that there is a robust Python library designed specifically for these tasks: Biopython. In a few lines of code you can answer all of the questions above. Here are some very simple examples, mostly adapted from the Biopython tutorial. The tutorial also shows how to produce GC% plots and sequence length tables.
In [1]: from Bio import SeqIO
In [2]: allSeqs = [seq_record for seq_record in SeqIO.parse('/home/kevin/stack/ls_orchid.fasta', 'fasta')]
In [3]: allSeqs[0]
Out[3]: SeqRecord(seq=Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet()), id='gi|2765658|emb|Z78533.1|CIZ78533', name='gi|2765658|emb|Z78533.1|CIZ78533', description='gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA', dbxrefs=[])
In [4]: len(allSeqs)
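To go one step further toward the sequence length and GC% tables mentioned above, here is a minimal sketch. The helper names (`gc_percent`, `summarize`, `load_fasta`) and the stand-in records are my own and hypothetical, not part of Biopython. GC% is computed by hand so the sketch does not depend on which GC helper a given Biopython release ships, and the only Biopython call used is `SeqIO.parse`, as in the session above.

```python
from collections import namedtuple


def gc_percent(seq):
    """Return the percentage of G and C bases in a sequence (0.0 if empty)."""
    s = str(seq).upper()
    if not s:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def summarize(records):
    """Return (id, length, GC%) tuples for objects exposing .id and .seq."""
    return [(rec.id, len(rec.seq), gc_percent(rec.seq)) for rec in records]


def load_fasta(path):
    """Parse a FASTA file into a list of SeqRecord objects (needs Biopython)."""
    # Imported here so the helpers above also work without Biopython installed.
    from Bio import SeqIO
    return list(SeqIO.parse(path, "fasta"))


# Stand-in records (made-up data) so the sketch runs without a FASTA file.
Record = namedtuple("Record", ["id", "seq"])
demo = [Record("seq1", "ATGC"), Record("seq2", "GGCCAT")]

summary = summarize(demo)
for rec_id, length, gc in summary:
    print(f"{rec_id}\t{length}\t{gc:.1f}")
```

With Biopython installed, the same table should come out of a real file via `summarize(load_fasta(path))`, where `path` points at your own FASTA file.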
Kevin