Nested Dictionary

Question

I am working on some FASTA-like sequences (not FASTA, but a similar format that I have defined for some culled PDBs from the PISCES server).

I have a question. I have a small number of sequences called nCatSeq, each of which can have multiple nBasinSeq values. I am going through a large PDB file and, for each nCatSeq, I want to collect the corresponding nBasinSeq values in a dictionary without redundancy. Below is the code snippet that does this.

    nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
    nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
    if nCatSeq not in potBasin:
        potBasin[nCatSeq]=nBasinSeq
    else:
        if nBasinSeq not in potBasin[nCatSeq]:
            potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq
        else:
            pass

For one nCatSeq, I get the following result:

 '4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')

I want:

'4241': ('VUVV', 'DDRV', 'DDVG', 'VUVV')

I don't want all the extra parentheses, which are caused by the following statement:

 potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq

(see code snippet above)

Is there any way to do this?

+7

python dictionary

user1729355 Oct 08 '12 at 15:57

3 answers

The problem is that using a comma to "add" an element just creates a new tuple every time. To fix this, use a list and append :

    nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
    nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
    if nCatSeq not in potBasin:
        potBasin[nCatSeq]=[nBasinSeq]
    elif nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq].append(nBasinSeq)
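To see why the comma version nests, it may help to run the two approaches side by side. The following is a minimal sketch with made-up values; the names `nested`, `found_in_nested` and `basins` are illustrative and not part of the original code.

```python
# Comma "appending": each step wraps the old value in a new 2-tuple.
nested = "VUVV"
nested = nested, "DDRV"   # ('VUVV', 'DDRV')
nested = nested, "DDVG"   # (('VUVV', 'DDRV'), 'DDVG')

# The membership test only inspects the top level of the tuple, so 'VUVV'
# is not found there and would be added a second time.
found_in_nested = "VUVV" in nested

# List appending keeps every value at one level, so the test works.
basins = ["VUVV"]
for seq in ["DDRV", "DDVG", "VUVV"]:
    if seq not in basins:
        basins.append(seq)
```

This also appears to explain why 'VUVV' shows up twice in the output from the question: once the tuple is nested, the `not in` check can no longer see the earlier entry.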

Instead of making potBasin a plain dictionary, it would be better to use a defaultdict . The code can then be simplified to:

    # init stuff
    from collections import defaultdict
    potBasin = defaultdict(list)

    # inside loop
    nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
    nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
    # skip values already stored for this key, to avoid redundancy
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq].append(nBasinSeq)
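For a version that can be run on its own, here is a small sketch of the same idea. It assumes the (nCatSeq, nBasinSeq) pairs have already been extracted from the file; the function name `collect_basins` and the sample `pairs` are hypothetical and only serve as illustration.

```python
from collections import defaultdict


def collect_basins(pairs):
    """Group basin sequences by category sequence, skipping duplicates."""
    potBasin = defaultdict(list)
    for nCatSeq, nBasinSeq in pairs:
        # Looking up a missing key creates an empty list, so no separate
        # "key not in dict" branch is needed.
        if nBasinSeq not in potBasin[nCatSeq]:
            potBasin[nCatSeq].append(nBasinSeq)
    return dict(potBasin)


# Made-up sample data, loosely modelled on the values in the question.
pairs = [
    ("4241", "VUVV"),
    ("4241", "DDRV"),
    ("4241", "DDVG"),
    ("4241", "VUVV"),  # duplicate, should be ignored
    ("1111", "AAAA"),
]
result = collect_basins(pairs)
```

If a tuple is needed at the end, each list can be converted with `tuple(...)` once the loop has finished.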

+5

Rob wouters Oct 08 '12 at 16:07

Your question comes down to flattening the nested tuple and eliminating redundant entries:

    def flatten(nested, answer=None):
        # collect all leaf values of arbitrarily nested tuples into one list
        if answer is None:
            answer = []
        if isinstance(nested, tuple):
            for n in nested:
                flatten(n, answer)    # descend into each element
        else:
            answer.append(nested)     # leaf value, e.g. a string
        return answer

So, with your nested dictionary:

    for k in nested_dict:
        nested_dict[k] = tuple(flatten(nested_dict[k]))

If you want to delete duplicate entries (note that a set does not preserve the original order):

    for k in nested_dict:
        nested_dict[k] = tuple(set(flatten(nested_dict[k])))
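If the order of the entries matters, one possible alternative to `set` is `dict.fromkeys`, which keeps the first occurrence of each value. This sketch assumes Python 3.7 or later, where dictionaries preserve insertion order; the helper name `flatten_tuples` and the sample dictionary are illustrative only.

```python
def flatten_tuples(value):
    """Return the leaf values of arbitrarily nested tuples as a flat list."""
    if not isinstance(value, tuple):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten_tuples(item))
    return flat


# Sample data taken from the output shown in the question.
nested_dict = {"4241": ((("VUVV", "DDRV"), "DDVG"), "VUVV")}

# dict.fromkeys drops repeated values while keeping first-seen order
# (relies on insertion-ordered dicts, Python 3.7+).
cleaned = {
    key: tuple(dict.fromkeys(flatten_tuples(value)))
    for key, value in nested_dict.items()
}
```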

Hope this helps

0

inspectorG4dget Oct 08 '12 at 16:09

Andy hayden · Accepted Answer · 2012-10-08T16:08:59+0000

You can add them as tuples:

    if nCatSeq not in potBasin:
        potBasin[nCatSeq] = (nBasinSeq,)
    else:
        if nBasinSeq not in potBasin[nCatSeq]:
            potBasin[nCatSeq] = potBasin[nCatSeq] + (nBasinSeq,)

So instead of:

    (('VUVV', 'DDRV'), 'DDVG')

you will get:

    ('VUVV', 'DDRV', 'DDVG')  # == ('VUVV', 'DDRV') + ('DDVG',)
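The trailing comma in `(nBasinSeq,)` is what makes this work: without it, the parentheses are just grouping and the value stays a plain string. A short sketch with made-up values (the variable names are illustrative):

```python
basins = ("VUVV", "DDRV")

# A one-element tuple needs the trailing comma.
extended = basins + ("DDVG",)

# Without the comma, ("DDVG") is just the string "DDVG", and adding a
# string to a tuple raises TypeError.
try:
    basins + ("DDVG")
except TypeError:
    concat_failed = True
else:
    concat_failed = False
```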
