Nested Dictionary

I am working on some FASTA-like sequences (not FASTA, but something similar that I have defined for some culled PDBs from the PISCES server).

I have a question. I have a small number of sequences called nCatSeq, each of which can have MULTIPLE nBasinSeq. I am going through a large PDB file and I want to extract, for each nCatSeq, the corresponding nBasinSeq values without redundancy in the dictionary. Below is a snippet of code that does this.

    nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
    nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
    if nCatSeq not in potBasin:
        potBasin[nCatSeq]=nBasinSeq
    else:
        if nBasinSeq not in potBasin[nCatSeq]:
            potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq
        else:
            pass

For one nCatSeq I get the following result:

    '4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')

I want:

    '4241': ('VUVV', 'DDRV', 'DDVG', 'VUVV')

I don't want all the extra parentheses, which come from this line:

    potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq

(see the code snippet above)

Is there any way to do this?
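A minimal reproduction of the snippet above shows where the nesting comes from. It assumes the four basin sequences for '4241' arrive in the order they appear in the output (that ordering is a guess), and the helper name `add_nested` is hypothetical. It also shows why the second 'VUVV' slips past the `not in` check:

```python
def add_nested(pot_basin, cat_seq, basin_seq):
    # Mirrors the question's logic: the comma packs the old value and the
    # new sequence into a fresh 2-tuple instead of appending to the old one.
    # While the value is still a bare string, `not in` is a substring test.
    if cat_seq not in pot_basin:
        pot_basin[cat_seq] = basin_seq
    elif basin_seq not in pot_basin[cat_seq]:
        pot_basin[cat_seq] = pot_basin[cat_seq], basin_seq


pot_basin = {}
# Hypothetical arrival order, chosen to reproduce the output above
for basin_seq in ["VUVV", "DDRV", "DDVG", "VUVV"]:
    add_nested(pot_basin, "4241", basin_seq)

print(pot_basin["4241"])
# ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')

# `in` on a tuple only compares its top-level items, so 'VUVV' is not
# found inside (('VUVV', 'DDRV'), 'DDVG') and gets added a second time
print("VUVV" in (("VUVV", "DDRV"), "DDVG"))
# False
```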

3 answers

You can store them as flat tuples and concatenate a one-element tuple each time:

    if nCatSeq not in potBasin:
        potBasin[nCatSeq] = (nBasinSeq,)
    else:
        if nBasinSeq not in potBasin[nCatSeq]:
            potBasin[nCatSeq] = potBasin[nCatSeq] + (nBasinSeq,)

That way, instead of:

    (('VUVV', 'DDRV'), 'DDVG')

you will get:

    ('VUVV', 'DDRV', 'DDVG')  # == ('VUVV', 'DDRV') + ('DDVG',)
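As a quick check, here is the tuple version run on a hypothetical input (the four sequences and their order are assumed, not taken from real data). The value stays flat, and because the membership test now sees every stored sequence, the repeated 'VUVV' is skipped:

```python
pot_basin = {}
cat_seq = "4241"
# Hypothetical arrival order for one nCatSeq key
for basin_seq in ["VUVV", "DDRV", "DDVG", "VUVV"]:
    if cat_seq not in pot_basin:
        pot_basin[cat_seq] = (basin_seq,)  # one-element tuple, note the comma
    elif basin_seq not in pot_basin[cat_seq]:
        # tuple + tuple builds a new, still flat, tuple
        pot_basin[cat_seq] = pot_basin[cat_seq] + (basin_seq,)

print(pot_basin)
# {'4241': ('VUVV', 'DDRV', 'DDVG')}
```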

The problem is that the comma does not "add" an element; it just creates a new tuple every time, with the old value nested inside it. Use a list and `append` for this instead:

    nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
    nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
    if nCatSeq not in potBasin:
        potBasin[nCatSeq]=[nBasinSeq]
    elif nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq].append(nBasinSeq)
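A self-contained sketch of the whole loop with the list approach follows. The record layout (an id, then the nCatSeq string, then the nBasinSeq string of equal length), the function name `collect_basins`, and the loop bound over `n` are assumptions, since the question does not show them. `"".join(seq[n:n + 4])` gives the same string as `seq[n] + seq[n+1] + seq[n+2] + seq[n+3]` when the elements are strings:

```python
def collect_basins(items, width=4):
    """Map each width-long nCatSeq window to its distinct nBasinSeq windows.

    Hypothetical layout: item[1] and item[2] are equal-length strings.
    """
    pot_basin = {}
    for item in items:
        for n in range(len(item[1]) - width + 1):
            # With width=4 this equals item[1][n] + ... + item[1][n+3]
            cat_seq = "".join(item[1][n:n + width])
            basin_seq = "".join(item[2][n:n + width])
            if cat_seq not in pot_basin:
                pot_basin[cat_seq] = [basin_seq]
            elif basin_seq not in pot_basin[cat_seq]:
                pot_basin[cat_seq].append(basin_seq)
    return pot_basin


# Hypothetical record: (id, category string, basin string)
records = [("1abc", "12341234", "ABCDEFGH")]
print(collect_basins(records))
# {'1234': ['ABCD', 'EFGH'], '2341': ['BCDE'], '3412': ['CDEF'], '4123': ['DEFG']}
```

The window '1234' occurs twice in the made-up category string, so it collects two basin sequences, while repeated identical basin windows would be skipped.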

Instead of making potBasin a normal dictionary, it would be better to make it a `defaultdict`. The code can then be simplified to:

    # init stuff
    from collections import defaultdict
    potBasin = defaultdict(list)

    # inside loop
    nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
    nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
    if nBasinSeq not in potBasin[nCatSeq]:  # skip duplicates
        potBasin[nCatSeq].append(nBasinSeq)
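A small sketch of the `defaultdict` version with hypothetical (nCatSeq, nBasinSeq) pairs. It keeps a `not in` check so repeated basin sequences are still skipped, and it shows one behaviour worth knowing: indexing a missing key inserts it with an empty list, whereas `.get()` does not:

```python
from collections import defaultdict

pot_basin = defaultdict(list)
# Hypothetical (nCatSeq, nBasinSeq) pairs
for cat_seq, basin_seq in [("4241", "VUVV"), ("4241", "DDRV"), ("4241", "VUVV")]:
    # Indexing a new key creates an empty list, so no "if key in" branch
    if basin_seq not in pot_basin[cat_seq]:
        pot_basin[cat_seq].append(basin_seq)

print(dict(pot_basin))
# {'4241': ['VUVV', 'DDRV']}

# Caveat: even a read-only lookup with [] inserts the missing key
_ = pot_basin["9999"]
print("9999" in pot_basin)
# True

# .get() does not call the default factory, so nothing is inserted
print(pot_basin.get("0000"))
# None
print("0000" in pot_basin)
# False
```

Wrapping the result in `dict(...)` before later lookups avoids silently creating empty entries.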

Your question comes down to flattening the nested tuples and eliminating redundant entries:

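    def flatten(nested, answer=None):
        # Recursively flatten nested tuples into a flat list of strings
        if answer is None:
            answer = []
        if isinstance(nested, str):
            # a key with only one nBasinSeq holds a bare string, not a tuple
            return answer + [nested]
        if not nested:
            return answer
        n = nested[0]
        if isinstance(n, tuple):
            return flatten(nested[1:], flatten(n, answer))
        else:
            return flatten(nested[1:], answer + [n])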

So, with your nested dictionary:

    for k in nested_dict:
        nested_dict[k] = tuple(flatten(nested_dict[k]))
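An alternative to the recursive `flatten` (a different technique swapped in here, not the answer's code) is a recursive generator using `yield from` (Python 3.3+). It avoids building intermediate lists and, like any flattener for this data, has to treat a bare string as a single item rather than iterating over its characters; the name `iter_flat` is hypothetical:

```python
def iter_flat(value):
    """Yield the strings inside arbitrarily nested tuples, left to right."""
    if isinstance(value, str):
        # A key with a single nBasinSeq holds a bare string, not a tuple
        yield value
    else:
        for part in value:
            yield from iter_flat(part)


nested = ((("VUVV", "DDRV"), "DDVG"), "VUVV")
print(tuple(iter_flat(nested)))
# ('VUVV', 'DDRV', 'DDVG', 'VUVV')
print(tuple(iter_flat("VUVV")))
# ('VUVV',)
```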

If you also want to remove duplicate entries (note that a set does not preserve the original order):

    for k in nested_dict:
        nested_dict[k] = tuple(set(flatten(nested_dict[k])))
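Because a `set` has no defined iteration order, `tuple(set(...))` may return the sequences in any order. If the original order matters, one alternative (swapped in here, not part of the answer) is `dict.fromkeys`, whose keys keep insertion order (guaranteed since Python 3.7), so only the first occurrence of each sequence is kept, in place. The sample tuple is the flattened example from the question:

```python
flat = ("VUVV", "DDRV", "DDVG", "VUVV")

# dict keys are unique and keep insertion order, so duplicates after the
# first occurrence are dropped without reordering anything
deduped = tuple(dict.fromkeys(flat))
print(deduped)
# ('VUVV', 'DDRV', 'DDVG')
```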

Hope this helps
