Pandas DataFrame to sparse dictionary of dictionaries

How do I convert a pandas DataFrame to a sparse dictionary of dictionaries, where only the entries that pass some cutoff are kept? In the toy example below, I need, for each column, the indexes whose values are < 0.

    import pandas as pd

    table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
    df1 = pd.DataFrame(table1)
    df1.columns = ['gene', 'cell_1', 'cell_2']
    df1 = df1.set_index('gene')
    dfasdict = df1.to_dict(orient='dict')

This gives:

    dfasdict = {'cell_1': {'gene_a': -1, 'gene_b': 1, 'gene_c': 0}, 'cell_2': {'gene_a': 1, 'gene_b': 1, 'gene_c': -1}}

But the desired result is a sparse dictionary in which only values less than zero are kept:

    desired = {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}

I could post-process dfasdict after creating it, but I want to do the conversion in the same step, since that post-processing would mean iterating over very large dictionaries. Is it possible to do this in pandas?

2 answers

This uses a dictionary comprehension to build the result. For each of the columns cell_1 and cell_2, it selects the rows whose values are less than zero (via lt) and converts that selection into a dictionary.

    >>> {col: df1.loc[df1[col].lt(0), col].to_dict() for col in ['cell_1', 'cell_2']}
    {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}

To understand what is going on here:

    >>> df1['cell_1'].lt(0)
    gene
    gene_a     True
    gene_b    False
    gene_c    False
    Name: cell_1, dtype: bool
    >>> df1.loc[df1['cell_1'].lt(0), 'cell_1'].to_dict()
    {'gene_a': -1}
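A minimal generalization of the comprehension above, assuming you want every column rather than a hard-coded list and a configurable cutoff. The helper name sparse_dict_below and its threshold parameter are hypothetical. The mask is computed once for the whole frame with DataFrame.lt, and each boolean column of the mask then selects that column's entries.

```python
import pandas as pd


def sparse_dict_below(df, threshold=0):
    """Hypothetical helper: {column: {index: value}} keeping only values < threshold."""
    # Boolean DataFrame with the same shape as df: True where the value is below the cutoff
    mask = df.lt(threshold)
    # For each column, select the rows flagged in that column of the mask
    return {col: df.loc[mask[col], col].to_dict() for col in df.columns}


table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
df1 = pd.DataFrame(table1, columns=['gene', 'cell_1', 'cell_2']).set_index('gene')

print(sparse_dict_below(df1))
# {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}
```

A column with no values below the cutoff maps to an empty dict, which matches what the comprehension above produces.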

Replace the last line of your code with this:

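    # Python 3: DataFrame.items() yields (column name, Series) pairs;
    # the original used pandas.compat.iteritems, a Python 2 compatibility helper
    def to_dict_custom(data):
        # Keep only the negative entries of each column
        return {k: v[v < 0].to_dict() for k, v in data.items()}

    dfasdict = to_dict_custom(df1)
    print(dfasdict)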

which gives:

    {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}

The to_dict_custom function was inspired by a linked source that was not preserved here; please check it.
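Since the question is about very large frames, here is a sketch of a different technique (not what either answer above uses): find all negative cells in one vectorized NumPy pass and loop only over the hits. It assumes the value columns share a numeric dtype, so that to_numpy() returns a numeric array; the function name negative_entries is hypothetical.

```python
import numpy as np
import pandas as pd


def negative_entries(df):
    """Hypothetical helper: {column: {index: value}} for all cells < 0."""
    values = df.to_numpy()
    # Row and column positions of every negative cell, found in one vectorized pass
    rows, cols = np.nonzero(values < 0)
    # Start with an empty dict per column so columns without hits are kept
    result = {col: {} for col in df.columns}
    for r, c in zip(rows, cols):
        # .item() converts the NumPy scalar to a plain Python number
        result[df.columns[c]][df.index[r]] = values[r, c].item()
    return result


table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
df1 = pd.DataFrame(table1, columns=['gene', 'cell_1', 'cell_2']).set_index('gene')

print(negative_entries(df1))
# {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}
```

The Python-level loop runs once per negative cell rather than once per cell, which is where the savings would come from when the data is mostly non-negative.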
