Convert a pandas DataFrame to a dict and drop NaNs

I have some pandas DataFrame with NaN in it. Like this:

    import pandas as pd
    import numpy as np

    raw_data = {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: np.nan, 2: 44, 3: np.nan}}
    data = pd.DataFrame(raw_data)

    >>> data
       A   B
    1  2 NaN
    2  3  44
    3  4 NaN

Now I want to convert it to a dict and at the same time remove the NaNs. The result should look like this:

 {'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}} 

But using the pandas to_dict function gives me this result:

    >>> data.to_dict()
    {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: nan, 2: 44.0, 3: nan}}

So how can I make a dict from a DataFrame and get rid of the NaNs?

+13
python pandas
4 answers

Write a function modeled on to_dict from pandas:

    import pandas as pd
    import numpy as np

    def to_dict_dropna(data):
        # Drop the NaNs column by column, then convert each column to a dict
        return dict((k, v.dropna().to_dict()) for k, v in data.items())

    raw_data = {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: np.nan, 2: 44, 3: np.nan}}
    data = pd.DataFrame(raw_data)
    result = to_dict_dropna(data)

and as a result, you get what you want:

    >>> result
    {'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}}
+6

There are many ways to achieve this. I spent some time evaluating performance on a not-so-large (70 thousand) DataFrame. Although @der_die_das_jojo's answer works, it is also rather slow.

The answer suggested by this question actually turns out to be about 5 times faster on a large DataFrame.

On my test frame (df):

Above method:

    %time [ v.dropna().to_dict() for k,v in df.iterrows() ]
    CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
    Wall time: 50.9 s

Another slow method:

    %time df.apply(lambda x: [x.dropna()], axis=1).to_dict(orient='records')
    CPU times: user 1min 8s, sys: 880 ms, total: 1min 8s
    Wall time: 1min 8s

The fastest method I could find:

    %time [ {k:v for k,v in m.items() if pd.notnull(v)} for m in df.to_dict(orient='records')]
    CPU times: user 14.5 s, sys: 176 ms, total: 14.7 s
    Wall time: 14.7 s

The output of this method is row-oriented: a list with one dict per row. You may need to adjust it if you want the column-oriented shape asked for in the question.
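As a rough sketch of that adjustment, the same "convert once, then filter in plain Python" idea can be applied to the default column-oriented to_dict() output. The frame below is the one from the question; the name col_dict is only illustrative, and this variant was not part of the timings above.

```python
import numpy as np
import pandas as pd

# The example frame from the question
data = pd.DataFrame({'A': {1: 2, 2: 3, 3: 4},
                     'B': {1: np.nan, 2: 44, 3: np.nan}})

# Convert to nested dicts once, then filter the NaN values out of each column
col_dict = {
    col: {idx: val for idx, val in values.items() if pd.notnull(val)}
    for col, values in data.to_dict().items()
}
print(col_dict)  # {'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}}
```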

I would be very interested if anyone finds an even faster answer to this question.
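For anyone who wants to try, here is a minimal sketch of a comparison harness. It assumes a small synthetic frame (1,000 rows, about 30% NaN), not my 70 thousand test frame, and the function names are made up for illustration, so the absolute numbers will differ from the ones above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 1,000 rows, 5 float columns, roughly 30% NaN
rng = np.random.default_rng(0)
values = rng.random((1000, 5))
values[rng.random((1000, 5)) < 0.3] = np.nan
df = pd.DataFrame(values, columns=list('abcde'))

def via_iterrows(frame):
    # Build a Series per row, drop its NaNs, then convert it to a dict
    return [row.dropna().to_dict() for _, row in frame.iterrows()]

def via_records(frame):
    # Convert everything to plain dicts once, then filter in pure Python
    return [{k: v for k, v in rec.items() if pd.notnull(v)}
            for rec in frame.to_dict(orient='records')]

# Both approaches should produce the same list of row dicts
assert via_iterrows(df) == via_records(df)

for func in (via_iterrows, via_records):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```

Checking that the candidates return identical results before timing them keeps the comparison honest.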

+6

You can use a dict comprehension and loop over the columns:

 {col:df[col].dropna().to_dict() for col in df} 
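A quick check of this one-liner on the frame from the question, as a sketch (the name result is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': {1: 2, 2: 3, 3: 4},
                   'B': {1: np.nan, 2: 44, 3: np.nan}})

# Iterating over a DataFrame yields its column labels
result = {col: df[col].dropna().to_dict() for col in df}
print(result)  # {'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}}
```

Each column is filtered on its own, so the inner dicts can end up with different keys.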
0

I wrote a function to solve this problem without reimplementing to_dict and without calling it more than once. The approach is to recursively trim out the "leaves" that hold a NaN / None value.

    import math
    import numbers


    def trim_nan_leaf(tree):
        """For a tree of dict-like and list-like containers, prune None and NaN leaves.

        Particularly applicable for json-like dictionary objects
        """
        # tree may be a dictionary, iterable, or other (element)
        # * Do not recursively iterate if string
        # * element is the base case
        # * Only remove nan and None leaves

        def valid_leaf(leaf):
            if leaf is None:
                return False
            if isinstance(leaf, numbers.Number):
                if not math.isnan(leaf):
                    return leaf != -9223372036854775808
                return False
            return True

        # Attempt dictionary
        try:
            return {k: trim_nan_leaf(tree[k]) for k in tree.keys() if valid_leaf(tree[k])}
        except AttributeError:
            # Execute base case on string for simplicity...
            if isinstance(tree, str):
                return tree
            # Attempt iterator
            try:
                # Avoid infinite recursion for self-referential objects (like one-length strings!)
                if tree[0] == tree:
                    return tree
                return [trim_nan_leaf(leaf) for leaf in tree if valid_leaf(leaf)]
            # TypeError occurs when either [] or the iterator is unavailable;
            # IndexError occurs for an empty sequence
            except (TypeError, IndexError):
                # Base case
                return tree
0