Make a pandas DataFrame into a dict and drop NaNs

Question

I have some pandas DataFrame with NaN in it. Like this:

    import pandas as pd
    import numpy as np

    raw_data = {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: np.nan, 2: 44, 3: np.nan}}
    data = pd.DataFrame(raw_data)

    >>> data
       A    B
    1  2  NaN
    2  3   44
    3  4  NaN

Now I want to make a dict out of it and at the same time remove the NaNs. The result should look like this:

 {'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}}

But using the pandas to_dict function gives me this result:

    >>> data.to_dict()
    {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: nan, 2: 44.0, 3: nan}}

So how do I make a dict from a DataFrame and get rid of the NaNs?

python pandas

der_die_das_jojo, Sep 25 '14 at 7:50

4 answers

der_die_das_jojo · Answer 1 · 2014-09-25T07:50:34+0000

Write a function modeled on pandas' own to_dict:

    import pandas as pd
    import numpy as np

    def to_dict_dropna(data):
        # Drop the NaNs column by column, then convert each column to a dict.
        # data.items() replaces the original compat.iteritems(data), which
        # newer pandas versions no longer provide.
        return dict((k, v.dropna().to_dict()) for k, v in data.items())

    raw_data = {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: np.nan, 2: 44, 3: np.nan}}
    data = pd.DataFrame(raw_data)
    result = to_dict_dropna(data)

and as a result, you get what you want:

    >>> result
    {'A': {1: 2, 2: 3, 3: 4}, 'B': {2: 44.0}}

Peter Mularien · Answer 2 · 2017-09-07T13:59:14+0000

There are many ways to accomplish this. I spent some time evaluating performance on a not-so-large (70k) DataFrame. Although @der_die_das_jojo's answer is functional, it is also rather slow.

The answer suggested in a related question actually turns out to be about 5 times faster on a large DataFrame.

On my test frame (df):

Above method:

    %time [v.dropna().to_dict() for k, v in df.iterrows()]
    CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
    Wall time: 50.9 s

Another slow method:

    %time df.apply(lambda x: [x.dropna()], axis=1).to_dict(orient='records')
    CPU times: user 1min 8s, sys: 880 ms, total: 1min 8s
    Wall time: 1min 8s

The fastest method I could find:

    # orient='records' is the documented spelling; older pandas also accepted 'rows'
    %time [{k: v for k, v in m.items() if pd.notnull(v)} for m in df.to_dict(orient='records')]
    CPU times: user 14.5 s, sys: 176 ms, total: 14.7 s
    Wall time: 14.7 s

The format of this output is row-oriented: a list with one dictionary per row. You may need to make adjustments if you want the column-oriented shape from the question.
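If the column-oriented shape from the question is needed, the same filtering idea can be applied to the default `to_dict()` output. This is a minimal sketch using the question's sample data; the helper name `to_dict_dropna_columns` is made up for illustration, and it has not been timed against the methods above.

```python
import numpy as np
import pandas as pd


def to_dict_dropna_columns(df):
    """Column-oriented dict of df with null entries filtered out (hypothetical helper)."""
    # Convert once, then drop null entries from each inner {index: value} dict.
    return {
        col: {idx: val for idx, val in values.items() if pd.notnull(val)}
        for col, values in df.to_dict().items()
    }


raw_data = {'A': {1: 2, 2: 3, 3: 4}, 'B': {1: np.nan, 2: 44, 3: np.nan}}
data = pd.DataFrame(raw_data)
result = to_dict_dropna_columns(data)
print(result)
```

The filter relies on `pd.notnull` returning a single boolean, so it assumes every cell holds a scalar rather than a list or array.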

I would be very interested if anyone finds an even faster answer to this question.
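For anyone who wants to try other candidates, here is a rough timing harness. It is only a sketch: the original test frame is not available, so the synthetic frame, its size, and the function names `via_iterrows` and `via_records` are all assumptions, and the numbers it prints will differ from the ones above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame (an assumption; the original test frame is not available).
rng = np.random.default_rng(0)
values = rng.random((2000, 5))
values[values < 0.3] = np.nan  # blank out roughly 30% of the cells
df = pd.DataFrame(values, columns=list('abcde'))


def via_iterrows(frame):
    # Row by row: drop the NaNs from each row Series, then convert it.
    return [row.dropna().to_dict() for _, row in frame.iterrows()]


def via_records(frame):
    # Convert once to a list of row dicts, then filter out the null values.
    return [{k: v for k, v in rec.items() if pd.notnull(v)}
            for rec in frame.to_dict(orient='records')]


# Both approaches should produce the same row-oriented result.
assert via_iterrows(df) == via_records(df)

for func in (via_iterrows, via_records):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```

Checking that the candidates agree before timing them guards against comparing methods that silently return different shapes.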

Mclovvin · Answer 3 · 2018-04-12T11:37:42+0000

You can use a dict comprehension and loop over the columns:

 {col:df[col].dropna().to_dict() for col in df}
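Applied to the sample frame from the question, this comprehension should give the requested shape. A minimal check, assuming `df` holds the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': {1: 2, 2: 3, 3: 4}, 'B': {1: np.nan, 2: 44, 3: np.nan}})

# Iterating over a DataFrame yields its column labels.
result = {col: df[col].dropna().to_dict() for col in df}
print(result)
```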

John haberstroh · Answer 4 · 2019-08-20T01:21:17+0000

I wrote a function to solve this problem without reimplementing to_dict and without calling it more than once. The approach is to recursively trim out the "leaves" that hold a NaN / None value.

    import math
    import numbers

    def trim_nan_leaf(tree):
        """For a tree of dict-like and list-like containers, prune None and NaN leaves.

        Particularly applicable for json-like dictionary objects
        """
        # tree may be a dictionary, iterable, or other (element)
        # * Do not recursively iterate if string
        # * element is the base case
        # * Only remove nan and None leaves

        def valid_leaf(leaf):
            if leaf is None:
                return False
            if isinstance(leaf, numbers.Number):
                if not math.isnan(leaf):
                    # int64 minimum, the integer value pandas uses for NaT
                    return leaf != -9223372036854775808
                return False
            return True

        # Attempt dictionary
        try:
            return {k: trim_nan_leaf(tree[k]) for k in tree.keys() if valid_leaf(tree[k])}
        except AttributeError:
            # Execute base case on string for simplicity...
            if isinstance(tree, str):
                return tree
            # Attempt iterator
            try:
                # Avoid infinite recursion for self-referential objects (like one-length strings!)
                if tree[0] == tree:
                    return tree
                return [trim_nan_leaf(leaf) for leaf in tree if valid_leaf(leaf)]
            # TypeError occurs when either [] or iteration is unavailable;
            # IndexError occurs for an empty sequence
            except (TypeError, IndexError):
                # Base Case
                return tree
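To see the idea on a small input, here is a shorter sketch restricted to plain dicts and lists. It is a simplification rather than the function above (for instance, it skips the sentinel-integer check), and `prune_null_leaves` is a hypothetical name. The output of `data.to_dict()` has the same nested-dict shape as the sample input.

```python
import math


def prune_null_leaves(tree):
    """Recursively drop None and NaN leaves from nested dicts and lists."""
    def is_null(leaf):
        return leaf is None or (isinstance(leaf, float) and math.isnan(leaf))

    if isinstance(tree, dict):
        return {k: prune_null_leaves(v) for k, v in tree.items() if not is_null(v)}
    if isinstance(tree, list):
        return [prune_null_leaves(v) for v in tree if not is_null(v)]
    return tree  # base case: a non-container leaf


nested = {'A': {1: 2, 2: 3}, 'B': {1: float('nan'), 2: 44.0}, 'C': [None, 'x']}
pruned = prune_null_leaves(nested)
print(pruned)
```

Containers that end up empty are kept as empty dicts or lists, since only the leaves are pruned.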
