There are many ways to achieve this. I spent some time evaluating performance on a not very large (70 thousand) data frame. Although @der_die_das_jojo's answer works, it is also rather slow.
The approach proposed in the question itself actually turns out to be about 5 times faster on a large data frame.
On my test frame (`df`):
The method from the answer above:
```
%time [v.dropna().to_dict() for k, v in df.iterrows()]
CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
Wall time: 50.9 s
```
Another slow method:
```
%time df.apply(lambda x: [x.dropna()], axis=1).to_dict(orient='rows')
CPU times: user 1min 8s, sys: 880 ms, total: 1min 8s
Wall time: 1min 8s
```
The fastest method I could find:
```
%time [{k: v for k, v in m.items() if pd.notnull(v)} for m in df.to_dict(orient='records')]
CPU times: user 14.5 s, sys: 176 ms, total: 14.7 s
Wall time: 14.7 s
```
The output is row-oriented: a list with one dictionary per row (the index is not kept). You may need to adjust it if you want the column-oriented shape shown in the question.
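To make the two shapes concrete, here is a minimal sketch on a tiny made-up frame. It assumes the "column shape" in the question means `{column: {index_label: value}}`, which is the layout `DataFrame.to_dict()` produces by default; the data and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with some missing values (made-up data).
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# Row-oriented: one dict per row with NaN entries dropped (the fastest method above).
# Note that orient="records" discards the index labels.
rows = [
    {k: v for k, v in m.items() if pd.notnull(v)}
    for m in df.to_dict(orient="records")
]
# rows == [{'a': 1.0}, {'b': 5.0}, {'a': 3.0, 'b': 6.0}]

# Column-oriented: {column: {index_label: value}} with NaN entries dropped.
# Assumption: this is the "column shape" meant in the question.
cols = {
    col: {idx: v for idx, v in inner.items() if pd.notnull(v)}
    for col, inner in df.to_dict().items()
}
# cols == {'a': {'x': 1.0, 'z': 3.0}, 'b': {'y': 5.0, 'z': 6.0}}
```

If you need row-oriented output that keeps the index, `df.to_dict(orient='index')` gives `{index_label: {column: value}}` and can be filtered the same way.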
I would be very interested if anyone finds an even faster approach.
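If you want to reproduce the comparison or try a faster candidate, a rough harness like the one below may help. It is a sketch under stated assumptions: the hypothetical `make_frame` helper builds synthetic data (random floats, roughly 30% NaN) in place of a real frame, only the `iterrows` and `to_dict(orient='records')` methods are included (the `apply` variant is left out for brevity), and no timings are quoted because they will depend on your data and pandas version. Each candidate is first checked against the first one so a "faster" method that returns different results is caught.

```python
import timeit

import numpy as np
import pandas as pd


def via_iterrows(df):
    # Method from the earlier answer: drop NaN row by row on each Series.
    return [v.dropna().to_dict() for _, v in df.iterrows()]


def via_records(df):
    # Fastest method above: build plain dicts first, then filter out nulls.
    return [
        {k: v for k, v in m.items() if pd.notnull(v)}
        for m in df.to_dict(orient="records")
    ]


def make_frame(n_rows, n_cols=5, nan_fraction=0.3, seed=0):
    # Hypothetical helper: random floats with about nan_fraction of cells set to NaN.
    rng = np.random.default_rng(seed)
    data = rng.random((n_rows, n_cols))
    data[rng.random((n_rows, n_cols)) < nan_fraction] = np.nan
    return pd.DataFrame(data, columns=[f"c{i}" for i in range(n_cols)])


def compare(df, candidates, repeat=3):
    # Verify every candidate matches the first one, then record its best-of-N time.
    reference = candidates[0](df)
    timings = {}
    for fn in candidates:
        assert fn(df) == reference, f"{fn.__name__} disagrees with {candidates[0].__name__}"
        timings[fn.__name__] = min(timeit.repeat(lambda: fn(df), number=1, repeat=repeat))
    return timings


if __name__ == "__main__":
    frame = make_frame(10_000)
    for name, seconds in compare(frame, [via_iterrows, via_records]).items():
        print(f"{name}: {seconds:.3f} s")
```

To test a new idea, write it as a function taking `df` and returning the list of dicts, then append it to the `candidates` list.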
Peter Mularien