Pandas DataFrame contains NaN after write operation

Question

Here is a minimal working example of my problem:

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    b = pd.Series([13.0, 15.0])

    a.loc[1,'b'] = b         # this line results in NaNs
    a.loc[1,'b'] = b.values  # this yields correct behavior

Why is the first assignment wrong? Both series have the same index, so I assume that it should lead to the correct result.

I am using pandas 0.17.0.

python pandas

Mindind0rtex Nov 11 '15 at 10:30

2 answers

You cannot assign b directly to a column of a, because b is not a MultiIndex series. Giving b a matching MultiIndex makes it work:

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')

    # give b the same (outer, inner) labels as the target columns
    index = pd.MultiIndex.from_product([['b'], range(2)])
    b = pd.Series([13.0, 15.0], index=index)

    a.loc[1,'b'] = b
    print(a)

gives

       a      b      c
       0  1   0   1  0  1
    0  0  0   0   0  0  0
    1  0  0  13  15  0  0
    2  0  0   0   0  0  0

The other case, where you use b.values, probably works because pandas takes the values in b at face value: a plain array has no index to align, so pandas tries to perform the most logical assignment for the supplied values.
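As a minimal sketch of that second case (reusing the names from the question, nothing here is specific to this answer): a NumPy array carries no labels, so the two selected cells are filled by position.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
b = pd.Series([13.0, 15.0])

# b.values is a plain NumPy array: it has no index, so pandas has
# nothing to align and writes the values into row 1 by position.
a.loc[1, 'b'] = b.values

row = a.loc[1, 'b'].tolist()
print(row)
```

The trade-off is that any label information in b is ignored, so this is only safe when b is already in the same order as the target columns.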

Evert Nov 11 '15 at 23:17

unutbu · Accepted Answer · 2015-11-11T23:11:41+0000

When you write

 a.loc[1,'b'] = b

and b is a Series, the index of b must exactly match the index generated by the indexer a.loc[1,'b'] for the values in b to be copied into a. It turns out, however, that when a.columns is a MultiIndex, the index for a.loc[1,'b'] is:

    (Pdb) p new_ix
    Index([(u'b', 0), (u'b', 1)], dtype='object')

whereas the index for b is

    (Pdb) p ser.index
    Int64Index([0, 1], dtype='int64')

They do not match, and therefore

    (Pdb) p ser.index.equals(new_ix)
    False

Since the two indexes are not equal, the code branch you fall into reindexes b against the new index and assigns the result:

    (Pdb) p ser.reindex(new_ix).values
    array([ nan,  nan])
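The mismatch can be reproduced outside the debugger. The sketch below is an illustration only, not the pandas internals: it builds the tuple labels by hand (the names read_index and full_labels are made up for the example) and shows why reading looks fine while reindexing yields NaN.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
b = pd.Series([13.0, 15.0])

# Reading a.loc[1, 'b'] drops the outer level, so the result is labelled
# 0, 1 and looks identical to b's index.
read_index = a.loc[1, 'b'].index
same_when_reading = b.index.equals(read_index)

# The full column labels are tuples. tupleize_cols=False keeps them in a
# flat object Index, mirroring the new_ix shown in the pdb session.
full_labels = pd.Index([('b', 0), ('b', 1)], tupleize_cols=False)
same_when_writing = b.index.equals(full_labels)

# None of b's labels (0, 1) appear among the tuples, so every slot is NaN.
aligned = b.reindex(full_labels)

print(same_when_reading, same_when_writing)
print(aligned)
```

This is why the two indexes appear to be "the same" when inspected by reading, yet fail the equality check used on assignment.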

I found this by adding pdb.set_trace() to your code:

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    b = pd.Series([13.0, 15.0])

    import pdb
    pdb.set_trace()
    a.loc[1,'b'] = b         # this line results in NaNs
    a.loc[1,'b'] = b.values  # this yields correct behavior

and stepping through it at a high level, which showed that the problem arises in

    if isinstance(value, ABCSeries):
        value = self._align_series(indexer, value)

and then going over it again (with a finer-toothed comb), with a breakpoint set at the line calling self._align_series(indexer, value).

Note that if you change the index of b to be a MultiIndex as well:

 b = pd.Series([13.0, 15.0], index=pd.MultiIndex.from_product([['b'], [0,1]]))

then

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    b = pd.Series([13.0, 15.0], index=pd.MultiIndex.from_product([['b'], [0,1]]))

    a.loc[1,'b'] = b
    print(a)

gives

       a      b      c
       0  1   0   1  0  1
    0  0  0   0   0  0  0
    1  0  0  13  15  0  0
    2  0  0   0   0  0  0
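If rebuilding b's index is inconvenient, one possible alternative (a sketch that is not part of the original answer; inner_labels is a name invented for it) is to align b on the inner-level labels yourself and then assign the raw values:

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')

# Deliberately out of order, to show that the labels are still honoured.
b = pd.Series([15.0, 13.0], index=[1, 0])

# a['b'] drops the outer level, leaving the inner labels 0, 1.
inner_labels = a['b'].columns

# Align explicitly on the inner labels, then hand pandas a plain array
# so that no further (tuple-based) alignment is attempted.
a.loc[1, 'b'] = b.reindex(inner_labels).values

result = a.loc[1, 'b'].tolist()
print(result)
```

Unlike a bare b.values, this keeps label-based ordering, at the cost of an explicit reindex step; labels missing from b would still come through as NaN.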
