Pandas DataFrame contains NaN after write operation

Here is a minimal working example of my problem:

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    b = pd.Series([13.0, 15.0])
    a.loc[1, 'b'] = b         # this line results in NaNs
    a.loc[1, 'b'] = b.values  # this yields correct behavior

Why is the first assignment wrong? Both series have the same index, so I assume that it should lead to the correct result.

I am using pandas 0.17.0.

2 answers

When you write

 a.loc[1,'b'] = b 

and b is a Series, the index of b must exactly match the index generated for a.loc[1,'b'] for the values in b to be copied into a. It turns out, however, that when a.columns is a MultiIndex, the index for a.loc[1,'b'] is:

    (Pdb) p new_ix
    Index([(u'b', 0), (u'b', 1)], dtype='object')

whereas the index for b is

    (Pdb) p ser.index
    Int64Index([0, 1], dtype='int64')

They do not match, and therefore

    (Pdb) p ser.index.equals(new_ix)
    False

Since the indexes do not match, the code takes a branch that reindexes the series and assigns

    (Pdb) p ser.reindex(new_ix).values
    array([ nan, nan])

I found this by adding pdb.set_trace() to your code:

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    b = pd.Series([13.0, 15.0])
    import pdb
    pdb.set_trace()
    a.loc[1, 'b'] = b         # this line results in NaNs
    a.loc[1, 'b'] = b.values  # this yields correct behavior

and stepping through it at a high level, which showed that the problem arises in

    if isinstance(value, ABCSeries):
        value = self._align_series(indexer, value)

and then going over it again (with a finer-toothed comb), this time with a breakpoint on the line that calls self._align_series(indexer, value).
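The index mismatch described above can also be reproduced without the debugger. The following is a minimal sketch, not the pandas internals themselves: new_ix is built by hand as a flat Index of tuples to mirror the pdb output, and the names same_index and realigned are only illustrative.

```python
import pandas as pd

b = pd.Series([13.0, 15.0])

# Hand-built stand-in for the `new_ix` seen in the pdb session (an assumption,
# not taken from pandas itself). tupleize_cols=False keeps it a flat object
# Index of tuples instead of letting pandas turn it into a MultiIndex.
new_ix = pd.Index([('b', 0), ('b', 1)], tupleize_cols=False)

# Integer labels 0 and 1 are compared against the tuple labels.
same_index = b.index.equals(new_ix)

# Reindexing looks up each tuple label in b's index; none of them exist
# there, so every position is filled with NaN.
realigned = b.reindex(new_ix)

print(same_index)
print(realigned)
```

Because none of the tuple labels exist in b, reindexing cannot carry over 13.0 or 15.0, which is why the assignment ends up writing NaN.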


Note that if you change the index of b so that it is a MultiIndex as well:

 b = pd.Series([13.0, 15.0], index=pd.MultiIndex.from_product([['b'], [0,1]])) 

then

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    b = pd.Series([13.0, 15.0], index=pd.MultiIndex.from_product([['b'], [0, 1]]))
    a.loc[1, 'b'] = b
    print(a)

gives

       a      b      c
       0  1   0   1  0  1
    0  0  0   0   0  0  0
    1  0  0  13  15  0  0
    2  0  0   0   0  0  0

You cannot directly assign b to a column in a, because b is not a MultiIndex series. Changing b so that it is one will make it work:

    import pandas as pd

    columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
    a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
    index = pd.MultiIndex.from_product([['b'], range(2)])
    b = pd.Series([13.0, 15.0], index=index)
    a.loc[1, 'b'] = b
    print(a)

gives

       a      b      c
       0  1   0   1  0  1
    0  0  0   0   0  0  0
    1  0  0  13  15  0  0
    2  0  0   0   0  0  0

The other case, using b.values, probably works because pandas takes the values in b at face value, with no index to align on, and tries to perform the most logical assignment for the supplied values.
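As a rough check of the positional behavior, the sketch below reuses the setup from the question and assigns the bare array. The variable row is only there for inspection and is not part of the original code.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([['a', 'b', 'c'], range(2)])
a = pd.DataFrame(0.0, index=range(3), columns=columns, dtype='float')
b = pd.Series([13.0, 15.0])

# b.values is a plain NumPy array: it carries no index, so there is nothing
# to align and the two numbers are written into the two 'b' columns in order.
a.loc[1, 'b'] = b.values

row = a.loc[1]  # the row that was just written, for inspection
print(row)
```

The trade-off is that positional assignment silently relies on the values being in the same order as the target columns, whereas giving b a matching MultiIndex keeps the alignment explicit.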
