Pandas multiindex assignment from another data frame

I am trying to understand pandas MultiIndex DataFrames and how to assign data to them. In particular, I am interested in assigning whole blocks that correspond to another, smaller DataFrame.

    import numpy as np
    import pandas as pd

    ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
    df = pd.DataFrame(index=ix, columns=['1st', '2nd', '3rd'], dtype=np.float64)
    df_ = pd.DataFrame(index=['a', 'b', 'c', 'd'], columns=['1st', '2nd', '3rd'], data=np.random.rand(4, 3))

    df_
            1st       2nd       3rd
    a  0.730251  0.468134  0.876926
    b  0.104990  0.082461  0.129083
    c  0.993608  0.117799  0.341811
    d  0.784950  0.840145  0.016777

df looks the same, except that all the values are NaN and there are two blocks, A and B. Now, if I want to assign the values from df_ to df, I would imagine I can do something like

    df.loc['A', :] = df_                  # Runs, does not work
    df.loc[('A', 'a'):('A', 'd')] = df_   # AssertionError (??) 'Start slice bound is non-scalar'
    df.loc[('A', 'a'):('A', 'd')]         # No AssertionError (??)

    idx = pd.IndexSlice
    df.loc[idx['A', :]] = df_             # Runs, does not work

None of these work: they leave all values in df as NaN, although df.loc[idx['A', :]] gives me a slice of the data frame that exactly matches the sub-frame (df_). So is this a case of setting values on a view? Explicitly iterating over the index of df_ works:

    # this is fine
    for v in df_.index:
        df.loc[idx['A', v]] = df_.loc[v]

    # this is also fine
    for v in df_.index:
        df.loc['A', v] = df_.loc[v]

Is it possible to assign whole blocks (sort of like NumPy)? If not, that's fine, I'm just trying to understand how the system works.

There is a related question about index slicers, but it is about assigning a single value to a masked portion of the DataFrame, not about assigning blocks: "Pandas: the right way to set values based on a condition for a subset of a multi-index data".

2 answers

When you use

 df.loc['A', :] = df_ 

pandas tries to align the index of df_ with the index of a sub-DataFrame of df. However, at the point in the code where the alignment is performed, the sub-DataFrame has a MultiIndex, not the single-level index that you see in the result of df.loc['A', :].

So the alignment fails because df_ has a single-level index, not the MultiIndex that is needed. To see that the index of df_ is indeed the problem, note that

    ix_ = pd.MultiIndex.from_product([['A'], ['a', 'b', 'c', 'd']])
    df_.index = ix_
    df.loc['A', :] = df_
    print(df)

succeeds, yielding something like

    A a  0.229970  0.730824  0.784356
      b  0.584390  0.628337  0.318222
      c  0.257192  0.624273  0.221279
      d  0.787023  0.056342  0.240735
    B a       NaN       NaN       NaN
      b       NaN       NaN       NaN
      c       NaN       NaN       NaN
      d       NaN       NaN       NaN
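The index mismatch described above can be checked directly. The sketch below is only an illustration: the names `target_index`, `block`, `same_before` and `same_after` are made up for this example, and `pd.concat` with a dict is used here as one convenient way to prepend the outer level instead of building the MultiIndex by hand.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
cols = ['1st', '2nd', '3rd']
df = pd.DataFrame(index=ix, columns=cols, dtype=np.float64)
df_ = pd.DataFrame(np.random.rand(4, 3), index=['a', 'b', 'c', 'd'], columns=cols)

# A list key keeps both index levels, so this shows the row labels of the
# 'A' block without the outer level being dropped.
target_index = df.loc[['A'], :].index

# pd.concat with a dict prepends each key as an outer index level.
block = pd.concat({'A': df_})

same_before = df_.index.equals(target_index)   # flat index vs. MultiIndex
same_after = block.index.equals(target_index)  # MultiIndex vs. MultiIndex
print(same_before, same_after)

# With matching labels, the aligned assignment fills the 'A' block.
df.loc['A', :] = block
print(df)
```

Only the frame whose index carries both levels lines up with the target rows, which is consistent with the explanation of why the plain `df_` assignment leaves NaN behind.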

Of course, you probably don't want to create a new MultiIndex every time you want to assign a block of values. Instead, to get around this alignment problem, you can use a NumPy array as the assignment value:

 df.loc['A', :] = df_.values 

Since df_.values is a NumPy array, and the array has no index, no alignment is performed and the assignment gives the same result as above. This trick of using NumPy arrays when you don't want to align indexes is applied in many situations when using Pandas.
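Because no alignment happens, the values land purely by position, so the source rows and columns have to be in the same order as the target block. A minimal sketch of that precaution, assuming a source frame whose rows happen to be in a different order (the deterministic data and the name `block` are just for illustration; `to_numpy()` is used as the equivalent of `.values`):

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c', 'd']])
cols = ['1st', '2nd', '3rd']
df = pd.DataFrame(index=ix, columns=cols, dtype=np.float64)

# Source frame with rows deliberately in reverse order.
df_ = pd.DataFrame(
    np.arange(12, dtype=np.float64).reshape(4, 3),
    index=['d', 'c', 'b', 'a'],
    columns=cols,
)

# Put the source into the target's row and column order first, because
# the array assignment below ignores labels entirely.
block = df_.reindex(index=df.loc['A'].index, columns=df.columns)

# Positional block assignment: no index alignment is attempted.
df.loc['A', :] = block.to_numpy()
print(df)
```

Skipping the `reindex` step would still run, but row 'd' of the source would end up in row ('A', 'a') of the target.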

Note also that assignment by NumPy array can help you perform more complicated assignments, for example to rows that are not contiguous:

    idx = pd.IndexSlice
    df.loc[idx[:, ('a', 'b')], :] = df_.values

which gives, for example:

    In [85]: df
    Out[85]:
              1st       2nd       3rd
    A a  0.229970  0.730824  0.784356
      b  0.584390  0.628337  0.318222
      c       NaN       NaN       NaN
      d       NaN       NaN       NaN
    B a  0.257192  0.624273  0.221279
      b  0.787023  0.056342  0.240735
      c       NaN       NaN       NaN
      d       NaN       NaN       NaN


I did 8480 a while back, which makes sub-frame assignment work with columns. So, as a workaround, you can do:

    >>> rf
         1st    2nd    3rd
    a  0.730  0.468  0.877
    b  0.105  0.082  0.129
    c  0.994  0.118  0.342
    d  0.785  0.840  0.017
    >>> df.T['A'] = rf.T  # take transpose of both sides
    >>> df
           1st    2nd    3rd
    A a  0.730  0.468  0.877
      b  0.105  0.082  0.129
      c  0.994  0.118  0.342
      d  0.785  0.840  0.017
    B a    NaN    NaN    NaN
      b    NaN    NaN    NaN
      c    NaN    NaN    NaN
      d    NaN    NaN    NaN

That said, you can post this as a bug on GitHub.

Edit: it seems that adding a dummy slice at the end also works:

    >>> df.loc['A'][:] = rf
    >>> df
           1st    2nd    3rd
    A a  0.730  0.468  0.877
      b  0.105  0.082  0.129
      c  0.994  0.118  0.342
      d  0.785  0.840  0.017
    B a    NaN    NaN    NaN
      b    NaN    NaN    NaN
      c    NaN    NaN    NaN
      d    NaN    NaN    NaN