How to combine two pandas DataFrames with different numbers of column levels?

I want to combine two DataFrames that share the same index but have different numbers of column levels. One DataFrame has hierarchical (MultiIndex) columns, and the other does not.

    print(df1)
                     A_1               A_2               A_3           .....
                 Value_V  Value_y  Value_V  Value_y  Value_V  Value_y
    instance200       50        0     6500        1       50        0
    instance201      100        0     6400        1       50        0

The other:

    print(df2)
                      PV  Estimate
    instance200  2002313   1231233
    instance201  2134124   1124724

The result should look like this:

                      PV  Estimate      A_1               A_2               A_3           .....
                                    Value_V  Value_y  Value_V  Value_y  Value_V  Value_y
    instance200  2002313   1231233       50        0     6500        1       50        0
    instance201  2134124   1124724      100        0     6400        1       50        0

but merging or concatenating the frames gives me a DataFrame with a flat, one-level column index, like this:

                      PV  Estimate  (A_1,Value_V)  (A_1,Value_y)  (A_2,Value_V)  (A_2,Value_y)  .....
    instance200  2002313   1231233             50              0           6500              1
    instance201  2134124   1124724            100              0           6400              1

How can I preserve the hierarchical column index from df1?

2 answers

Maybe use a good ole assignment:

    df3 = df1.copy()
    df3[df2.columns] = df2

gives

                     A_1               A_2               A_3                 PV  Estimate
                 Value_V  Value_y  Value_V  Value_y  Value_V  Value_y
    instance200       50        0     6500        1       50        0  2002313   1231233
    instance201      100        0     6400        1       50        0  2134124   1124724
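
For reference, here is a minimal self-contained sketch of the assignment approach, rebuilding the two frames from the question. It assigns one column at a time, which is a per-column variant of the snippet above rather than the exact same statement. The name `idx` and the way the frames are constructed are assumptions made for illustration. When a plain string key is assigned to a frame with MultiIndex columns, pandas pads the missing lower level with an empty string, so the hierarchy is kept.

```python
import pandas as pd

# Sample data copied from the question; `idx` is an illustrative name.
idx = ["instance200", "instance201"]
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=idx,
    columns=pd.MultiIndex.from_product(
        [["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]
    ),
)
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]}, index=idx
)

# Copy df1 so the original is untouched, then add df2's columns one by one.
# Each string key becomes a two-level key with an empty second level.
df3 = df1.copy()
for col in df2.columns:
    df3[col] = df2[col]

print(df3.columns.nlevels)
```

The new columns land at the end of the frame, as in the output above. Reordering them to the front would need an extra step such as reindexing the columns.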

This can be done by giving df2's columns the same number of levels as df1's:

    In [11]: df1
    Out[11]:
                     A_1               A_2               A_3
                 Value_V  Value_y  Value_V  Value_y  Value_V  Value_y
    instance200       50        0     6500        1       50        0
    instance201      100        0     6400        1       50        0

    In [12]: df2
    Out[12]:
                      PV  Estimate
    instance200  2002313   1231233
    instance201  2134124   1124724

    In [13]: df2.columns = pd.MultiIndex.from_arrays([df2.columns, [None] * len(df2.columns)])

    In [14]: df2
    Out[14]:
                      PV  Estimate
                     NaN       NaN
    instance200  2002313   1231233
    instance201  2134124   1124724

Now you can concat without distorting the column names:

    In [15]: pd.concat([df1, df2], axis=1)
    Out[15]:
                     A_1               A_2               A_3                 PV  Estimate
                 Value_V  Value_y  Value_V  Value_y  Value_V  Value_y      NaN       NaN
    instance200       50        0     6500        1       50        0  2002313   1231233
    instance201      100        0     6400        1       50        0  2134124   1124724

Note: to have the df2 columns come first, use pd.concat([df2, df1], axis=1).
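
A self-contained sketch of this padding approach, assuming the sample data from the question. It pads the second level with empty strings rather than `None`, a substitution made here so the new columns show a blank lower level instead of `NaN`, and it puts `df2` first as the note suggests. The names `idx` and `df2_padded` are illustrative.

```python
import pandas as pd

# Sample data copied from the question; `idx` is an illustrative name.
idx = ["instance200", "instance201"]
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=idx,
    columns=pd.MultiIndex.from_product(
        [["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]
    ),
)
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]}, index=idx
)

# Give df2 a second, empty column level so both frames have two levels.
# Working on a copy keeps the original df2 with its flat columns.
df2_padded = df2.copy()
df2_padded.columns = pd.MultiIndex.from_arrays(
    [df2.columns, [""] * len(df2.columns)]
)

# Concatenate column-wise; df2 goes first so PV and Estimate lead.
result = pd.concat([df2_padded, df1], axis=1)
print(result.columns.nlevels)
```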


That said, I'm not sure I can think of a use case for this; keeping them as separate DataFrames may actually be the simpler solution!


Source: https://habr.com/ru/post/1214516/

