How to combine two Pandas DataFrames with different column levels?

Question


I want to combine two DataFrames that share the same index but have different column levels. One DataFrame has a hierarchical column index; the other does not.

    print(df1)
                    A_1              A_2              A_3      .....
                Value_V Value_y  Value_V Value_y  Value_V Value_y
    instance200      50       0     6500       1       50       0
    instance201     100       0     6400       1       50       0

The other:

    print(df2)
                      PV  Estimate
    instance200  2002313   1231233
    instance201  2134124   1124724

The result should look like this:

                      PV Estimate      A_1              A_2              A_3      .....
                                   Value_V Value_y  Value_V Value_y  Value_V Value_y
    instance200  2002313  1231233       50       0     6500       1       50       0
    instance201  2134124  1124724      100       0     6400       1       50       0

But merging or concatenating the frames gives me a DataFrame with a flat, one-level column index like this:

                      PV Estimate  (A_1,Value_V)  (A_1,Value_y)  (A_2,Value_V)  (A_2,Value_y)  .....
    instance200  2002313  1231233             50              0           6500              1
    instance201  2134124  1124724            100              0           6400              1

How can I keep the hierarchical index from df1?


python pandas

Pat patterson Mar 03 '15 at 1:06


2 answers

unutbu · Answer 1 · 2015-03-03T02:05:53+0000

Maybe use a good ole assignment:

    df3 = df1.copy()
    df3[df2.columns] = df2

gives

                    A_1              A_2              A_3                PV Estimate
                Value_V Value_y  Value_V Value_y  Value_V Value_y
    instance200      50       0     6500       1       50       0  2002313  1231233
    instance201     100       0     6400       1       50       0  2134124  1124724
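For anyone who wants to try the assignment approach end to end, here is a minimal sketch. The frames are rebuilt by hand from the values shown in the question, with only the three `A_*` groups (the question's `.....` suggests there are more), so the construction code is an assumption rather than the asker's real data.

```python
import pandas as pd

index = ["instance200", "instance201"]

# df1: two-level columns, rebuilt from the values printed in the question.
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=index,
    columns=pd.MultiIndex.from_product(
        [["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]
    ),
)

# df2: ordinary single-level columns on the same index.
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]},
    index=index,
)

# Copy first so df1 itself is left untouched, then assign df2's columns.
df3 = df1.copy()
df3[df2.columns] = df2

print(df3)
print(df3.columns.nlevels)
```

The new columns are appended on the right, and each flat name is padded with an empty string on the second level, so the `PV` values are addressable as `df3[("PV", "")]`.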

Andy hayden · Answer 2 · 2015-03-03T01:50:31+0000

This can be done if df2 has the same number of levels as df1:

    In [11]: df1
    Out[11]:
                    A_1              A_2              A_3
                Value_V Value_y  Value_V Value_y  Value_V Value_y
    instance200      50       0     6500       1       50       0
    instance201     100       0     6400       1       50       0

    In [12]: df2
    Out[12]:
                      PV  Estimate
    instance200  2002313   1231233
    instance201  2134124   1124724

    In [13]: df2.columns = pd.MultiIndex.from_arrays([df2.columns, [None] * len(df2.columns)])

    In [14]: df2
    Out[14]:
                      PV  Estimate
                     NaN       NaN
    instance200  2002313   1231233
    instance201  2134124   1124724

Now you can concat without distorting the column names:

    In [15]: pd.concat([df1, df2], axis=1)
    Out[15]:
                    A_1              A_2              A_3                PV Estimate
                Value_V Value_y  Value_V Value_y  Value_V Value_y      NaN      NaN
    instance200      50       0     6500       1       50       0  2002313  1231233
    instance201     100       0     6400       1       50       0  2134124  1124724

Note: to put the df2 columns first, use pd.concat([df2, df1], axis=1).
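A self-contained sketch of this approach, with the df2 columns placed first as the question asked. Two things here are assumptions rather than part of the answer: the frames are rebuilt by hand from the values in the question, and the second level is filled with empty strings instead of `None`, which avoids `NaN` column labels and makes the new columns easier to select. The name `df2_wide` is made up for illustration.

```python
import pandas as pd

index = ["instance200", "instance201"]

# df1: two-level columns, rebuilt from the values printed in the question.
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=index,
    columns=pd.MultiIndex.from_product(
        [["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]
    ),
)

# df2: ordinary single-level columns on the same index.
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]},
    index=index,
)

# Work on a copy so the original df2 keeps its flat columns.
df2_wide = df2.copy()

# Give the copy a second column level so it matches df1's depth.
# Empty strings are used as the filler here (the answer uses None).
df2_wide.columns = pd.MultiIndex.from_arrays(
    [df2.columns, [""] * len(df2.columns)]
)

# Both frames now have two column levels, so concat keeps the hierarchy.
combined = pd.concat([df2_wide, df1], axis=1)

print(combined)
print(combined.columns.nlevels)
```

With this filler the `PV` column can be read back as `combined[("PV", "")]`, and the `A_*` groups keep their `Value_V` / `Value_y` sub-columns.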

That said, I can't think of a use case for this; keeping them as separate DataFrames may actually be the simpler solution...!
