How do I keep the index when merging DataFrames in pandas?

I am merging two data frames with merge(..., how='left'), since I only want to keep records that match the "left" data frame. The problem is that the merge operation seems to drop the index of my left data frame, as shown here:

    import pandas

    df1 = pandas.DataFrame([{"id": 1, "name": "bob"}, {"id": 10, "name": "sally"}])
    df1 = df1.set_index("id")
    df2 = pandas.DataFrame([{"name": "bob", "age": 10}, {"name": "sally", "age": 11}])

    print("df1 premerge:")
    print(df1)

    df1 = df1.merge(df2, on=["name"], how="left")
    print("merged:")
    print(df1)

    # This is not "id"
    print(df1.index)
    # And there is no "id" column either
    assert "id" not in df1.columns

Before the merge, df1 was indexed by id. After the merge, the result has only the default integer index (0, 1, ...), and the id values are gone entirely. How can I perform this kind of merge but keep the index of the left data frame?

To clarify: I want all of df2's columns added to each record in df1 that has a corresponding id value. If an entry in df2 has an id value that does not appear in df1, it should not be merged in (hence how='left').
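For reference, a minimal sketch of what how='left' does with the question's merge on "name". The extra rows "carol" and "dave" are hypothetical, added only to show an unmatched row on each side:

```python
import pandas as pd

# Hypothetical data: "carol" has no match in df2, and "dave" in df2 has no match in df1
df1 = pd.DataFrame({"id": [1, 10, 20], "name": ["bob", "sally", "carol"]}).set_index("id")
df2 = pd.DataFrame({"name": ["bob", "sally", "dave"], "age": [10, 11, 12]})

merged = df1.merge(df2, on="name", how="left")

# Every df1 row is kept ("carol" gets NaN for age) and "dave" is dropped...
print(merged)
# ...but the "id" index has been replaced by a default RangeIndex
print(merged.index)
```

So the left-join semantics are what the question asks for; only the index is lost.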

Edit: I could hack around this with df1.reset_index(), then merge, then set the index again, but I would prefer not to if possible; it seems like merge should not drop the index. Thanks.
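A minimal sketch of the reset_index / merge / set_index workaround mentioned in the edit, using the question's sample data (the name merged is just illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([{"id": 1, "name": "bob"}, {"id": 10, "name": "sally"}]).set_index("id")
df2 = pd.DataFrame([{"name": "bob", "age": 10}, {"name": "sally", "age": 11}])

# Turn "id" back into a column so merge carries it along, then restore it as the index
merged = (
    df1.reset_index()
    .merge(df2, on="name", how="left")
    .set_index("id")
)
print(merged)
```

One caveat: if df2 contained duplicate "name" values, the matching df1 rows would be repeated, so the restored id index could contain duplicates.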

1 answer

You already mentioned calling reset_index before the merge and set_index afterwards, which works. The only way to preserve an index through a merge is for the merge to use the index of at least one of the data frames as a join key. So you can do:

    In [403]: df2 = df2.set_index('name')

    In [404]: df1.merge(df2, left_on='name', right_index=True)
    Out[404]:
         name  age
    id
    1     bob   10
    10  sally   11

This joins df2's index (which we created from its "name" column) against the "name" column of df1.

This makes sense, because otherwise the index of the resulting data frame would be ambiguous: it could come from either data frame.
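A minimal sketch of the same column-to-index merge with how='left' added (the question asks for a left merge; the answer's call uses the default inner merge, which gives the same result on the original data). It also shows DataFrame.join, which performs this column-to-index lookup and keeps the caller's index. The rows "carol" and "dave" and the name lookup are hypothetical, added for illustration:

```python
import pandas as pd

# Hypothetical data: "carol" (id 20) has no match in df2, "dave" has no match in df1
df1 = pd.DataFrame({"id": [1, 10, 20], "name": ["bob", "sally", "carol"]}).set_index("id")
df2 = pd.DataFrame({"name": ["bob", "sally", "dave"], "age": [10, 11, 12]})

lookup = df2.set_index("name")  # the join key becomes df2's index

# merge: df1's "name" column against lookup's index; how="left" keeps every df1 row
via_merge = df1.merge(lookup, left_on="name", right_index=True, how="left")

# join: the same column-to-index lookup, keeping df1's index
via_join = df1.join(lookup, on="name", how="left")

print(via_merge)
print(via_join)
```

Both results keep the id index; "carol" gets NaN for age and "dave" is not included.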
