Are df.columns and df2.columns the same object?

I have a DataFrame df2, which is a copy of another DataFrame, df:

    In [4]: from pandas import DataFrame

    In [5]: df = DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[7,8,9]})

    In [6]: df
    Out[6]:
       A  B  C
    0  1  4  7
    1  2  5  8
    2  3  6  9

    In [7]: df2 = df.copy()

and therefore not the same object:

    In [8]: df is df2
    Out[8]: False

    In [9]: hex(id(df))
    Out[9]: '0x89c6550L'

    In [10]: hex(id(df2))
    Out[10]: '0x89c6a58L'

My question is about the columns of these two data frames. Why are the column objects returned from df.columns and df2.columns the same object?

    In [11]: df.columns is df2.columns
    Out[11]: True

    In [12]: hex(id(df.columns))
    Out[12]: '0x89bfb38L'

    In [13]: hex(id(df2.columns))
    Out[13]: '0x89bfb38L'

But once I make a change, they become two separate objects:

    In [14]: df2.rename(columns={"B":"D"}, inplace=True)

    In [15]: df.columns
    Out[15]: Index([A, B, C], dtype=object)

    In [16]: df2.columns
    Out[16]: Index([A, D, C], dtype=object)

    In [17]: df.columns is df2.columns
    Out[17]: False

    In [18]: hex(id(df.columns))
    Out[18]: '0x89bfb38L'

    In [19]: hex(id(df2.columns))
    Out[19]: '0x89bfc88L'

Can someone explain what is going on here? Why aren't df.columns and df2.columns two separate objects from the very beginning?

1 answer

df.columns is an Index object.

These are immutable objects (just as strings and ints are immutable: you can change which object a name refers to, but you cannot change the object itself).
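The immutability is easy to check directly. Here is a minimal sketch, assuming pandas is imported as pd and reusing the frame from the question; the flag `mutated` is just a name made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Try to change one label in place. An Index rejects item assignment,
# much like a str does.
try:
    df.columns[1] = "D"
    mutated = True
except TypeError:
    mutated = False

# The labels are unchanged after the failed assignment.
labels = list(df.columns)
```

Because an in-place change like this is refused, the only way to "change" the columns is to bind the frame to a different Index object.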

Immutability makes it safe to share the object, which is efficient (no memory needs to be copied when the index is copied). When you change one, you actually get a new object (as opposed to a reference to the original).

Almost all pandas operations return a new object; see here: http://pandas.pydata.org/pandas-docs/stable/basics.html#copying

So rename is equivalent to copying and assigning to an index (columns and/or index, whichever you are changing). But this act of assignment creates a new Index object, so rename is just a convenience for this operation.
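To make the "assignment creates a new Index" point concrete, here is a rough sketch of the same steps done by hand. It assumes pandas is imported as pd; `old_columns` and `replaced` are names invented for the illustration, and the list comprehension only approximates what rename does internally. Whether copy() hands back the very same Index object (as in the session in the question) may depend on the pandas version, so the sketch checks only what happens on assignment:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df2 = df.copy()

# Keep a reference to the Index object df2 holds before the change.
old_columns = df2.columns

# Roughly what rename does for the column axis: build new labels and assign them.
df2.columns = ["D" if name == "B" else name for name in df2.columns]

# Assignment bound df2 to a brand-new Index; the old object was not modified.
replaced = df2.columns is not old_columns

# The non-inplace rename gives the same labels on a new frame and leaves df alone.
df3 = df.rename(columns={"B": "D"})
```

After this, `old_columns` and `df.columns` still hold A, B, C, while `df2.columns` and `df3.columns` hold A, D, C.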
