Another option is to add it as an additional level of the column index, making the columns a MultiIndex:
In [11]: df = pd.DataFrame(randn(2, 2), columns=['A', 'B'])

In [12]: df
Out[12]:
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333

In [13]: df.columns = pd.MultiIndex.from_tuples(zip(['AA', 'BB'], df.columns))

In [14]: df
Out[14]:
         AA        BB
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333
This preserves the correct dtypes for the DataFrame, so calculations on it stay fast and correct, and it lets you access both the old and the new column names.
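To make that concrete, here is a minimal sketch of looking columns up by either level and checking that the dtypes are untouched. The variable names are only illustrative, and I wrap `zip` in `list` here as a precaution for Python 3:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 2), columns=["A", "B"])

# Put the new names ("AA", "BB") on a level above the existing names.
df.columns = pd.MultiIndex.from_tuples(list(zip(["AA", "BB"], df.columns)))

# The values are still floats; nothing was coerced to object.
dtypes_preserved = (df.dtypes == np.float64).all()

# Both sets of names remain available as index levels.
new_names = df.columns.get_level_values(0).tolist()
old_names = df.columns.get_level_values(1).tolist()

col_by_new = df["AA"]                      # DataFrame with the single column "A"
col_by_both = df[("AA", "A")]              # Series selected by the full key
col_by_old = df.xs("A", axis=1, level=1)   # DataFrame with the single column "AA"
```

Selecting by the old name goes through `xs` with `level=1` because plain `df["A"]` only looks at the top level.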
For completeness, here is DSM's (deleted) answer, which turns the column names into a row. As already mentioned, this is usually not a good idea:
In [21]: df_bad_idea = df.T.reset_index().T

In [22]: df_bad_idea
Out[22]:
              0         1
index         A         B
0     -0.952928 -0.624646
1      -1.02095 -0.883333
Note that the dtype can change (when the column names are not the same kind of value as the data), as it does in this case, so be careful if you actually plan to do any work on this frame: it will probably be slower and may not even work:
In [23]: df.sum()
Out[23]:
A   -1.973878
B   -1.507979
dtype: float64

In [24]: df_bad_idea.sum()  # doh!
Out[24]: Series([], dtype: float64)
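A quick way to see the dtype change for yourself is to compare `dtypes` before and after the transpose trick. This is only a sketch with illustrative variable names; it does not try to reproduce the `sum()` output above, which may differ between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 2), columns=["A", "B"])

# Same trick as above: push the column names down into the first row.
df_bad_idea = df.T.reset_index().T

original_dtypes = df.dtypes.tolist()       # both columns are float64
bad_dtypes = df_bad_idea.dtypes.tolist()   # both columns are now object

# The first row holds the old column names, mixed in with the float rows.
first_row = df_bad_idea.iloc[0].tolist()
```

Each column now mixes a string with floats, so pandas has to fall back to the generic object dtype.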
If the column names are really a row of data that was mistakenly read as the header, then you should fix this when reading in the data (e.g. read_csv accepts header=None).
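As a rough illustration of the read_csv fix, assuming a hypothetical two-column file with no header row (the in-memory string and the names "A" and "B" are made up for the example):

```python
import io

import pandas as pd

# Hypothetical CSV content with no header row: every line is data.
raw = "1.5,2.5\n3.0,4.0\n"

# Default behaviour: the first data row is consumed as the header.
wrong = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data; names= supplies the column labels.
right = pd.read_csv(io.StringIO(raw), header=None, names=["A", "B"])
```

With the default call the first row is lost to the header, while the second call keeps both rows as floats.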
Andy Hayden