Equivalent to set_index for column headers

In Pandas, if I have a DataFrame that looks like this:

                0       1       2       3       4       5       6
     0               2013    2012    2011    2010    2009    2008
     1    January   3,925   3,463   3,289   3,184   3,488   4,568
     2   February   3,632   2,983   2,902   3,053   3,347   4,527
     3      March   3,909   3,166   3,217   3,175   3,636   4,594
     4      April   3,903   3,258   3,146   3,023   3,709   4,574
     5        May   4,075   3,234   3,266   3,033   3,603   4,511
     6       June   4,038   3,272   3,316   2,909   3,057   4,081
     7       July           3,661   3,359   3,062   3,354   4,215
     8     August           3,942   3,417   3,077   3,395   4,139
     9  September           3,703   3,169   3,095   3,100   3,752
    10    October           3,727   3,469   3,179   3,375   3,874
    11   November           3,722   3,145   3,159   3,213   3,567
    12   December           3,866   3,251   3,199   3,324   3,362
    13      Total  23,482  41,997  38,946  37,148  40,601  49,764

I can convert the first column to an index using:

    In [55]: df.set_index([0])
    Out[55]:
                    1       2       3       4       5       6
    0
                 2013    2012    2011    2010    2009    2008
    January     3,925   3,463   3,289   3,184   3,488   4,568
    February    3,632   2,983   2,902   3,053   3,347   4,527
    March       3,909   3,166   3,217   3,175   3,636   4,594
    April       3,903   3,258   3,146   3,023   3,709   4,574
    May         4,075   3,234   3,266   3,033   3,603   4,511
    June        4,038   3,272   3,316   2,909   3,057   4,081
    July                3,661   3,359   3,062   3,354   4,215
    August              3,942   3,417   3,077   3,395   4,139
    September           3,703   3,169   3,095   3,100   3,752
    October             3,727   3,469   3,179   3,375   3,874
    November            3,722   3,145   3,159   3,213   3,567
    December            3,866   3,251   3,199   3,324   3,362
    Total      23,482  41,997  38,946  37,148  40,601  49,764

My question is: how do I convert the first row to column headers? The closest I can get is:

    In [53]: df.set_index([0]).rename(columns=df.loc[0])
    Out[53]:
                 2013    2012    2011    2010    2009    2008
    0
                 2013    2012    2011    2010    2009    2008
    January     3,925   3,463   3,289   3,184   3,488   4,568
    February    3,632   2,983   2,902   3,053   3,347   4,527
    March       3,909   3,166   3,217   3,175   3,636   4,594
    April       3,903   3,258   3,146   3,023   3,709   4,574
    May         4,075   3,234   3,266   3,033   3,603   4,511
    June        4,038   3,272   3,316   2,909   3,057   4,081
    July                3,661   3,359   3,062   3,354   4,215
    August              3,942   3,417   3,077   3,395   4,139
    September           3,703   3,169   3,095   3,100   3,752
    October             3,727   3,469   3,179   3,375   3,874
    November            3,722   3,145   3,159   3,213   3,567
    December            3,866   3,251   3,199   3,324   3,362
    Total      23,482  41,997  38,946  37,148  40,601  49,764

but then I need to go in and delete the first line.

2 answers

The best way to deal with this is to avoid getting into this situation.

How was df created? For example, if you used read_csv or a variant, then header=0 will tell read_csv to parse the first row as column names.
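
As a rough sketch of that idea, assuming the data came from a CSV file whose first row holds the years and whose first column holds the month names: the CSV text below is a made-up stand-in for the real file, and `index_col` and `thousands` are extra options not mentioned above that happen to fit this data.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real source file.
csv_text = (
    ",2013,2012,2011\n"
    'January,"3,925","3,463","3,289"\n'
    'February,"3,632","2,983","2,902"\n'
)

# header=0    -> take the column names from the first row
# index_col=0 -> use the first column (the month names) as the index
# thousands="," -> parse "3,925" as the integer 3925
df = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, thousands=",")
print(df)
```

Read this way, the frame never has a header row sitting in the data, so nothing needs to be renamed or dropped afterwards.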


Given df as you have it, I don't think there is an easier way to fix this than what you described. To remove the first row, you can use df.iloc:

 df = df.iloc[1:] 
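
Putting the pieces together, one possible sequence looks like the sketch below. The small frame is a made-up stand-in for the one in the question, and clearing the index and column names at the end is purely cosmetic.

```python
import pandas as pd

# Small stand-in for the frame in the question: integer column labels,
# with the real headers sitting in the first row.
df = pd.DataFrame([
    ["", "2013", "2012", "2011"],
    ["January", "3,925", "3,463", "3,289"],
    ["February", "3,632", "2,983", "2,902"],
])

out = df.set_index(0)        # first column becomes the index
out.columns = out.iloc[0]    # first row supplies the column labels
out = out.iloc[1:]           # drop the row that is now redundant
out.index.name = None        # cosmetic: clear the leftover names
out.columns.name = None
print(out)
```

Note that the values are still strings such as "3,925" at this point; converting them to numbers is a separate step.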

I'm not sure if this is more efficient, but you can try creating a data frame with the correct index and default column names from your original data frame, and then rename the columns using the original data frame as well. For example:

    from pandas import DataFrame

    data = {'0': [' ', 'Jan', 'Feb', 'Mar', 'April'],
            '1': ['2013', 3926, 3456, 3245, 1254],
            '2': ['2012', 3346, 4342, 1214, 4522],
            '3': ['2011', 3946, 4323, 1214, 8922]}
    DF = DataFrame(data)

    # .ix was removed in pandas 1.0; .iloc selects by position instead.
    # Keep rows 1 onward and columns 1 onward, indexed by the month labels.
    DF2 = DF.iloc[1:, 1:].set_index(DF.iloc[1:, 0])
    # Take the column names from the first row.
    DF2.columns = DF.iloc[0, 1:]
    DF2