Unpacking data using Pandas

I have some data that I am reshaping from "long" to "wide". I have no problem using unstack to make the data wide, but then I end up with what looks like an index that I cannot get rid of. Here is an example:

    ## set up some dummy data
    import pandas as pd
    d = {'state' : ['a','b','a','b','a','b','a','b'],
         'year' : [1,1,1,1,2,2,2,2],
         'description' : ['thing1','thing1','thing1','thing2','thing2','thing2','thing1','thing2'],
         'value' : [1., 2., 3., 4.,1., 2., 3., 4.]}
    df = pd.DataFrame(d)

    ## now that we have dummy data do the long to wide conversion
    dfGrouped = df.groupby(['state','year', 'description']).value.sum()
    dfUnstacked = dfGrouped.unstack('description')
    print(dfUnstacked)

    description  thing1  thing2
    state year
    a     1           4     NaN
          2           3       1
    b     1           2       4
          2         NaN       6

So it looks the way I expected. Now I would like to have an unindexed data frame with the columns 'state', 'year', 'thing1', 'thing2'. So it seems to me that I should do this:

    dfUnstackedNoIndex = dfUnstacked.reset_index()
    print(dfUnstackedNoIndex)

    description state  year  thing1  thing2
    0               a     1       4     NaN
    1               a     2       3       1
    2               b     1       2       4
    3               b     2     NaN       6

Ok, this is close. But I do not want the description to be carried forward. So let me select only those columns that I want:

    print(dfUnstackedNoIndex[['state','year','thing1','thing2']])

    description state  year  thing1  thing2
    0               a     1       4     NaN
    1               a     2       3       1
    2               b     1       2       4
    3               b     2     NaN       6

So what is going on with "description"? Why is it still there, even though I reset the index and selected only a few columns? Clearly I am not understanding something correctly.

FWIW, my version of Pandas is 0.12

1 answer

description is the name of the columns index (dfUnstackedNoIndex.columns.name), not a column or a row index. You can get rid of it like this:

    In [74]: dfUnstackedNoIndex.columns.name = None

    In [75]: dfUnstackedNoIndex
    Out[75]:
      state  year  thing1  thing2
    0     a     1       4     NaN
    1     a     2       3       1
    2     b     1       2       4
    3     b     2     NaN       6
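For readers who want to try this end to end, here is a minimal, self-contained sketch that rebuilds the question's data and clears the columns name in two ways. The variable names `wide`, `cleared`, and `renamed` are only illustrative, and the `rename_axis(None, axis=1)` variant assumes a considerably newer pandas than the 0.12 mentioned in the question.

```python
import pandas as pd

# Same dummy data as in the question.
df = pd.DataFrame({
    'state': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'],
    'year': [1, 1, 1, 1, 2, 2, 2, 2],
    'description': ['thing1', 'thing1', 'thing1', 'thing2',
                    'thing2', 'thing2', 'thing1', 'thing2'],
    'value': [1., 2., 3., 4., 1., 2., 3., 4.],
})

# Long to wide, then move 'state' and 'year' back into ordinary columns.
wide = (df.groupby(['state', 'year', 'description'])['value']
          .sum()
          .unstack('description')
          .reset_index())

# The columns Index still carries the name left behind by unstack().
name_before = wide.columns.name

# Option 1: clear the name in place, as shown in the answer.
cleared = wide.copy()
cleared.columns.name = None

# Option 2 (assumes a newer pandas): get a copy with the name removed.
renamed = wide.rename_axis(None, axis=1)

print(name_before)
print(cleared.columns.name)
print(list(renamed.columns))
```

Either option only changes the label attached to the columns index; the data and the column labels themselves are untouched.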

The purpose of the columns name may become clearer if you look at what happens when you unstack twice:

    In [107]: dfUnstacked2 = dfUnstacked.unstack('state')

    In [108]: dfUnstacked2
    Out[108]:
    description  thing1      thing2
    state             a   b       a   b
    year
    1                 4   2     NaN   4
    2                 3 NaN       1   6

Now dfUnstacked2.columns is a MultiIndex. Each level has a name that corresponds to the name of the index level that was converted into a column level.

    In [111]: dfUnstacked2.columns
    Out[111]:
    MultiIndex(levels=[[u'thing1', u'thing2'], [u'a', u'b']],
               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
               names=[u'description', u'state'])

The column name(s) and index name(s) appear in the same place in the string representation of DataFrames, so it can be hard to tell which is which. You can figure it out by inspecting df.index.names and df.columns.names.
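As a rough illustration of that check, the sketch below prints `index.names` and `columns.names` at each stage of the reshaping from the question. The variable names (`unstacked`, `unstacked_twice`, `flat`, `stages`) are made up for this example, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Same dummy data as in the question.
df = pd.DataFrame({
    'state': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b'],
    'year': [1, 1, 1, 1, 2, 2, 2, 2],
    'description': ['thing1', 'thing1', 'thing1', 'thing2',
                    'thing2', 'thing2', 'thing1', 'thing2'],
    'value': [1., 2., 3., 4., 1., 2., 3., 4.],
})

grouped = df.groupby(['state', 'year', 'description'])['value'].sum()
unstacked = grouped.unstack('description')    # 'description' moves to the columns
unstacked_twice = unstacked.unstack('state')  # 'state' joins it as a second column level
flat = unstacked.reset_index()                # 'state' and 'year' become plain columns

stages = [
    ('unstacked', unstacked),
    ('unstacked_twice', unstacked_twice),
    ('flat', flat),
]

# Print the row-index names and the column-index names side by side.
for label, frame in stages:
    print(label, list(frame.index.names), list(frame.columns.names))
```

In the `flat` frame the row index has no name while the columns index is still called 'description', which is exactly the stray label in the question's printed output.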
