MultiIndex on only some of the columns in pandas

I have a CSV file that is generated in a format I cannot change. The file has a multi-index header. It looks like this:

[image]

The ultimate goal is to turn the top header row (hours) into an index level, alongside the "id" column, so that the data looks like this:

[image]

I imported the file into pandas:

import pandas as pd

myfile = 'c:/temp/myfile.csv'
# tupleize_cols was removed in pandas 1.0; omit it on newer versions
df = pd.read_csv(myfile, header=[0, 1], tupleize_cols=True)
pd.set_option('display.multi_sparse', False)
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['hour', 'field'])
df

But this gives me three unnamed fields:

[image]

My last step is to stack by hour:

df.stack(level=['hour'])

But I am missing the step before that: how can I move the other columns into the index, even though the header spans more than one row?

1 answer

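The following moves the first three columns into the index and builds the MultiIndex from the remaining columns: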

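import pandas as pd
# tupleize_cols was removed in pandas 1.0; omit it on newer versions
df = pd.read_csv('temp.csv', header=[0, 1], tupleize_cols=True)
# First 3 columns: keep only the second-level name; the rest stay as (hour, field) tuples
df.columns = [c for _, c in df.columns[:3]] + [c for c in df.columns[3:]]
df = df.set_index(list(df.columns[:3]), append=True)
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['hour', 'field'])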
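  • The list comprehension keeps only the second-level name for the first 3 columns and leaves the other column tuples as they are.
  • set_index then moves those 3 columns into the index, so only the (hour, field) columns remain for the MultiIndex.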

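After that you can stack on hour, and use reset_index if you want the index levels back as regular columns.

Before and after: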

  (Unnamed: 0_level_0, Date)  (Unnamed: 1_level_0, id)  \
0                  3/11/2016                         5   
1                  3/11/2016                         6   

  (Unnamed: 2_level_0, zone)  (100, p1)  (100, p2)  (200, p1)  (200, p2)  
0                        abc      0.678      0.787      0.337      0.979  
1                        abc      0.953      0.559      0.776      0.520  

field                        p1     p2
  Date      id zone hour              
0 3/11/2016 5  abc  100   0.678  0.787
                    200   0.337  0.979
1 3/11/2016 6  abc  100   0.953  0.559
                    200   0.776  0.520
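
The steps above can be put together as a self-contained sketch. It rests on a few assumptions: the CSV text is a hypothetical stand-in reconstructed from the output shown, load_long and n_id_cols are illustrative names, tupleize_cols is left out so the code is not tied to old pandas versions, and append=True is omitted so the row number does not become an extra index level.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file, reconstructed from the output above.
CSV_TEXT = """,,,100,100,200,200
Date,id,zone,p1,p2,p1,p2
3/11/2016,5,abc,0.678,0.787,0.337,0.979
3/11/2016,6,abc,0.953,0.559,0.776,0.520
"""


def load_long(csv_source, n_id_cols=3):
    """Read a CSV with a two-row header and return one row per (id columns, hour)."""
    df = pd.read_csv(csv_source, header=[0, 1])
    # Flatten the names of the leading id columns; keep the rest as (hour, field) tuples.
    df.columns = [c for _, c in df.columns[:n_id_cols]] + list(df.columns[n_id_cols:])
    # Move the id columns into the row index.
    df = df.set_index(list(df.columns[:n_id_cols]))
    # Only (hour, field) tuples remain, so they can form a named MultiIndex.
    df.columns = pd.MultiIndex.from_tuples(df.columns, names=["hour", "field"])
    # Move the hour level from the columns into the row index.
    return df.stack(level="hour")


long_df = load_long(io.StringIO(CSV_TEXT))
flat_df = long_df.reset_index()  # index levels back as ordinary columns
print(long_df)
```

Because the hour labels come from the header row, they are read as strings here; convert them with astype(int) on the flattened frame if numeric hours are needed.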