Pandas: keep the MultiIndex index values after a groupby

After research, I did not find similar questions on this or any other forum.

I group a MultiIndex DataFrame by one of its levels. After grouping, I still want to know which values of the inner index level were selected.

So I have something like

df = pd.DataFrame([['A', 1, 3],
                   ['A', 2, 4],
                   ['A', 3, 6],
                   ['B', 1, 9],
                   ['B', 2, 10],
                   ['B', 4, 6]],
                  columns=pd.Index(['Name', 'Date', 'Value'], name='ColumnName')
                 ).set_index(['Name', 'Date'])

ColumnName         Value
Name    Date
A        1           3
         2           4
         3           6 
B        1           9
         2           10
         4           6

I wanted

ColumnName         Value
Name    Date
A        3           6
B        4           6

What I could do was use this command:

df.groupby(level=('Name')).last()

which returned this:

ColumnName         Value
Name    
A                    6
B                    6

Or using this command:

df.groupby(level=('Name','Date')).last()

which raises an error.

Keep in mind that this is a performance sensitive application.

Thoughts?

EDIT: In the meantime, I have filed a feature request on GitHub.

3 answers

Use tail(1) instead of last() after the groupby, which keeps the full index:

In [22]: df.groupby(level='Name').tail(1)
Out[22]:
ColumnName  Value
Name Date
A    3          6
B    4          6

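Note, however, that tail works a bit differently: it is a filter, not an aggregation, so it returns the last row of each group as it is. last gives the last non-NaN value of each column, so the two can differ when the data contains NaN values.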
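
To illustrate the difference, here is a small sketch in which the last row of group A has a missing value. The frame `df_nan` is made up for this illustration and is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row of group 'A' has a missing Value
df_nan = pd.DataFrame(
    {"Value": [3.0, 4.0, np.nan, 9.0, 10.0, 6.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 4)],
        names=["Name", "Date"],
    ),
)

# tail(1) filters: it returns the last row of each group unchanged,
# so the full (Name, Date) index is kept, and so is the NaN
last_rows = df_nan.groupby(level="Name").tail(1)

# last() aggregates: it returns the last non-null Value per group,
# indexed by Name only, so the Date it came from is lost
last_values = df_nan.groupby(level="Name").last()

print(last_rows)
print(last_values)
```

If the data has no missing values, both calls pick the same rows and only the shape of the index differs.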


OLD ANSWER (using last): move the Date level into a column before the groupby, so that it is kept in the result:

In [44]: df.reset_index(level='Date').groupby(level=0).last()
Out[44]:
ColumnName  Date  Value
Name
A              3      6
B              4      6

And to put Date back into the index:

In [46]: df.reset_index(level='Date').groupby(level=0).last().set_index('Date', append=True)
Out[46]:
ColumnName  Value
Name Date
A    3          6
B    4          6

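As for performance, on the small example dataframe get_slice (from the other answer) is the fastest, and the groupby approaches are slower: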

In [96]: %timeit get_slice(df)
1000 loops, best of 3: 879 µs per loop

In [97]: %timeit df.reset_index(level='Date').groupby(level='Name').last().set_index('Date', append=True)
100 loops, best of 3: 3.75 ms per loop

In [220]: %timeit df.groupby(level='Name').tail(1)
1000 loops, best of 3: 1.04 ms per loop

On a larger dataframe, however, the groupby approaches scale better (the last version is still the slowest):

In [83]: df1 = pd.DataFrame(  # needs: import string; import numpy as np
             {'Value':np.random.randint(100, size=len(string.ascii_letters)*100)},
             index=pd.MultiIndex.from_product([list(string.ascii_letters), range(100)],
                                              names=['Name', 'Date']))

In [84]: df1
Out[84]:
           Value
Name Date
a    0        13
     1         9
     2        11
     3        16
...          ...
Z    96       15
     97       20
     98       40
     99       91

[5200 rows x 1 columns]

In [85]: %timeit get_slice(df1)
100 loops, best of 3: 3.24 ms per loop

In [86]: %timeit df1.reset_index(level='Date').groupby(level='Name').last().set_index('Date', append=True)
100 loops, best of 3: 4.69 ms per loop

In [218]: %timeit df1.groupby(level='Name').tail(1)
1000 loops, best of 3: 1.66 ms per loop

So on the larger dataframe, tail(1) is the fastest of the three.
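
These timings depend on the pandas version and the machine, so they are worth re-measuring. Below is a minimal sketch for re-running the comparison with the standard `timeit` module, assuming Python 3 (hence `string.ascii_letters`). The helper names `with_tail` and `with_last` are made up here and simply wrap the two expressions from this answer.

```python
import string
import timeit

import numpy as np
import pandas as pd

# Rebuild a frame shaped like df1: 52 names x 100 dates
df1 = pd.DataFrame(
    {"Value": np.random.randint(100, size=len(string.ascii_letters) * 100)},
    index=pd.MultiIndex.from_product(
        [list(string.ascii_letters), range(100)], names=["Name", "Date"]
    ),
)

def with_tail(frame):
    # Filter: keep the last row of every Name group
    return frame.groupby(level="Name").tail(1)

def with_last(frame):
    # Aggregate: move Date to a column, take last(), then restore the index
    return (frame.reset_index(level="Date")
                 .groupby(level="Name").last()
                 .set_index("Date", append=True))

# Sanity check: both select the same (Name, Date) pairs.
# tail keeps the original row order while groupby sorts the keys,
# so compare after sorting.
assert with_tail(df1).sort_index().index.equals(with_last(df1).sort_index().index)

for func in (with_tail, with_last):
    seconds = timeit.timeit(lambda: func(df1), number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```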


Another option is to build the selection directly from the index levels and codes:

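def get_slice(df):
    # levels hold the unique values of each index level;
    # codes (named `labels` before pandas 0.24) hold each row's position in them
    l0, l1 = df.index.levels
    b0, b1 = df.index.codes

    # for every outer value, take the inner value of the group's last row
    myslice = [(l0[i], l1[b1[b0 == i][-1]]) for i in range(len(l0))]

    return df.loc[myslice]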

%%timeit
get_slice(df)

1000 loops, best of 3: 458 µs per loop
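
For readers unfamiliar with the MultiIndex internals that get_slice relies on, here is a small illustration on the question's data. It assumes pandas 0.24 or later, where the per-row integer positions are exposed as `codes` (older versions call them `labels`). The variable names are chosen for this sketch only.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 1, 3], ["A", 2, 4], ["A", 3, 6],
     ["B", 1, 9], ["B", 2, 10], ["B", 4, 6]],
    columns=["Name", "Date", "Value"],
).set_index(["Name", "Date"])

names, dates = df.index.levels            # unique values of each level
name_codes, date_codes = df.index.codes   # per-row positions into the levels

print(names.tolist())       # ['A', 'B']
print(dates.tolist())       # [1, 2, 3, 4]
print(name_codes.tolist())  # [0, 0, 0, 1, 1, 1]
print(date_codes.tolist())  # [0, 1, 2, 0, 1, 3]

# Last Date of group 'A': take the rows whose name code is 0,
# pick the last date code among them, and map it back to a Date value
last_date_a = dates[date_codes[name_codes == 0][-1]]
```

Here `last_date_a` is 3, which is the (Name, Date) pair that get_slice collects for group A.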

Try reset_index():

df = pd.DataFrame([['A', 1, 3],
                   ['A', 2, 4],
                   ['A', 3, 6],
                   ['B', 1, 9],
                   ['B', 2, 10],
                   ['B', 4, 6]],
                  columns=pd.Index(['Name', 'Date', 'Value'], name='ColumnName')
                 ).set_index(['Name', 'Date'])

df = df.reset_index()
df2 = df.groupby("Name", as_index=False)[["Date", "Value"]].last()  # select columns with a list; as_index=False keeps Name as a column
df2.set_index(['Name', 'Date'], inplace=True)

#            Value
# Name Date       
# A    3         6
# B    4         6