You can use groupby/tail(2) to take the last 2 rows of each group, then groupby/head(1) to take the first row of that result:
df.groupby(['A','B']).tail(2).groupby(['A','B']).head(1)
If a group has only one row, tail(2) returns just that row, and head(1) then keeps it.
For example,
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(10**2, 3)), columns=list('ABC'))

result = df.groupby(['A','B']).tail(2).groupby(['A','B']).head(1)

expected = (df.groupby(['A', 'B'], as_index=False)
              .apply(lambda x: x if len(x)==1 else x.iloc[[-2]])
              .reset_index(level=0, drop=True))

assert expected.sort_index().equals(result)
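To make the effect concrete, here is a minimal sketch on a small hand-made DataFrame (the data below is made up for illustration): the chained expression keeps the second-to-last row of each (A, B) group, or the only row when a group has a single row.

import pandas as pd

# Made-up toy data: group (1, 1) has three rows, group (2, 2) has one.
small = pd.DataFrame({'A': [1, 1, 1, 2],
                      'B': [1, 1, 1, 2],
                      'C': [10, 20, 30, 40]})

print(small.groupby(['A', 'B']).tail(2).groupby(['A', 'B']).head(1))
#    A  B   C
# 1  1  1  20   <- second-to-last row of group (1, 1)
# 3  2  2  40   <- only row of group (2, 2)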
Built-in group methods (such as tail and head) are often much faster than groupby/apply with custom Python functions. This is especially true when there are many groups:
In [96]: %timeit df.groupby(['A','B']).tail(2).groupby(['A','B']).head(1)
1000 loops, best of 3: 1.7 ms per loop

In [97]: %timeit (df.groupby(['A', 'B'], as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[[-2]]).reset_index(level=0, drop=True))
100 loops, best of 3: 17.9 ms per loop
As an alternative, ayhan offers a nice improvement:
alt = df.groupby(['A','B']).tail(2).drop_duplicates(['A','B'])
assert expected.sort_index().equals(alt)

In [99]: %timeit df.groupby(['A','B']).tail(2).drop_duplicates(['A','B'])
1000 loops, best of 3: 1.43 ms per loop
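This works because tail(2) preserves the original row order within each group, and drop_duplicates(['A','B']) keeps only the first occurrence of each (A, B) pair by default, which is exactly the second-to-last row of the group (or the only row for single-row groups). Using the made-up toy frame from the sketch above:

print(small.groupby(['A', 'B']).tail(2).drop_duplicates(['A', 'B']))
#    A  B   C
# 1  1  1  20
# 3  2  2  40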