Fast, efficient way to remove rows from large Pandas DataFrames

I want to remove rows from a large Pandas DataFrame that contains analytics data based on the actions / events that users performed on a website. Every user's flow of actions begins with an event start and ends with an event end. I want to find all users who performed a particular event (for example, signed up, index 12 in the sample DataFrame) and remove all events after that event up to and including the event end. So in this example, viewed blog post, page view, visited site, ad campaign hit, viewed blog post, visited site, page view and end must be removed from the DataFrame.

In [26]: data
Out[26]: 
    event            user
0   start            user1
1   visited blog     user1
2   page view        user1
3   visited blog     user1
4   viewed blog post user1
5   ad campaign hit  user1
6   page view        user1
7   visited site     user1
8   visited blog     user1
9   viewed blog post user1
10  visited site     user1
11  page view        user1
12  signed up        user1
13  viewed blog post user1
14  page view        user1
15  visited site     user1
16  ad campaign hit  user1
17  viewed blog post user1
18  visited site     user1
19  page view        user1
20  end              user1

- np.where()

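# Index labels of the 'signed up' and 'end' rows for user1 (default integer index assumed)
removal_starts_at = data[(data.user == 'user1') & (data.event == 'signed up')].index[0]
removal_ends_at = data[(data.user == 'user1') & (data.event == 'end')].index[0]
# data[data.user == 'user1'] is a copy, so this in-place drop does not change data itself
data[data.user == 'user1'].drop(data.index[removal_starts_at+1:removal_ends_at+1], inplace=True)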

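This is far too slow on a large DataFrame.

Also, the selection data[data.user == 'user1'] returns a copy, so the in-place drop does not change data and can trigger the SettingWithCopy warning.

Is there a better way to do this in Pandas, for example by making use of a MultiIndex?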

2 Answers

Here is one approach, shown on sample data with 2 users. Flag the rows to keep, something like this:

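import numpy as np

df['keep'] = np.where( df['event'] == 'start', 1, np.nan )  # 1 at each 'start', NaN elsewhere
df['keep'] = np.where( df['event'].shift() == 'signed up', 0, df['keep'] )  # 0 on the row after 'signed up'
df['keep'] = df['keep'].ffill()  # carry each flag forward until the next flag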

               event   user  keep
0              start  user1     1
1       visited blog  user1     1
2          page view  user1     1
3          signed up  user1     1
4   viewed blog post  user1     0
5          page view  user1     0
6                end  user1     0
7              start  user2     1
8       visited blog  user2     1
9          signed up  user2     1
10  viewed blog post  user2     0
11               end  user2     0

df[df['keep']==1]

          event   user  keep
0         start  user1     1
1  visited blog  user1     1
2     page view  user1     1
3     signed up  user1     1
7         start  user2     1
8  visited blog  user2     1
9     signed up  user2     1
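
The same flag-and-filter idea can probably be written without the helper column. The following is only a sketch under a few assumptions: the rows are in chronological order, every flow begins with a start event (as the question states), and the small two-user frame is hypothetical sample data rebuilt from the output above. The names signed_up, session and seen_before are illustrative.

```python
import pandas as pd

# Hypothetical sample data: two users, one start..end flow each
df = pd.DataFrame({
    "event": ["start", "visited blog", "page view", "signed up",
              "viewed blog post", "page view", "end",
              "start", "visited blog", "signed up",
              "viewed blog post", "end"],
    "user": ["user1"] * 7 + ["user2"] * 5,
})

# 1 on every 'signed up' row, 0 elsewhere
signed_up = df["event"].eq("signed up").astype(int)

# Number each flow: the counter increases by one at every 'start' row
session = df["event"].eq("start").cumsum()

# Sign-ups seen on earlier rows of the same flow only
# (running count minus the current row), so the 'signed up' row itself is kept
seen_before = signed_up.groupby(session).cumsum() - signed_up

# Keep rows that have no earlier sign-up in their flow
result = df[seen_before == 0]
print(result)
```

Because the mask is built from whole columns and the result is assigned to a new name, nothing is modified through a temporary copy. Flows that never contain signed up are kept in full, including their end row.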

Alternatively, find the index of the signed up event and slice up to it:

In [15]: idx = data.query('user=="user1" and event=="signed up"').index[0]

In [16]: data[:idx+1]
Out[16]: 
               event   user
0              start  user1
1       visited blog  user1
2          page view  user1
3       visited blog  user1
4   viewed blog post  user1
5    ad campaign hit  user1
6          page view  user1
7       visited site  user1
8       visited blog  user1
9   viewed blog post  user1
10      visited site  user1
11         page view  user1
12         signed up  user1
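
If the rows really need to be dropped rather than sliced, one possible way is to collect the index labels first and assign the result of drop() back. This is a minimal sketch, assuming a unique, sorted integer index and a shortened, hypothetical copy of the sample data; the names is_user, start_label, end_label and to_drop are illustrative.

```python
import pandas as pd

# Hypothetical, shortened version of the sample data
data = pd.DataFrame({
    "event": ["start", "visited blog", "signed up",
              "page view", "visited site", "end"],
    "user": ["user1"] * 6,
})

is_user = data["user"] == "user1"

# Label of this user's first 'signed up' row
start_label = data.loc[is_user & (data["event"] == "signed up")].index[0]

# Label of this user's first 'end' row at or after the sign-up
end_label = data.loc[
    is_user & (data["event"] == "end") & (data.index >= start_label)
].index[0]

# Rows strictly after 'signed up', up to and including 'end'
to_drop = data.loc[
    is_user & (data.index > start_label) & (data.index <= end_label)
].index

# drop() returns a new DataFrame; assigning it back avoids
# an in-place change on a temporary copy
data = data.drop(to_drop)
print(data)
```

Doing this in a Python loop over many users is likely to be slow, so a vectorized mask is usually the better fit for a large DataFrame.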
