I want to remove rows from a large Pandas DataFrame that contains analytics data on the actions/events users performed on a website. Every user's action flow begins with a `start` event and ends with an `end` event. I want to find all users who performed a particular event (for example, `signed up`, index 12 in the sample DataFrame below) and delete every event after it, up to and including the `end` event. So in this example, the `viewed blog post`, `page view`, `visited site`, `ad campaign hit`, `viewed blog post`, `visited site`, `page view` and `end` events (indices 13 to 20) should be removed from the DataFrame.
```
In [26]: data
Out[26]:
               event   user
0              start  user1
1       visited blog  user1
2          page view  user1
3       visited blog  user1
4   viewed blog post  user1
5    ad campaign hit  user1
6          page view  user1
7       visited site  user1
8       visited blog  user1
9   viewed blog post  user1
10      visited site  user1
11         page view  user1
12         signed up  user1
13  viewed blog post  user1
14         page view  user1
15      visited site  user1
16   ad campaign hit  user1
17  viewed blog post  user1
18      visited site  user1
19         page view  user1
20               end  user1
```
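To make the example reproducible, here is a minimal sketch that rebuilds the sample `data` frame above. The column names `event` and `user` come from the printed output; the assumption is that the rows are already in chronological order, as the index suggests.

```python
import pandas as pd

# Rebuild the sample DataFrame shown above: one user, one start..end flow.
events = [
    "start", "visited blog", "page view", "visited blog", "viewed blog post",
    "ad campaign hit", "page view", "visited site", "visited blog",
    "viewed blog post", "visited site", "page view", "signed up",
    "viewed blog post", "page view", "visited site", "ad campaign hit",
    "viewed blog post", "visited site", "page view", "end",
]
# The scalar "user1" is broadcast to every row.
data = pd.DataFrame({"event": events, "user": "user1"})
print(data)
```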
I looked into `np.where()` and also tried the following:
```python
removal_starts_at = data[(data.user == 'user1') & (data.event == 'signed up')]
removal_ends_at = data[(data.user == 'user1') & (data.event == 'end')]
# Try to drop everything after 'signed up' up to and including 'end' for user1
data[data.user == 'user1'].drop(data.index[removal_starts_at+1:removal_ends_at+1], inplace=True)
```
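For a single user, a working variant of this attempt might look like the sketch below. It uses index labels rather than filtered DataFrames as slice bounds, and it calls `drop` on `data` itself rather than on the temporary result of `data[data.user == 'user1']`. It assumes the index is sorted in chronological order and that the user has one `signed up` event followed by an `end`; the small two-user frame is made up for illustration.

```python
import pandas as pd

# Made-up two-user sample; rows are assumed to be in chronological order.
data = pd.DataFrame({
    "event": ["start", "page view", "signed up", "viewed blog post", "end",
              "start", "visited site", "end"],
    "user": ["user1"] * 5 + ["user2"] * 3,
})

is_user1 = data["user"] == "user1"
# Index label of user1's 'signed up' row (a label, not a filtered DataFrame).
signup_label = data.loc[is_user1 & (data["event"] == "signed up")].index[0]
# Label of the first 'end' row for user1 that comes after the sign-up.
end_label = data.loc[
    is_user1 & (data["event"] == "end") & (data.index > signup_label)
].index[0]
# Labels strictly after the sign-up, up to and including that 'end' row.
to_drop = data.loc[
    is_user1 & (data.index > signup_label) & (data.index <= end_label)
].index
# Drop from `data` itself and rebind the name, instead of dropping in place on a copy.
data = data.drop(index=to_drop)
print(data)
```

Here rows 3 and 4 of `user1` are removed, and `user2` is left untouched.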
This doesn't work! The real DataFrame is much larger than the sample (~20 … and 1000 …).
As far as I can tell, the problems are:
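`data[data.user == 'user1']` returns a copy, so the `drop(..., inplace=True)` call modifies that copy rather than `data`. Pandas also emits a `SettingWithCopyWarning`.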
I'm new to Pandas, so I may be missing something obvious. Should I be using a MultiIndex here, or is there a better way to do this?
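One possible vectorized approach, sketched under stated assumptions, handles all users at once. It numbers each user's flows with a per-user cumulative count of `start` events, then flags rows that come after a `signed up` event within the same flow. It assumes rows are chronological within each user (users may be interleaved) and that every flow begins with `start`; `drop_after_event` is a hypothetical helper name. It groups on ordinary columns and does not need a MultiIndex.

```python
import pandas as pd


def drop_after_event(df: pd.DataFrame, event: str = "signed up") -> pd.DataFrame:
    """Hypothetical helper: within each flow, drop rows after `event` up to and including 'end'."""
    # Number each user's flows: every 'start' row begins a new flow (1, 2, ...).
    flow = (df["event"] == "start").astype(int).groupby(df["user"]).cumsum()
    # 1 on rows where the trigger event occurs, 0 elsewhere.
    hit = (df["event"] == event).astype(int)
    # Number of trigger events strictly before each row within the same (user, flow).
    seen_before = hit.groupby([df["user"], flow]).cumsum() - hit
    # Keep rows up to and including the trigger; later rows of that flow (incl. 'end') go.
    return df[seen_before == 0]


# Made-up sample: user1 has two flows, interleaved with one flow of user2.
data = pd.DataFrame({
    "event": ["start", "page view", "signed up", "viewed blog post", "end",
              "start", "visited site", "end",
              "start", "signed up", "page view", "end"],
    "user": ["user1"] * 5 + ["user2"] * 3 + ["user1"] * 4,
})
result = drop_after_event(data)
print(result)
```

Because the flow number is part of the grouping key, a `signed up` in one flow does not remove events from the user's later flows, and users without the trigger event keep all of their rows.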