Pandas - grouping columns with conditions from another column

I am struggling with pandas: how do I group values from multiple columns based on conditions in another column?

This is what my data looks like as a pandas DataFrame:

id      trigger     timestamp
1       started     2017-10-01 14:00:1
1       ended       2017-10-04 12:00:1
2       started     2017-10-02 10:00:1
1       started     2017-10-03 11:00:1
2       ended       2017-10-04 12:00:1    
2       started     2017-10-05 15:00:1
1       ended       2017-10-05 16:00:1
2       ended       2017-10-05 17:00:1
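To reproduce the problem, the sample data can be built directly in Python. This is a sketch: it assumes `14:00:1` in the listing means 14:00:01, so the seconds are written zero-padded.

```python
import pandas as pd

# Sample data from the question, rows kept in their original order
df = pd.DataFrame({
    "id": [1, 1, 2, 1, 2, 2, 1, 2],
    "trigger": ["started", "ended", "started", "started",
                "ended", "started", "ended", "ended"],
    "timestamp": pd.to_datetime([
        "2017-10-01 14:00:01", "2017-10-04 12:00:01",
        "2017-10-02 10:00:01", "2017-10-03 11:00:01",
        "2017-10-04 12:00:01", "2017-10-05 15:00:01",
        "2017-10-05 16:00:01", "2017-10-05 17:00:01",
    ]),
})
print(df)
```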

My goal is to find the difference in days, hours, or minutes between each start and end timestamp, grouped by id.
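In pandas, subtracting two timestamps gives a `Timedelta`, and converting it to hours, minutes, or days is a division of `total_seconds()`. A small worked example using the first start/end pair of id 1:

```python
import pandas as pd

start = pd.Timestamp("2017-10-01 14:00:01")
end = pd.Timestamp("2017-10-04 12:00:01")
delta = end - start  # 2 days 22:00:00

seconds = delta.total_seconds()
hours = seconds / 3600    # 70.0
minutes = seconds / 60    # 4200.0
days = seconds / 86400    # about 2.92
print(delta, hours, minutes, days)
```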

My output should look like this (diff in hours):

id      trigger     timestamp           trigger     timestamp               diff
1       started     2017-10-01 14:00:1  ended       2017-10-04 12:00:1      70
1       started     2017-10-03 11:00:1  ended       2017-10-05 16:00:1      53
2       started     2017-10-02 10:00:1  ended       2017-10-04 12:00:1      50
2       started     2017-10-05 15:00:1  ended       2017-10-05 17:00:1      2

I have tried many options, but I cannot find an efficient solution.

Here is my code so far:

First I tried to split the data into 'started' and 'ended':

df['started'] = df.groupby(['id', 'timestamp'])['trigger'] == 'started'

df['ended'] = df.groupby(['id', 'timestamp'])['trigger'] == 'ended'

and then:

df.groupby(['id', 'started', 'ended'], as_index=True).sum()

but it does not work. I also tried:

df['started'] = df.groupby(['trigger'])['timestamp'].np.where(df['trigger']=='started')

also without useful results.

Is there a way to do this in pandas? Maybe with df.fillna(method='ffill') to fill the NaN values?
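The forward-fill idea can work under one assumption: within each id, every 'ended' row comes after its matching 'started' row in row order. That holds for the sample data (id 1's two intervals overlap in time, but not in row order). A minimal sketch; `start_ts` and `ended` are hypothetical names, and it uses `.ffill()`, the non-deprecated spelling of `fillna(method='ffill')`:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 1, 2, 2, 1, 2],
    "trigger": ["started", "ended", "started", "started",
                "ended", "started", "ended", "ended"],
    "timestamp": pd.to_datetime([
        "2017-10-01 14:00:01", "2017-10-04 12:00:01",
        "2017-10-02 10:00:01", "2017-10-03 11:00:01",
        "2017-10-04 12:00:01", "2017-10-05 15:00:01",
        "2017-10-05 16:00:01", "2017-10-05 17:00:01",
    ]),
})

# Keep the timestamp only on 'started' rows; all other rows become NaT
df["start_ts"] = df["timestamp"].where(df["trigger"] == "started")
# Carry the most recent start forward within each id (row order, not time order)
df["start_ts"] = df.groupby("id")["start_ts"].ffill()

# Each 'ended' row now holds its start; compute the gap in hours
ended = df[df["trigger"] == "ended"].copy()
ended["diff"] = (ended["timestamp"] - ended["start_ts"]).dt.total_seconds() / 3600
print(ended[["id", "start_ts", "timestamp", "diff"]])
```

If the rows were sorted by time instead, id 1 would pair its 10-03 start with the 10-04 end, so this only matches the expected output when row order defines the pairing.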

Answer:
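  • Number the rows within each id and trigger group with cumcount
  • Add that counter to the index, so df gets a MultiIndex of id, counter, and trigger
  • Unstack the trigger level of timestamp, then subtract started from ended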
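To see what the counter does, here is the first step on its own. `cumcount` numbers the rows of each (id, trigger) group 0, 1, ... in row order, so the k-th 'started' and the k-th 'ended' of an id share the same number and land on the same row after unstacking. Only the two relevant columns are rebuilt here; `n` is just a display name.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 1, 2, 2, 1, 2],
    "trigger": ["started", "ended", "started", "started",
                "ended", "started", "ended", "ended"],
})

# Position of each row within its (id, trigger) group, in row order
i = df.groupby(["id", "trigger"]).cumcount()
print(df.assign(n=i))
```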

df['timestamp'] = pd.to_datetime(df['timestamp']) # if necessary

i = df.groupby(['id', 'trigger']).cumcount()
df.set_index(['id', i, 'trigger']).timestamp.unstack().assign(
       diff=lambda d: d.ended.sub(d.started).dt.total_seconds() / 3600
)

Thanks to piRSquared for the improvement.

Output:

trigger               ended             started  diff
id                                                   
1  0    2017-10-04 12:00:01 2017-10-01 14:00:01  70.0
   1    2017-10-05 16:00:01 2017-10-03 11:00:01  53.0
2  0    2017-10-04 12:00:01 2017-10-02 10:00:01  50.0
   1    2017-10-05 17:00:01 2017-10-05 15:00:01   2.0
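An end-to-end run of the same approach on the sample data, as a sketch (the frame is retyped from the question with zero-padded seconds, and `out` is just an illustrative name). The `diff` column is in hours; dividing by 60 or 86400 instead of 3600 gives minutes or days.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 1, 2, 2, 1, 2],
    "trigger": ["started", "ended", "started", "started",
                "ended", "started", "ended", "ended"],
    "timestamp": ["2017-10-01 14:00:01", "2017-10-04 12:00:01",
                  "2017-10-02 10:00:01", "2017-10-03 11:00:01",
                  "2017-10-04 12:00:01", "2017-10-05 15:00:01",
                  "2017-10-05 16:00:01", "2017-10-05 17:00:01"],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Pair the k-th start with the k-th end of each id via a per-group counter
i = df.groupby(["id", "trigger"]).cumcount()
out = df.set_index(["id", i, "trigger"]).timestamp.unstack().assign(
    diff=lambda d: d.ended.sub(d.started).dt.total_seconds() / 3600
)
print(out)
```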

If necessary, you can then flatten the MultiIndex.
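To get a flat table closer to the layout in the question, the index levels can be moved back into columns. A sketch; `n`, `wide`, and `flat` are hypothetical names, and the final column order is a choice rather than something the answer prescribes:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 1, 2, 2, 1, 2],
    "trigger": ["started", "ended", "started", "started",
                "ended", "started", "ended", "ended"],
    "timestamp": pd.to_datetime([
        "2017-10-01 14:00:01", "2017-10-04 12:00:01",
        "2017-10-02 10:00:01", "2017-10-03 11:00:01",
        "2017-10-04 12:00:01", "2017-10-05 15:00:01",
        "2017-10-05 16:00:01", "2017-10-05 17:00:01",
    ]),
})

# Naming the counter makes it a labelled index level ("n")
i = df.groupby(["id", "trigger"]).cumcount().rename("n")
wide = df.set_index(["id", i, "trigger"]).timestamp.unstack()
wide["diff"] = (wide["ended"] - wide["started"]).dt.total_seconds() / 3600

# Move id and n back into columns and drop the leftover 'trigger' columns label
flat = wide.reset_index().rename_axis(columns=None)
flat = flat[["id", "started", "ended", "diff"]]
print(flat)
```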
