Resampling in Pandas while maintaining value associations

Question


Starting with something like this:

    import numpy as np
    from pandas import DataFrame

    time = np.array(('2015-08-01T00:00:00', '2015-08-01T12:00:00'), dtype='datetime64[ns]')
    heat_index = np.array([101, 103])
    air_temperature = np.array([96, 95])
    df = DataFrame({'heat_index': heat_index, 'air_temperature': air_temperature}, index=time)

giving this for df:

                         air_temperature  heat_index
    2015-08-01 07:00:00               96         101
    2015-08-01 19:00:00               95         103

and then resampling:

    df_daily = df.resample('24H', how='max')

To get this for df_daily:

                air_temperature  heat_index
    2015-08-01               96         103

Thus, by resampling with how='max', pandas aggregates each 24-hour period, taking the maximum value within that period from each column independently.

But, as you can see from the df output for 2015-08-01, the day's maximum heat index (which occurs at 19:00:00) does not correspond to the air temperature occurring at the same time. That is, the heat index of 103F was caused by an air temperature of 95F. This association is lost by resampling, and we end up looking at the air temperature from a different part of the day.
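The loss of association described above can be reproduced with a short sketch. It assumes a current pandas release, where the how= keyword is no longer accepted and the aggregation is called as a method instead. It also assumes the timestamps as shown in the df display, and uses daily ("D") bins in place of '24H', which should be equivalent for this data.

```python
import pandas as pd

# Timestamps copied from the df display above (an assumption of this sketch)
time = pd.to_datetime(["2015-08-01 07:00:00", "2015-08-01 19:00:00"])
df = pd.DataFrame(
    {"air_temperature": [96, 95], "heat_index": [101, 103]}, index=time
)

# Each column is reduced on its own, so the two maxima can come from different rows
df_daily = df.resample("D").max()

# Timestamp at which each column reaches its maximum
peak_times = df.idxmax()

print(df_daily)
print(peak_times)
```

The two entries of peak_times differ, which is exactly why the resampled row pairs values that never occurred together.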

Is there a way to resample on only one column and keep the value from another column at the same index? So that the final result looks like this:

                air_temperature  heat_index
    2015-08-01               95         103

My first guess is to resample only the heat_index column ...

    df_daily = df.resample('24H', how={'heat_index': 'max'})

To obtain...

                heat_index
    2015-08-01         103

... and then try some kind of DataFrame.loc or DataFrame.ix lookup from there, but I was unsuccessful. Any thoughts on how to find the associated value after resampling (for example, find the air_temperature that occurred at the same time as what is later determined to be the maximum heat_index)?
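One likely reason a lookup from the single-column result fails can be sketched as follows: the resampled frame is labelled by the start of each bin, not by the time of the observation that produced the maximum. The sketch assumes a current pandas release (Resampler.agg with a dict instead of how=), the timestamps as displayed in df, and daily ("D") bins.

```python
import pandas as pd

# Timestamps copied from the df display above (an assumption of this sketch)
time = pd.to_datetime(["2015-08-01 07:00:00", "2015-08-01 19:00:00"])
df = pd.DataFrame(
    {"air_temperature": [96, 95], "heat_index": [101, 103]}, index=time
)

# Resample only the heat_index column
df_daily = df.resample("D").agg({"heat_index": "max"})

# The result is indexed by the bin label (midnight), which is not a row of df
bin_label = df_daily.index[0]
label_in_original = bin_label in df.index

print(df_daily)
print(bin_label, label_in_original)
```

Because the bin label does not appear in the original index, df.loc with that label has no row to return; the timestamp of the peak has to be recovered separately.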


python pandas datetime

csg2136 Aug 12 '15 at 10:46


1 answer

chrisb · Accepted Answer · 2015-08-12T23:29:04+0000

Here's one way: .groupby(TimeGrouper()) is essentially what resample does, and the aggregation function then reduces each group to the row holding the maximum heat_index.

    In [60]: (df.groupby(pd.TimeGrouper('24H'))
       ....:    .agg(lambda df: df.loc[df['heat_index'].idxmax(), :]))
    Out[60]:
                air_temperature  heat_index
    2015-08-01               95         103
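pd.TimeGrouper is not available in current pandas releases; pd.Grouper(freq=...) is its replacement. The following is a sketch of the same idea under that assumption, not the answer's original code: it first finds the timestamp of each day's peak heat_index, then selects whole rows at those timestamps. The timestamps are taken from the df display in the question, and daily ("D") bins stand in for '24H'.

```python
import pandas as pd

# Timestamps copied from the df display in the question (an assumption of this sketch)
time = pd.to_datetime(["2015-08-01 07:00:00", "2015-08-01 19:00:00"])
df = pd.DataFrame(
    {"air_temperature": [96, 95], "heat_index": [101, 103]}, index=time
)

# For each calendar day, the timestamp at which heat_index peaks
peak_times = df.groupby(pd.Grouper(freq="D"))["heat_index"].idxmax()

# Select the complete rows at those timestamps, so both columns
# come from the same observation
df_daily = df.loc[peak_times.to_numpy()]

# Relabel each row with the day it belongs to
df_daily.index = peak_times.index

print(df_daily)
```

Two caveats: idxmax returns the first occurrence when a day has ties, and days with no observations at all would need separate handling before the .loc lookup.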
