Starting with something like this:
    import numpy as np
    from pandas import DataFrame

    time = np.array(('2015-08-01T00:00:00', '2015-08-01T12:00:00'), dtype='datetime64[ns]')
    heat_index = np.array([101, 103])
    air_temperature = np.array([96, 95])
    df = DataFrame({'heat_index': heat_index, 'air_temperature': air_temperature}, index=time)
giving this for df:

                         air_temperature  heat_index
    2015-08-01 07:00:00               96         101
    2015-08-01 19:00:00               95         103
and then resampling to daily values:

    df_daily = df.resample('24H', how='max')
to get this for df_daily:

                air_temperature  heat_index
    2015-08-01               96         103
So, by resampling with how='max', pandas aggregates over each 24-hour period, taking the maximum value within that period from each column independently.
But as you can see from the df output for 2015-08-01, that day's maximum heat index (which occurs at 19:00:00) does not line up with the air temperature recorded at the same time. That is, the heat index of 103F occurred with an air temperature of 95F. This association is lost in the resampling, and we end up looking at the air temperature from a different part of the day.
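To make the mismatch concrete, here is a small sketch that checks where each column's maximum comes from. It assumes the timestamps shown in the printed df (07:00 and 19:00), builds the index with `pd.to_datetime` rather than the NumPy array above, and uses the method form `.resample('D').max()`, since newer pandas releases no longer accept the `how=` keyword.

```python
import pandas as pd

# Same two observations as above, using the timestamps from the printed df.
index = pd.to_datetime(['2015-08-01 07:00:00', '2015-08-01 19:00:00'])
df = pd.DataFrame({'heat_index': [101, 103],
                   'air_temperature': [96, 95]}, index=index)

# Daily maximum: each column is reduced on its own.
daily_max = df.resample('D').max()

# Timestamps at which each column reaches its maximum.
hi_time = df['heat_index'].idxmax()
at_time = df['air_temperature'].idxmax()

print(daily_max)
print('heat_index peaks at', hi_time)
print('air_temperature peaks at', at_time)
```

The two peak timestamps differ, so the daily row pairs values from two different observations.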
Is there a way to resample only one column and keep the value of another column from the same index? So that the final result would look like this:

                air_temperature  heat_index
    2015-08-01               95         103
My first guess was to resample just the heat_index column...

    df_daily = df.resample('24H', how={'heat_index': 'max'})
to get...

                heat_index
    2015-08-01         103
...and then try some kind of DataFrame.loc or DataFrame.ix lookup from there, but my attempts were unsuccessful. Any thoughts on how to find the related value after resampling (for example, the air_temperature that occurred at the same time as what is later found to be the maximum heat_index)?
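For illustration, one possible way to express the lookup described above is to find the timestamp of each day's maximum heat_index with `idxmax` and then select whole rows at those timestamps. This is only a sketch under a few assumptions: the index timestamps are unique, every day present in the data has at least one non-missing heat_index, and grouping by calendar day (`df.index.normalize()`) is an acceptable stand-in for the 24-hour resample. The names `peak_times` and the rebuilt `df` are introduced here for the example.

```python
import pandas as pd

# Same data as in the question, using the timestamps from the printed df.
index = pd.to_datetime(['2015-08-01 07:00:00', '2015-08-01 19:00:00'])
df = pd.DataFrame({'heat_index': [101, 103],
                   'air_temperature': [96, 95]}, index=index)

# For each calendar day, the timestamp at which heat_index is largest.
peak_times = df.groupby(df.index.normalize())['heat_index'].idxmax()

# Take the full rows at those timestamps, so every column
# comes from the same observation as the daily heat_index maximum.
df_daily = df.loc[peak_times.to_numpy()]

# Relabel the rows by day instead of by the exact peak timestamp.
df_daily.index = peak_times.index

print(df_daily)
```

Because the rows are selected rather than aggregated, any additional columns would be carried along the same way. If the exact time of the peak matters, skipping the relabeling step keeps it in the index.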