I am currently formatting data from two different data sets. One of the data sets reflects the number of observations of people in the room by the hour, the second - the number of people based on wifi logs generated in 5 minutes.
After combining these two data frames into one, I ran into a problem when every hour (like "10:00:00") has data from the original set, but other data (every 5 minutes, like "10:47: 14") is not includes this data.
Here's what the merge file looks like:
room time con auth capacity % Count module size 0 B002 Mon Nov 02 10:32:06 23 23 90 NaN NaN NaN NaN` 1 B002 Mon Nov 02 10:37:10 25 25 90 NaN NaN NaN NaN` 12527 B002 Mon Nov 02 10:00:00 NaN NaN 90 50% 45.0 COMP30520 60` 12528 B002 Mon Nov 02 11:00:00 NaN NaN 90 0% 0.0 COMP30520 60`
Do I have a way to go through the framework and find all the information about "employment", "OccupancyCount", "module" and "size" from 11:00:00 and write it to all the cells that are on the same day and where is the hour between 10:00:00 and 10:59:59?
This will allow me to have all the information on each line, and then allow me to collect min() , max() and median() based on the "day" and "hour".
To answer the comment for the source data, there is:
first data frame:
time room module size 0 Mon Nov 02 09:00:00 B002 COMP30190 29 1 Mon Nov 02 10:00:00 B002 COMP40660 53
second frame:
room time con auth capacity % Count 0 B002 Mon Nov 02 20:32:06 0 0 NaN NaN NaN 1 B002 Mon Nov 02 20:37:10 0 0 NaN NaN NaN 2 B002 Mon Nov 02 20:42:12 0 0 NaN NaN NaN 12797 B008 Wed Nov 11 13:00:00 NaN NaN 40 25 10.0 12798 B008 Wed Nov 11 14:00:00 NaN NaN 40 50 20.0 12799 B008 Wed Nov 11 15:00:00 NaN NaN 40 25 10.0
thus, these two data blocks were combined together:
DFinal = pd.merge(DF, d3, left_on=["room", "time"], right_on=["room", "time"], how="outer", left_index=False, right_index=False)
Any help with this would be greatly appreciated.
Thank you very much,
-Romain