Introduction
Your goal here takes a little thought about how the events you record map onto the time periods you want to aggregate. The obvious point is that a single document, as you have modelled it, can represent an event that needs to be reported in "multiple" time periods in the final aggregated result.
On analysis, this puts the problem outside the scope of the aggregation framework because of the time periods required. Certain events need to be "generated" beyond what can simply be grouped on, as you should be able to see.
To do this "generation", you need mapReduce. It has "flow control" via JavaScript as its processing language, so it can determine whether the time between on/off spans more than one period and therefore emit the data as having occurred in more than one of those periods.
As a side note, the "light" is probably not best suited to _id, as it can be turned on/off many times during a given day. An "instance" of on/off is most likely better. However, I am just following your example here, so to transpose this, simply replace the reference to _id in the mapper code with whatever field actually holds the light's identifier.
But onto how the code works:
Map and Reduce
In essence, the "mapper" function looks at the current record, rounds each on/off time to the hour, and then works out the start hour of the six-hour period in which the event occurred.
With these new date values, a loop starts at the initial "on" time and emits a value for the current "light" being on during that period, within a single-element array as explained below. Each iteration increases the start period by six hours until the "off" time is reached.
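The loop described above could look roughly like the following. This is a minimal sketch, not the original code: the field names `on`, `off` and `_id` follow the question's documents, while `periodMs`, `reportEnd` and the local `emit` stand-in are assumptions added so the sketch runs in plain Node outside MongoDB.

```javascript
// Hypothetical values that a real mapReduce call would pass in via `scope`.
var periodMs = 6 * 60 * 60 * 1000;                 // six hours in milliseconds
var reportEnd = new Date("2015-01-02T00:00:00Z");  // assumed cut-off for lights still on

// Stand-in for the `emit` that MongoDB supplies to the mapper.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

// Mapper sketch: `this` is the current document.
function mapper() {
  // Floor the "on" time to the start of its six-hour period (UTC).
  var start = Math.floor(this.on.getTime() / periodMs) * periodMs;
  // Assumption: a light with no "off" value counts as on until `reportEnd`.
  var off = (this.off || reportEnd).getTime();

  // One emit for every six-hour period that the on/off span touches.
  for (var t = start; t < off; t += periodMs) {
    emit(
      { start: new Date(t), end: new Date(t + periodMs) },
      { lights_on: [this._id] }  // single-element array, same shape the reducer returns
    );
  }
}

// A light on from 05:30 to 13:10 touches three periods.
mapper.call({
  _id: "light_1",
  on: new Date("2015-01-01T05:30:00Z"),
  off: new Date("2015-01-01T13:10:00Z")
});
console.log(emitted.map(function (e) { return e.key.start.toISOString(); }));
```

Flooring straight to the six-hour boundary also covers the rounding to hours, so the sketch does both in one step. The example document produces emits for the periods starting at 00:00, 06:00 and 12:00 UTC.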
These emitted values arrive in the reducer function, which requires the same shape of input that it will return; hence the array of lights on in the period inside the value object. It processes the data emitted under the same key as a list of these value objects.
It first iterates over the list of values to reduce, and then looks at the inner array of lights, which could have come from a previous reduce pass, processing each of them into a single result array of unique lights. This is done simply by looking for the current light value in the result array and pushing to that array where it does not exist.
Note the "previous pass": if you are not familiar with how mapReduce works, you should understand that the reducer function itself emits a result that may not have been achieved by processing "all" possible values for the "key" in one pass. It can, and often does, process only a "subset" of the emitted data for a key, and therefore accepts a "reduced" result as input in the same way that data is emitted from the mapper.
This design point is why both the mapper and the reducer need to output data with the same structure, as the reducer itself can also get its input from data that was previously reduced. This is how mapReduce deals with large datasets emitting a large number of identical key values. It typically processes them in "chunks", not all at once.
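A reducer along those lines might be sketched as follows. The `lights_on` field name is taken from the sample output in this answer; the simulated two-pass call at the end is only an illustration of the "previous pass" behaviour, not something MongoDB requires you to write.

```javascript
// Reducer sketch: merges every `lights_on` array for a key into one array of
// unique lights. It returns the same shape the mapper emits, so its own
// output can be fed back in on a later pass.
function reducer(key, values) {
  var result = { lights_on: [] };
  values.forEach(function (value) {
    value.lights_on.forEach(function (light) {
      // Push only lights that are not already in the result.
      if (result.lights_on.indexOf(light) === -1) {
        result.lights_on.push(light);
      }
    });
  });
  return result;
}

// Simulate a partial pass, then a second pass that receives the reduced
// result alongside freshly emitted values for the same key.
var key = {
  start: new Date("2015-01-01T06:00:00Z"),
  end: new Date("2015-01-01T12:00:00Z")
};
var partial = reducer(key, [{ lights_on: ["light_1"] }, { lights_on: ["light_2"] }]);
var combined = reducer(key, [partial, { lights_on: ["light_1"] }, { lights_on: [] }]);
console.log(combined.lights_on); // [ 'light_1', 'light_2' ]
```

Because the output has the same structure as the input, reducing in two passes gives the same list as reducing everything at once.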
The end reduction comes down to a list of the lights on during each period, with the period start and end as the emitted key. Like this:
    {
        "_id": {
            "start": ISODate("2015-01-01T06:00:00Z"),
            "end": ISODate("2015-01-01T12:00:00Z")
        },
        "value": {
            "lights_on": [ "light_1", "light_2" ]
        }
    },
This "_id", "value" structure is just a property of how all mapReduce output comes out, but all the values you need are there.
Query
There is also a point to note about the query selection, which should take into account that a light can already be "on" via a collection entry dated before the start of the current day. Likewise, it can be turned "off" after the date being reported, and "off" may in fact either be null or not exist in the document at all, depending on how your data is stored and which day is actually being observed.
This logic requires some calculation from the start of the day to be reported, and considers a six-hour period both before and after that date, with the query conditions as listed:
    {
        "on": { "$gte": yesterday, "$lt": tomorrow },
        "$or": [
            { "off": { "$gte": today, "$lt": nextday } },
            { "off": null },
            { "off": { "$exists": false } }
        ]
    }
The basic selectors use the range operators $gte and $lt to match values that are greater than or equal to, and less than, the values supplied, so that the fields they test fall within a suitable range.
Within the $or condition, various possibilities are considered for the "off" value: it falls within the range criteria, it is null, or the key may not be present in the document at all, which is tested via $exists. Which of these conditions within $or you need depends on how you actually represent "off" when a light has not yet been turned off, but these would be reasonable assumptions.
Like all MongoDB queries, all conditions are implicitly an AND expression unless stated otherwise.
This is still somewhat flawed, depending on how long a light is expected to be on. But the variables are all intentionally listed externally so you can adjust them to your needs, taking into account the expected duration to fetch either before or after the date to be reported.
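One possible way to derive those external variables is sketched below. The names `today`, `yesterday`, `tomorrow` and `nextday` come from the query above; the exact offsets (a full day for `tomorrow`, six hours of margin on either side) are an assumption based on the description, and the report date is just an example value.

```javascript
// Durations in milliseconds.
var oneHour = 1000 * 60 * 60;
var sixHours = oneHour * 6;
var oneDay = oneHour * 24;

// Example report date; replace with the day you actually want to report.
var today = new Date("2015-01-01T00:00:00Z");

// Assumed margins: look six hours back for lights already on, and six hours
// past the end of the day for lights turned off later.
var tomorrow = new Date(today.getTime() + oneDay);
var yesterday = new Date(today.getTime() - sixHours);
var nextday = new Date(tomorrow.getTime() + sixHours);

// The query conditions from above, built from those values.
var query = {
  on: { $gte: yesterday, $lt: tomorrow },
  $or: [
    { off: { $gte: today, $lt: nextday } },
    { off: null },
    { off: { $exists: false } }
  ]
};

console.log(yesterday.toISOString(), nextday.toISOString());
```

Widening `sixHours` to a larger margin is the adjustment to make if lights are expected to stay on for longer.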
Creating empty time series
The other note here is that the data itself likely has no events showing a light on for some of the given time periods. For this reason, there is a simple method embedded in the mapper function that checks whether we are on the first iteration of results.
On that first pass only, a set of the possible period keys is emitted, each with an empty array for the lights on in that period. This allows the report to also show those periods in which no light was on at all, as they are inserted into the data sent to the reducer and output.
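That first-iteration check could be sketched like this. The counter name `count`, the `today` and `periodMs` values, and the local `emit` stand-in are all assumptions for illustration; in a real call the first three would be supplied through `scope` and `emit` by MongoDB.

```javascript
// Hypothetical scope values, set here so the sketch runs outside MongoDB.
var periodMs = 6 * 60 * 60 * 1000;
var today = new Date("2015-01-01T00:00:00Z");
var count = 0; // number of documents the mapper has seen so far

// Stand-in for the `emit` that mapReduce supplies.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

function mapper() {
  // First document only: emit every six-hour period of the day with no lights.
  if (count === 0) {
    for (var i = 0; i < 4; i++) {
      var start = today.getTime() + i * periodMs;
      emit(
        { start: new Date(start), end: new Date(start + periodMs) },
        { lights_on: [] }
      );
    }
  }
  count++;
  // ...the per-light emits for `this` would follow here.
}

// Two documents pass through, but the empty periods are emitted once.
mapper.call({ _id: "light_1" });
mapper.call({ _id: "light_2" });
console.log(emitted.length); // 4
```

The empty arrays have the same shape as every other emitted value, so the reducer merges them with real data without any special handling.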
You may vary this approach, as it still depends on there being some data that meets the query criteria in order to output anything. So, to cater for a truly "empty day" where no data is recorded or meets the criteria, it might be better to create an external hash table of the keys, all showing an empty result for the lights. Then simply "merge" the result of the mapReduce operation into those pre-existing keys to produce the report.
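The external approach might look something like the sketch below. The helper names `emptyReport` and `mergeResults` are hypothetical, and the result documents are assumed to have the usual mapReduce output shape of an `_id` key plus a `value` object.

```javascript
// Build one empty entry per period of the day, keyed by the period start.
function emptyReport(dayStart, periodMs, periods) {
  var report = {};
  for (var i = 0; i < periods; i++) {
    var start = new Date(dayStart.getTime() + i * periodMs);
    report[start.toISOString()] = {
      start: start,
      end: new Date(start.getTime() + periodMs),
      lights_on: []
    };
  }
  return report;
}

// Overlay mapReduce output documents onto the empty entries. Periods that
// fall outside the reported day have no entry and are ignored.
function mergeResults(report, results) {
  results.forEach(function (doc) {
    var slot = report[doc._id.start.toISOString()];
    if (slot) {
      slot.lights_on = doc.value.lights_on;
    }
  });
  return report;
}

// Example: one period has data, the other three stay empty.
var sixHours = 6 * 60 * 60 * 1000;
var report = mergeResults(
  emptyReport(new Date("2015-01-01T00:00:00Z"), sixHours, 4),
  [{
    _id: { start: new Date("2015-01-01T06:00:00Z"), end: new Date("2015-01-01T12:00:00Z") },
    value: { lights_on: ["light_1", "light_2"] }
  }]
);
console.log(Object.keys(report).length); // 4
```

With this in place the report always has four rows for the day, even when the query matches nothing at all.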
Summary
There are a number of date calculations here, and, being unaware of your actual implementation language, I am simply declaring anything that works externally to the mapReduce operation separately. So anything that looks like duplication is done with that intent, keeping that part of the logic language independent. Most programming languages support the means to manipulate dates as per the methods used.
The inputs, which are then specific to each language, are passed in as the options block shown as the last argument to the mapReduce method. Notably there is the query with its parameterized values, which are all calculated from the date to be reported on. Then there is the "scope", which is a way to pass in values that can be used by the functions within the mapReduce operation.
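As a rough illustration of that options block, the sketch below assembles the `out`, `query` and `scope` options that `mapReduce` accepts. The helper name `buildOptions`, the scope variable names and the collection name `lights` in the comment are assumptions; only the option keys themselves are part of the mapReduce method.

```javascript
// Assemble the last argument to mapReduce from externally computed values.
function buildOptions(query, today) {
  return {
    out: { inline: 1 },             // return results directly instead of writing a collection
    query: query,                   // limits which documents reach the mapper
    scope: {                        // becomes global variables inside map and reduce
      today: today,
      periodMs: 6 * 60 * 60 * 1000,
      count: 0
    }
  };
}

var today = new Date("2015-01-01T00:00:00Z");
var options = buildOptions({ on: { $lt: new Date("2015-01-02T00:00:00Z") } }, today);
console.log(Object.keys(options.scope));

// In the mongo shell the call would then be along the lines of:
//   db.lights.mapReduce(mapper, reducer, options);
```

Only the values inside `query` and `scope` change from one report date to the next; the mapper and reducer functions stay the same.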
With those things in mind, the JavaScript code of the mapper and reducer remains unaltered, as that is what the method expects as input. Any variables for the process are fed in by both the scope and the query, in order to get the outcome without changing that code.
This is mainly because the duration of a "light being on" can span different periods to be reported on, which is something the aggregation framework cannot produce. It cannot perform the "looping" and "emission of data" that are needed to get to the result, and therefore we use mapReduce for this task instead.
That said, great question. I don't know whether you had already considered the concepts of how to achieve the results here, but at least there is now a guide for someone approaching a similar problem.