To answer your question directly: yes, this is the most efficient way. But I think we need to clarify why that is so.
As suggested in the alternatives, one thing people will look at is "sorting" your results before going into the $group stage, and what they are looking at is the "timestamp" value, so you would want to make sure everything comes back in "timestamp" order, hence the form:
db.temperature.aggregate([ { "$sort": { "station": 1, "dt": -1 } }, { "$group": { "_id": "$station", "result": { "$first":"$dt"}, "t": {"$first":"$t"} }} ])
And as stated, you would certainly want an index that reflects that sort in order to make it efficient:
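Since the index definition itself was not spelled out above, here is a sketch of the compound index that matches those sort keys; in the 2.6-era shell this would be ensureIndex, with createIndex being the newer equivalent:

// Compound index matching the $sort keys exactly: station ascending, dt descending
db.temperature.ensureIndex({ "station": 1, "dt": -1 })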
However, here is the real point. What was apparently glossed over by others (if not by yourself) is that all of this data is likely being inserted already in time order, since each reading is recorded as it is added.
The beauty of this is that the _id field (with a default ObjectId) is already in "timestamp" order, as it itself contains a time value, and that makes the following statement possible:
db.temperature.aggregate([ { "$group": { "_id": "$station", "result": { "$last":"$dt"}, "t": {"$last":"$t"} }} ])
And it is faster. Why? Well, you do not need to select an index (additional code to invoke), and you also do not need to "load" the index in addition to the document.
We already know the documents are in order (by _id), so the $last boundaries are perfectly valid. You are scanning everything anyway, and you could also "range" the query on _id values, which is just as valid as ranging between two dates.
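To make that concrete, here is a sketch of such an _id range query. The objectIdFromDate helper is my own illustration, not something from the question, and it assumes default ObjectId _id values, whose leading four bytes encode the creation timestamp:

// Hypothetical helper: build an ObjectId whose embedded timestamp matches a given Date,
// by encoding the Unix seconds as the first 8 hex characters and zero-padding the rest
function objectIdFromDate(d) {
    return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000");
}

// Range on _id exactly as you would range on a date field
db.temperature.aggregate([
    { "$match": {
        "_id": {
            "$gte": objectIdFromDate(new Date("2014-05-01")),
            "$lt": objectIdFromDate(new Date("2014-06-01"))
        }
    }},
    { "$group": {
        "_id": "$station",
        "result": { "$last": "$dt" },
        "t": { "$last": "$t" }
    }}
])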
The only thing to say here is that in "real world" usage it might just be more practical for you to $match between ranges of dates when doing this kind of accumulation, as opposed to getting the "first" and "last" _id values to define a "range", or something similar in your actual use.
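For completeness, that would be the same pipeline as above but filtering on the stored dt field rather than on _id; the date boundaries are again made up for illustration:

// Same accumulation, but the $match range is expressed on the date field itself
db.temperature.aggregate([
    { "$match": {
        "dt": { "$gte": new Date("2014-05-01"), "$lt": new Date("2014-06-01") }
    }},
    { "$group": {
        "_id": "$station",
        "result": { "$last": "$dt" },
        "t": { "$last": "$t" }
    }}
])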
So where is the proof? Well, it is fairly easy to reproduce, so I just did so by generating some sample data:
// Two-letter station codes to pick from at random
var stations = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"
];

// Insert 200,000 readings with a random station, a random temperature
// between 50 and 96, and the current time as the reading timestamp
for ( var i = 0; i < 200000; i++ ) {
    var station = stations[Math.floor(Math.random() * stations.length)];
    var t = Math.floor(Math.random() * ( 96 - 50 + 1 )) + 50;
    var dt = new Date();

    db.temperatures.insert({
        station: station,
        t: t,
        dt: dt
    });
}
On my hardware (an 8 GB laptop with a spinning disk, which is not stellar but certainly adequate), running each form of the statement clearly shows a noticeable pause with the version that uses the index and sort (the same keys on the index as in the sort expression). It is only a minor pause, but the difference is significant enough to notice.
Even looking at the explain output (version 2.6 and up, or actually present in 2.4.9 though not documented), you can see the difference there: although the $sort is optimized out due to the presence of the index, the time taken appears to go to selecting the index and then loading the indexed entries. Including all fields for a "covered" index query makes no difference.
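If you want to reproduce that comparison yourself, the 2.6 shell lets you pass an explain option to aggregate, roughly like this; the exact output fields vary by version, so treat it as a sketch:

// Explain the "index and sort" form to inspect index selection and the sort optimization
db.temperature.aggregate(
    [
        { "$sort": { "station": 1, "dt": -1 } },
        { "$group": {
            "_id": "$station",
            "result": { "$first": "$dt" },
            "t": { "$first": "$t" }
        }}
    ],
    { "explain": true }
)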
Also, for the record, purely indexing the date and only sorting on the date values gives the same result. Possibly slightly faster, but still slower than the natural index form without the sort.
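That variant, for reference, would look something like the following; the index call here is again my own sketch since it was not spelled out above:

// Index on the date alone, then sort on it before grouping
db.temperature.ensureIndex({ "dt": -1 })

db.temperature.aggregate([
    { "$sort": { "dt": -1 } },
    { "$group": {
        "_id": "$station",
        "result": { "$first": "$dt" },
        "t": { "$first": "$t" }
    }}
])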
So, as long as you can happily "range" on the first and last _id values, then it is true that using the natural index in insertion order is actually the most efficient way to do this. Your real-world mileage may vary on whether this is practical for you or not, and it might simply end up being more convenient to implement the index and sorting on the date.
But if you were happy with using _id ranges or greater than the "last" _id in your query, then perhaps one tweak in order to get that value along with your results, so that you can in fact store and use that information in successive queries (the lastDoc field name below is just an illustrative choice for the extra accumulator):
db.temperature.aggregate([
    { "$group": {
        "_id": "$station",
        "result": { "$last": "$dt" },
        "t": { "$last": "$t" },
        // Keep the _id of the last document seen per station for use in later queries
        "lastDoc": { "$last": "$_id" }
    }}
])
And if you were actually "following on" from such results, then you can determine the maximum ObjectId value from your results and use it in the next query.
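As a rough sketch of that follow-on step, assuming the lastDoc field from the pipeline above and a stored maximum value (the ObjectId shown is just a placeholder for illustration):

// Suppose maxSeen holds the largest lastDoc value saved from the previous run
var maxSeen = ObjectId("536076603e70a99790b7845d");  // placeholder value

db.temperature.aggregate([
    // Only consider documents inserted after the last one already processed
    { "$match": { "_id": { "$gt": maxSeen } } },
    { "$group": {
        "_id": "$station",
        "result": { "$last": "$dt" },
        "t": { "$last": "$t" },
        "lastDoc": { "$last": "$_id" }
    }}
])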
In any case, enjoy playing with this; but again, yes, in this case that query is the fastest way.