Using a CouchDB view, can I count groups and filter by a key range at the same time?

I am using CouchDB. I would like to count the occurrences of the values of certain fields within a date range that can be specified at query time. I seem to be able to do parts of this, but I am having trouble understanding the best way to put it all together.

Assuming documents that have a timestamp field and another field, for example:

 { date: '20120101-1853', author: 'bart' }
 { date: '20120102-1850', author: 'homer' }
 { date: '20120103-2359', author: 'homer' }
 { date: '20120104-1200', author: 'lisa' }
 { date: '20120815-1250', author: 'lisa' }

I can easily create a view that filters documents by a flexible date range. This can be done with a view like the one below, queried with key range parameters, e.g. _view/all-docs?startkey="20120101-0000"&endkey="20120201-0000".

all-docs/map.js:

 function(doc) { emit(doc.date, doc); } 

With the data above, this returns a view result containing only the first 4 documents (the only documents in the date range).

I can also create a view that counts the occurrences of a given field, like the one below, queried with grouping, i.e. _view/author-count?group=true:

author-count/map.js:

 function(doc) { emit(doc.author, 1); } 

author-count/reduce.js:

 function(keys, values, rereduce) { return sum(values); } 

This will give something like:

 { "rows": [ {"key":"bart","value":1}, {"key":"homer","value":2}, {"key":"lisa","value":2} ] } 

However, I cannot find the best way to both filter by date and count occurrences. For example, with the data above, I would like to specify range parameters such as startkey="20120101-0000"&endkey="20120201-0000" and get a result like this, where the last document is excluded from the count because it falls outside the specified date range:

 { "rows": [ {"key":"bart","value":1}, {"key":"homer","value":2}, {"key":"lisa","value":1} ] } 

What is the most elegant way to do this? Is it possible with a single request? Should I use a different CouchDB construct, or is a view sufficient for this?

couchdb mapreduce
3 answers

You can get close to the desired result with a list:

 {
   _id: "_design/authors",
   views: {
     authors_by_date: {
       map: function(doc) {
         emit(doc.date, doc.author);
       }
     }
   },
   lists: {
     count_occurrences: function(head, req) {
       start({ headers: { "Content-Type": "application/json" }});
       var result = {};
       var row;
       while(row = getRow()) {
         var val = row.value;
         if(result[val]) result[val]++;
         else result[val] = 1;
       }
       // A list function must send a string, so serialise the counts.
       return JSON.stringify(result);
     }
   }
 }

It can be requested like this:

 http://<couchurl>/<db>/_design/authors/_list/count_occurrences/authors_by_date?startkey=<startDate>&endkey=<endDate> 

This will be slower than a normal map-reduce, and it is a bit of a workaround. Unfortunately, it is the only way to do a multi-dimensional query, "which CouchDB isn't suited for".

The result of querying this design document will look something like this:

 { "bart": 1, "homer": 2, "lisa": 2 } 

What we do is basically emit a lot of rows and then use the list to group them as we want. A list can present the result in any form you like, but it will often be slower. While the result of a normal map-reduce can be cached and updated incrementally as documents change, the list has to be rebuilt on every request.

It is roughly as slow as fetching all the rows emitted by the map function (the overhead of organising the data is mostly negligible), and much slower than fetching the result of a reduce.
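The counting loop inside the list function can be tried outside CouchDB. The sketch below is only an illustration: `countOccurrences` and the `januaryRows` array are hypothetical stand-ins for the loop over `getRow()` and for the rows that `authors_by_date` would return for the question's sample data when restricted to January 2012.

```javascript
// Stand-alone sketch of the list function's counting loop.
// `rows` stands in for the rows that getRow() yields inside CouchDB.
function countOccurrences(rows) {
  var result = {};
  rows.forEach(function (row) {
    var val = row.value;
    // Start at 0 the first time an author is seen, then add one.
    result[val] = (result[val] || 0) + 1;
  });
  return result;
}

// Assumed view rows for the January range; the August document is
// outside the range and therefore never reaches the list.
var januaryRows = [
  { key: '20120101-1853', value: 'bart' },
  { key: '20120102-1850', value: 'homer' },
  { key: '20120103-2359', value: 'homer' },
  { key: '20120104-1200', value: 'lisa' }
];

var counts = countOccurrences(januaryRows);
console.log(JSON.stringify(counts)); // {"bart":1,"homer":2,"lisa":1}
```

Every row in the range passes through this loop on each request, which is why the cost grows with the number of rows rather than with the number of authors.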

If you want to use the list with a different view, simply swap the view name in the request URL:

 http://<couchurl>/<db>/_design/authors/_list/count_occurrences/<view> 

Learn more about list functions on the CouchDB wiki page.


You need to create a combined view:

combined/map.js:

 function(doc) { emit([doc.date, doc.author], 1); } 

combined/reduce.js:

 _sum 

This way you can filter documents by start and end date, passing the keys as JSON arrays:

 startkey=["20120101-0000", "a"]&endkey=["20120201-0000", "a"] 
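CouchDB expects `startkey` and `endkey` to be JSON values, so array keys have to be serialised and URL-encoded before they go into the request. A minimal sketch of that step follows; `viewRangeQuery` is a hypothetical helper name, not part of any CouchDB client library.

```javascript
// Hypothetical helper: builds the query string for a view range query.
// Keys are JSON-encoded first, then percent-encoded for use in a URL.
function viewRangeQuery(startKey, endKey, extra) {
  var params = {
    startkey: JSON.stringify(startKey),
    endkey: JSON.stringify(endKey)
  };
  // Copy any additional options, e.g. { group: true }.
  Object.keys(extra || {}).forEach(function (name) {
    params[name] = String(extra[name]);
  });
  return Object.keys(params).map(function (name) {
    return name + '=' + encodeURIComponent(params[name]);
  }).join('&');
}

var query = viewRangeQuery(
  ['20120101-0000', 'a'],
  ['20120201-0000', 'a'],
  { group: true }
);
console.log('_view/combined?' + query);
```

Note that with `group=true` the rows of this view are grouped by the full `[date, author]` key, so per-author totals would still have to be added up on the client.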

Although your problem is difficult to solve in the general case, knowing some constraints on the possible queries can help. For example, if you know that you will search over ranges that cover whole days or months, you can use [year, month, day, time] arrays instead of a string:

 emit([doc.date_year, doc.date_month, doc.date_day, doc.date_time, doc.author], doc); 

Even if you cannot be sure that every possible query will fit a grouping based on this kind of key, splitting the key can help you optimise your range queries and reduce the number of queries required (at the cost of extra space).
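The sample documents in the question store the date as a single string rather than as separate `date_year`, `date_month`, ... fields, so the key would have to be split inside the map function. The sketch below assumes the `YYYYMMDD-HHMM` format shown in the question; `splitDate` is a hypothetical helper, and the `emit` defined here is only a stand-in so the code can run outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), so the sketch runs outside a view.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Hypothetical helper: splits 'YYYYMMDD-HHMM' into [year, month, day, time].
// Returns null when the string does not match the assumed format.
function splitDate(date) {
  var match = /^(\d{4})(\d{2})(\d{2})-(\d{4})$/.exec(date);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3]), match[4]];
}

// Map function sketch: the author is appended as the last key component.
function map(doc) {
  var parts = splitDate(doc.date);
  if (parts) emit(parts.concat(doc.author), doc);
}

map({ date: '20120101-1853', author: 'bart' });
console.log(JSON.stringify(emitted[0].key)); // [2012,1,1,"1853","bart"]
```

In a real design document the helper would need to live inside the map function body, since each view function is evaluated on its own.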
