How to filter empty groups from reduction?

It seems to me that Crossfilter never excludes a group from the reduction results, even if the applied filters exclude every row in that group. Groups whose rows have all been filtered out simply return an aggregate value of 0 (or whatever reduceInitial returns).

The problem is that this makes it impossible to distinguish between groups that contain no rows and groups that do contain rows but legitimately aggregate to 0. As far as I can see, there is no way to tell a null value from an aggregate of 0.

Does anyone know of a built-in Crossfilter technique to achieve this? I did come up with a way to do it using my own reduceInitial/reduceAdd/reduceRemove functions, but it was not entirely straightforward, and it seemed to me that this behavior could or should be more native to Crossfilter's filtering semantics. So I am wondering whether there is a canonical way to achieve this.

I will post my technique as an answer if it turns out that there is no built-in way to do this.

2 answers

A simple way to achieve this is to have the reduction track both a count and a total:

    var dimGroup = dim.group().reduce(reduceAdd, reduceRemove, reduceInitial);

    function reduceAdd(p, v) {
      ++p.count;
      p.total += v.value;
      return p;
    }

    function reduceRemove(p, v) {
      --p.count;
      p.total -= v.value;
      return p;
    }

    function reduceInitial() {
      return {count: 0, total: 0};
    }

Empty groups will have a count of zero, so retrieving only the non-empty groups is easy:

    dimGroup.top(Infinity).filter(function(d) { return d.value.count > 0; });
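To see why the count separates an empty group from one that legitimately sums to zero, the reducers can be exercised by hand. This is only a rough sketch that does not use Crossfilter: the sample rows, the `keep` flag, and the `simulate` helper are all made up for illustration, and the helper merely mimics the pattern of adding every row and then removing the ones a filter excludes.

```javascript
// Same reducers as in the answer above.
function reduceAdd(p, v) { ++p.count; p.total += v.value; return p; }
function reduceRemove(p, v) { --p.count; p.total -= v.value; return p; }
function reduceInitial() { return {count: 0, total: 0}; }

// Hypothetical sample rows. The only row of group "a" is filtered out,
// while group "b" keeps both rows and legitimately sums to zero.
var rows = [
  {key: "a", value: 5, keep: false},
  {key: "b", value: 3, keep: true},
  {key: "b", value: -3, keep: true}
];

// Hypothetical stand-in for Crossfilter: add every row to its group,
// then remove the rows that the "filter" excludes.
function simulate(rows) {
  var groups = {};
  rows.forEach(function (r) {
    if (!groups[r.key]) { groups[r.key] = reduceInitial(); }
    reduceAdd(groups[r.key], r);
  });
  rows.forEach(function (r) {
    if (!r.keep) { reduceRemove(groups[r.key], r); }
  });
  return Object.keys(groups).map(function (k) {
    return {key: k, value: groups[k]};
  });
}

var result = simulate(rows);

// Both groups have total 0, but only "b" has a non-zero count.
var nonEmpty = result.filter(function (d) { return d.value.count > 0; });
```

Here both groups end up with a total of 0, yet only group "b" survives the filter because its count is 2 while the count of group "a" is 0.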

Well, it seems that no obvious answer is jumping out, so I will answer my own question and post the technique I used to solve this problem.

This example assumes that I have already created a dimension and a grouping on it, which is passed in as groupDim. Since I want to be able to sum any arbitrary numeric field, I also pass in fieldName so that it is available in the closure scope of my reduce functions.

One important characteristic of this method is that it relies on there being a way to uniquely identify which group each row belongs to. Thinking in OLAP terms, this is essentially the "tuple" that defines the specific aggregation context. But it can be anything, as long as it deterministically returns the same value for all rows of data belonging to the same group.
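One possible way to build such a key is to serialize the fields that define the group. This is just a sketch under assumptions: the `makeGroupKey` helper and the `region`, `year`, and `sales` field names are hypothetical, and it presumes the listed fields are the ones the grouping is actually based on.

```javascript
// Hypothetical helper: builds a getGroupKey function from a list of field names.
function makeGroupKey(fieldNames) {
  return function getGroupKey(datum) {
    // Collect the grouping fields into a tuple and serialize it, so that
    // rows with the same field values always produce the same string key.
    var tuple = fieldNames.map(function (name) { return datum[name]; });
    return JSON.stringify(tuple);
  };
}

// Hypothetical rows: the first two belong to the same group.
var getGroupKey = makeGroupKey(["region", "year"]);
var keyA = getGroupKey({region: "EU", year: 2012, sales: 10});
var keyB = getGroupKey({region: "EU", year: 2012, sales: 25});
var keyC = getGroupKey({region: "US", year: 2012, sales: 7});
```

The first two rows yield the same key, while the third yields a different one, which is all the technique needs from the key function.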

The end result is that empty groups will have an aggregate value of null, which can easily be detected and filtered out after the fact. Any group with at least one row will have a numeric value (even if that value happens to be zero).

Clarifications or suggestions are more than welcome. Here is the code with inline comments:

    function configureAggregateSum(groupDim, fieldName) {

      function getGroupKey(datum) {
        // Given a datum, return the key corresponding to the group
        // to which the datum belongs.
      }

      // This object keeps track of the number of times each group had reduceAdd
      // versus reduceRemove called. It is used to revert the running aggregate value
      // back to null if the count hits zero. This is unfortunately necessary because
      // Crossfilter filters as it is aggregating, so reduceAdd can be called even if,
      // in the end, all records in a group end up being filtered out.
      var groupCount = {};

      function reduceAdd(p, v) {
        // Keep track of the invocation count per group.
        var groupKey = getGroupKey(v);
        if (groupCount[groupKey] === undefined) {
          groupCount[groupKey] = 0;
        }
        groupCount[groupKey]++;

        // The add reduction itself (a sum in my case).
        // Note the check for null (our initial value).
        var value = +v[fieldName];
        return p === null ? value : p + value;
      }

      function reduceRemove(p, v) {
        // Keep track of the invocation count per group and, importantly, revert
        // the value back to null if the count hits 0 for the group. Essentially,
        // if we detect that the group has no records again, we revert to the
        // initial value.
        var groupKey = getGroupKey(v);
        groupCount[groupKey]--;
        if (groupCount[groupKey] === 0) {
          return null;
        }

        // The remove reduction itself (a sum in my case).
        var value = +v[fieldName];
        return p - value;
      }

      function reduceInitial() {
        return null;
      }

      // Once returned, you can invoke all() or top() to get the values, which can
      // then be filtered with a native Array.filter to remove the groups whose
      // value is null.
      return groupDim.reduce(reduceAdd, reduceRemove, reduceInitial);
    }
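The final filtering step mentioned in the last comment could look roughly like this. It is a sketch that assumes the results have the `{key, value}` shape returned by `all()` or `top()`. The sample array and the `dropEmptyGroups` name are made up for illustration.

```javascript
// Hypothetical results in the {key, value} shape of group.all():
// "a" had every row filtered out (null), "b" really sums to zero.
var groupResults = [
  {key: "a", value: null},
  {key: "b", value: 0},
  {key: "c", value: 12}
];

// Keep only the groups whose aggregate is not the null initial value.
function dropEmptyGroups(results) {
  return results.filter(function (d) { return d.value !== null; });
}

var visible = dropEmptyGroups(groupResults);
```

Note the strict comparison against null: a truthiness check such as `d.value` would also drop group "b", which is exactly the case this technique is meant to preserve.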