Great question. Here are two ways. They both use by-without-by.
    DT = as.data.table(df)
    setkey(DT,categories,groups)
    DT[CJ(unique(categories),unique(groups)), sum(values,na.rm=TRUE)]
       categories groups V1
    1:          A      X  2
    2:          A      Y  1
    3:          A      Z  0
    4:          B      X  1
    5:          B      Y  2
    6:          B      Z  0
    7:          C      X  1
    8:          C      Y  1
    9:          C      Z  1
where CJ stands for Cross Join, see ?CJ. by-without-by simply means that j is executed for each group that each row of i joins to. (Since data.table 1.9.4 this behaviour has to be requested explicitly with by=.EACHI.)
Admittedly, it looks complicated at first sight. The idea is that if you have a known subset of groups, this syntax is faster than grouping everything and then selecting only the results you need. In this case, though, you want everything anyway, so there is little advantage other than the ability to look up groups that do not exist in the data (which you cannot do with by).
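For readers on data.table 1.9.4 or later, where per-row-of-i evaluation is requested with by = .EACHI, the first approach might be sketched as below. The data is hypothetical (the original df is not shown) and was chosen only to reproduce the sums above; the names grid, totals and total are illustrative, and on= is used in place of setkey().

```r
library(data.table)

# Hypothetical data (the original df is not shown), built to reproduce the sums above.
DT <- data.table(
  categories = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  groups     = c("X", "X", "Y", "X", "Y", "Y", "X", "Y", "Z"),
  values     = 1
)

# Every category/group combination, including ones absent from DT.
grid <- CJ(categories = unique(DT$categories), groups = unique(DT$groups))

# by = .EACHI runs j once per row of grid; unmatched rows see values = NA,
# which sum(..., na.rm = TRUE) turns into 0.
totals <- DT[grid, .(total = sum(values, na.rm = TRUE)),
             on = .(categories, groups), by = .EACHI]
print(totals)
```

Naming the columns of the CJ() result and joining with on= avoids relying on the key, at the cost of being slightly more verbose than the keyed form.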
Another way is to do the by first as usual, then join the result of CJ() to that:
    DT[,sum(values),keyby='categories,groups'][CJ(unique(categories),unique(groups))]
       categories groups V1
    1:          A      X  2
    2:          A      Y  1
    3:          A      Z NA
    4:          B      X  1
    5:          B      Y  2
    6:          B      Z NA
    7:          C      X  1
    8:          C      Y  1
    9:          C      Z  1
but then you get NA instead of the desired 0. If necessary, those can be replaced using set(). The second way could be faster, because the two unique calls are given much smaller input.
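A minimal sketch of the second approach followed by the NA replacement with set() could look like this. The data is hypothetical (the original df is not shown), and the names agg, grid, res and total are illustrative only.

```r
library(data.table)

# Hypothetical data (the original df is not shown), built to reproduce the sums above.
DT <- data.table(
  categories = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  groups     = c("X", "X", "Y", "X", "Y", "Y", "X", "Y", "Z"),
  values     = 1
)

# Group as usual first; only combinations present in DT appear here.
agg <- DT[, .(total = sum(values)), keyby = .(categories, groups)]

# Cross join built from the (much smaller) aggregated table.
grid <- CJ(categories = unique(agg$categories), groups = unique(agg$groups))

# Joining onto the grid leaves NA for combinations that never occurred.
res <- agg[grid, on = .(categories, groups)]

# Replace those NAs with 0 by reference.
set(res, i = which(is.na(res$total)), j = "total", value = 0)
print(res)
```

set() modifies res in place, so no copy of the table is made for the replacement.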
Both methods can be wrapped in small helper functions if you do this a lot.
Matt Dowle