Frequency table, including zeros for unused values, on the data table.

I have a dataset that looks like this:

library(data.table) test <- data.table(structure(list(Issue.Date = structure(c(16041, 16056, 16042,15990, 15996, 16001, 15995, 15981, 15986, 15996, 15996, 16002,16015, 16020, 16025, 16032, 16023, 16084, 16077, 16102, 16104,16107, 16112, 16113, 16115, 16121, 16125, 16128, 16104, 16132,16133, 16135, 16139, 16146, 16151), class = "Date"), Complaint = structure(c(1L,4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,5L, 3L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 3L,3L, 3L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), yr = c("2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014"), Month = c("2013-12", "2013-12","2013-12", "2013-10", "2013-10", "2013-10", "2013-10", "2013-10","2013-10", "2013-10", "2013-10", "2013-10", "2013-11", "2013-11","2013-11", "2013-11", "2013-11", "2014-01", "2014-01", "2014-02","2014-02", "2014-02", "2014-02", "2014-02", "2014-02", "2014-02","2014-02", "2014-02", "2014-02", "2014-03", "2014-03", "2014-03","2014-03", "2014-03", "2014-03"), da = c("02", "17", "03","12", "18", "23", "17", "03", "08", "18", "18", "24", "06","11", "16", "23", "14", "14", "07", "01", "03", "06", "11","12", "14", "20", "24", "27", "03", "03", "04", "06", "10","17", "22")), .Names = c("Issue.Date", "Complaint", "yr","Month", "da"), class = c("data.table", "data.frame"), row.names = c(NA,-35L))) 

Basically, I would like to use data.table to create a frequency table that has Complaint and Count on Month . The trick is that I need it to show a Count zero if there is no Month for this type of Complaints . I know how to do this without showing zeros, but I want to know how to turn them on.

 test[ , count := .N, by = "Month,Complaint"] 
+6
source share
2 answers

Directly get calculations for each group:

 setkey(test, Month, Complaint) # may need to also add allow.cartesian, depending on actual data test[CJ(Month, Complaint, unique = TRUE), .N, by = .EACHI] # Month Complaint N # 1: 2013-10 A 0 # 2: 2013-10 B 0 # 3: 2013-10 C 5 # 4: 2013-10 D 4 # 5: 2013-10 E 0 # 6: 2013-11 A 1 # 7: 2013-11 B 0 # 8: 2013-11 C 4 # 9: 2013-11 D 0 #10: 2013-11 E 0 #11: 2013-12 A 1 #12: 2013-12 B 0 #13: 2013-12 C 0 #14: 2013-12 D 2 #15: 2013-12 E 0 #16: 2014-01 A 0 #17: 2014-01 B 0 #18: 2014-01 C 1 #19: 2014-01 D 0 #20: 2014-01 E 1 #21: 2014-02 A 2 #22: 2014-02 B 0 #23: 2014-02 C 6 #24: 2014-02 D 2 #25: 2014-02 E 0 #26: 2014-03 A 1 #27: 2014-03 B 2 #28: 2014-03 C 3 #29: 2014-03 D 0 #30: 2014-03 E 0 # Month Complaint N 

See the first revision of the answer if you want to have counters in the full data.table instead of summing.

+7
source

It looks like you might need to use expand.grid to β€œpopulate” your data.table :

 EG <- data.table(expand.grid(Complaint = unique(test$Complaint), Month = unique(test$Month)), key = "Complaint,Month") 

Then you can merge :

 setkey(test, Complaint, Month) Full <- merge(test, EG, all.y = TRUE) 

And count it like this:

 Full[ , list(sum(!is.na(Issue.Date))), by = "Month,Complaint"] # Month Complaint V1 # 1: 2013-11 A 1 # 2: 2013-12 A 1 # 3: 2014-02 A 2 # 4: 2014-03 A 1 # 5: 2013-10 A 0 # 6: 2014-01 A 0 # 7: 2013-11 B 0 # 8: 2013-12 B 0 # ::: SNIP ::: # 24: 2014-01 D 0 # 25: 2013-11 E 0 # 26: 2013-12 E 0 # 27: 2014-02 E 0 # 28: 2014-03 E 0 # 29: 2013-10 E 0 # 30: 2014-01 E 1 # Month Complaint V1 

Alternatively just use table (???)

 data.table(table(test[, c("Month", "Complaint"), with = FALSE])) # Month Complaint N # 1: 2013-10 A 0 # 2: 2013-11 A 1 # 3: 2013-12 A 1 # 4: 2014-01 A 0 # 5: 2014-02 A 2 # 6: 2014-03 A 1 # 7: 2013-10 B 0 # ::: SNIP ::: # 28: 2014-01 E 1 # 29: 2014-02 E 0 # 30: 2014-03 E 0 # Month Complaint N 
+4
source

All Articles