Frequency table, including zeros for unused values, on the data table.

Question

Frequency table, including zeros for unused values, on the data table.

I have a dataset that looks like this:

library(data.table) test <- data.table(structure(list(Issue.Date = structure(c(16041, 16056, 16042,15990, 15996, 16001, 15995, 15981, 15986, 15996, 15996, 16002,16015, 16020, 16025, 16032, 16023, 16084, 16077, 16102, 16104,16107, 16112, 16113, 16115, 16121, 16125, 16128, 16104, 16132,16133, 16135, 16139, 16146, 16151), class = "Date"), Complaint = structure(c(1L,4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,5L, 3L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 3L,3L, 3L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), yr = c("2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014"), Month = c("2013-12", "2013-12","2013-12", "2013-10", "2013-10", "2013-10", "2013-10", "2013-10","2013-10", "2013-10", "2013-10", "2013-10", "2013-11", "2013-11","2013-11", "2013-11", "2013-11", "2014-01", "2014-01", "2014-02","2014-02", "2014-02", "2014-02", "2014-02", "2014-02", "2014-02","2014-02", "2014-02", "2014-02", "2014-03", "2014-03", "2014-03","2014-03", "2014-03", "2014-03"), da = c("02", "17", "03","12", "18", "23", "17", "03", "08", "18", "18", "24", "06","11", "16", "23", "14", "14", "07", "01", "03", "06", "11","12", "14", "20", "24", "27", "03", "03", "04", "06", "10","17", "22")), .Names = c("Issue.Date", "Complaint", "yr","Month", "da"), class = c("data.table", "data.frame"), row.names = c(NA,-35L)))

Basically, I would like to use data.table to create a frequency table that has Complaint and Count on Month . The trick is that I need it to show a Count zero if there is no Month for this type of Complaints . I know how to do this without showing zeros, but I want to know how to turn them on.

 test[ , count := .N, by = "Month,Complaint"]

+6

r data.table frequency-distribution

black_sheep07 May 08 '14 at 16:18

source share

2 answers

It looks like you might need to use expand.grid to “populate” your data.table :

 EG <- data.table(expand.grid(Complaint = unique(test$Complaint), Month = unique(test$Month)), key = "Complaint,Month")

Then you can merge :

 setkey(test, Complaint, Month) Full <- merge(test, EG, all.y = TRUE)

And count it like this:

 Full[ , list(sum(!is.na(Issue.Date))), by = "Month,Complaint"] # Month Complaint V1 # 1: 2013-11 A 1 # 2: 2013-12 A 1 # 3: 2014-02 A 2 # 4: 2014-03 A 1 # 5: 2013-10 A 0 # 6: 2014-01 A 0 # 7: 2013-11 B 0 # 8: 2013-12 B 0 # ::: SNIP ::: # 24: 2014-01 D 0 # 25: 2013-11 E 0 # 26: 2013-12 E 0 # 27: 2014-02 E 0 # 28: 2014-03 E 0 # 29: 2013-10 E 0 # 30: 2014-01 E 1 # Month Complaint V1

Alternatively just use table (???)

 data.table(table(test[, c("Month", "Complaint"), with = FALSE])) # Month Complaint N # 1: 2013-10 A 0 # 2: 2013-11 A 1 # 3: 2013-12 A 1 # 4: 2014-01 A 0 # 5: 2014-02 A 2 # 6: 2014-03 A 1 # 7: 2013-10 B 0 # ::: SNIP ::: # 28: 2014-01 E 1 # 29: 2014-02 E 0 # 30: 2014-03 E 0 # Month Complaint N

+4

A5C1D2H2I1M1N2O1R2T1 May 08 '14 at 16:31

source share

eddi · Accepted Answer · 2014-05-08T16:40:29+0000

Directly get calculations for each group:

 setkey(test, Month, Complaint) # may need to also add allow.cartesian, depending on actual data test[CJ(Month, Complaint, unique = TRUE), .N, by = .EACHI] # Month Complaint N # 1: 2013-10 A 0 # 2: 2013-10 B 0 # 3: 2013-10 C 5 # 4: 2013-10 D 4 # 5: 2013-10 E 0 # 6: 2013-11 A 1 # 7: 2013-11 B 0 # 8: 2013-11 C 4 # 9: 2013-11 D 0 #10: 2013-11 E 0 #11: 2013-12 A 1 #12: 2013-12 B 0 #13: 2013-12 C 0 #14: 2013-12 D 2 #15: 2013-12 E 0 #16: 2014-01 A 0 #17: 2014-01 B 0 #18: 2014-01 C 1 #19: 2014-01 D 0 #20: 2014-01 E 1 #21: 2014-02 A 2 #22: 2014-02 B 0 #23: 2014-02 C 6 #24: 2014-02 D 2 #25: 2014-02 E 0 #26: 2014-03 A 1 #27: 2014-03 B 2 #28: 2014-03 C 3 #29: 2014-03 D 0 #30: 2014-03 E 0 # Month Complaint N

See the first revision of the answer if you want to have counters in the full data.table instead of summing.

Frequency table, including zeros for unused values, on the data table.

More articles: