My question is related to one I posted some time ago here.
Jaap's answer was fantastic: it helped me create this wonderful output of pivot tables of counts and frequencies (in percent) of categorical variables.
The "real data" that I am analyzing come from two different hospitals, each with a different frequency of drug use, and not always the same drugs.
The output of Jaap's func function from here generally looks like the data.frame shown below (for both hospital one and hospital two):
                     id AB1 AB2 AB3 AB4 AB5 AB6 AB7 AB8 AB9 AB10 AB11 AB12 AB13 total perc
1 1st gen Cephalosporin   4   0   0   1   1   0   0   0   0    0    0    0    0     6  1.9
2 3rd gen Cephalosporin  44   7   8   1   3   2   0   0   0    0    0    0    0    65 20.5
3 4th gen Cephalosporin   3   3   0   1   2   1   0   0   0    0    0    0    0    10  3.2
Now I would like to run chisq.test (or Fisher's exact test, if a frequency is below 5) for every variable (name) found in the id column, using the total frequency found in the total column, comparing hospital one against hospital two.
So, in layman's terms, I want to answer questions such as: "Were 1st generation cephalosporins prescribed more often in hospital one than in hospital two?", etc.
Since some variable identifiers may not be identical between the hospitals, I expect that this may return a NULL calculation for those.
Ideally, I would like to summarize all of these findings in a table with the corresponding p-value, to look like this:
id     Hospital One Total Frequency     Hospital Two Total Frequency     p-value
xyz    15                               30                               0.01
Many thanks for your help.
All data can be found below.
Greetings
EDIT, following the points raised:
This is just mock output (ideally, what I would like).
id     Hospital One Total Frequency     Hospital Two Total Frequency     p-value
xyz ni x.xx
As already mentioned, the p-value must be obtained from chisq.test or fisher.test.
I am guessing the output has to be generated somehow along these lines, with hospital no. 1 called hosp1 and hospital no. 2 called hosp2:
# first take those columns of the dplyr output you're interested in
hosp1_sel <- hosp1[, c("id", "total")]
hosp2_sel <- hosp2[, c("id", "total")]
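From there, one possible next step might be to join the two selections on id. The sketch below uses base R's merge with all = TRUE, so that drugs present in only one hospital are kept with NA for the other. The small data.frames are stand-ins built from a few totals in the data at the end of this post, and the column names total_hosp1 and total_hosp2 are my own choice.

```r
# Stand-ins for hosp1_sel and hosp2_sel (a few rows only, totals copied from the data below)
hosp1_sel <- data.frame(id = c("3rd gen Cephalosporin", "Glycopeptide", "Clindamycin"),
                        total = c(65, 55, 2))
hosp2_sel <- data.frame(id = c("3rd gen Cephalosporin", "Glycopeptide", "Colistin"),
                        total = c(19, 32, 1))

# Full outer join on id; ids found in only one hospital get NA in the other column
both <- merge(hosp1_sel, hosp2_sel, by = "id", all = TRUE,
              suffixes = c("_hosp1", "_hosp2"))
both
```

Note that ids only match when the labels are spelled identically, so entries such as "Macrolide" vs "Macrolid" or "Lipopeptide " (with a trailing space) vs "Lipopeptide" in my data would end up as separate rows unless they are cleaned first.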
This is where I am stuck. I think I would have to make this data.frame wider and then run something like:
chisq.test(hosp1$Ureidopenicillin, hosp2$Ureidopenicillin)
to determine whether "ureidopenicillins" were used more frequently in Hospital No. 1 than in Hospital No. 2, etc.
The problem is that this is actually a comparison of "counts", not the "proportions" from the contingency table, though ...
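To compare proportions rather than raw counts, my understanding is that each drug would need its own 2x2 table: this drug versus all other prescriptions, by hospital. A minimal sketch under that assumption follows. compare_drug is a hypothetical helper, the switch to fisher.test when any expected cell count is below 5 is my reading of the usual rule of thumb, and 317 and 203 are the sums of the total columns in the data below.

```r
# Hypothetical helper: p-value for one drug, hospital 1 vs hospital 2.
# a, b   = prescriptions of this drug in hospital 1 / hospital 2
# n1, n2 = all prescriptions in hospital 1 / hospital 2
compare_drug <- function(a, b, n1, n2) {
  # Drug not recorded in one of the hospitals: no test possible
  if (is.na(a) || is.na(b)) return(NA_real_)
  tab <- matrix(c(a, n1 - a, b, n2 - b), nrow = 2,
                dimnames = list(c("this drug", "other drugs"),
                                c("hosp1", "hosp2")))
  # Expected cell counts under independence
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < 5)) {
    fisher.test(tab)$p.value
  } else {
    chisq.test(tab)$p.value
  }
}

n1 <- 317  # sum(hosp1$total)
n2 <- 203  # sum(hosp2$total)

# A few rows of the merged totals; NA marks a drug missing from one hospital
both <- data.frame(
  id          = c("3rd gen Cephalosporin", "Glycopeptide", "Tetracycline", "Clindamycin"),
  hosp1_total = c(65, 55, 2, 2),
  hosp2_total = c(19, 32, 1, NA)
)

both$p_value <- mapply(compare_drug, both$hosp1_total, both$hosp2_total,
                       MoreArgs = list(n1 = n1, n2 = n2))
both
```

Because this runs one test per drug, the p-values would probably need a multiple-comparison adjustment (for example with p.adjust) before drawing conclusions.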
Any ideas?
DATA
Hospital No. 1 data.frame:
structure(list(id = structure(1:19, .Label = c("1st gen Cephalosporin", "3rd gen Cephalosporin", "4th gen Cephalosporin", "Aminoglycoside", "Clindamycin", "Glycopeptide", "Macrolide", "Penicillin", "Tetracycline", "Trimethoprim", "Ureidopenicillin", "Carbapenem", "Fluorquinolone", "Nitromidazole", "Antifungal", "Oxazolidinone", "Rifamycin", "Polypeptide", "Lipopeptide "), class = "factor"), AB1 = c(4L, 44L, 3L, 1L, 1L, 7L, 1L, 7L, 2L, 1L, 12L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AB2 = c(0L, 7L, 3L, 7L, 0L, 16L, 2L, 9L, 0L, 0L, 9L, 1L, 2L, 6L, 0L, 0L, 0L, 0L, 0L), AB3 = c(0L, 8L, 0L, 5L, 1L, 13L, 0L, 5L, 0L, 0L, 12L, 4L, 1L, 2L, 0L, 0L, 0L, 0L, 0L), AB4 = c(1L, 1L, 1L, 6L, 0L, 5L, 0L, 8L, 0L, 0L, 5L, 3L, 4L, 1L, 1L, 1L, 1L, 0L, 0L), AB5 = c(1L, 3L, 2L, 2L, 0L, 4L, 0L, 1L, 0L, 0L, 2L, 4L, 1L, 1L, 2L, 0L, 0L, 0L, 0L), AB6 = c(0L, 2L, 1L, 3L, 0L, 5L, 0L, 1L, 0L, 0L, 2L, 1L, 1L, 2L, 1L, 0L, 0L, 0L, 0L), AB7 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 3L, 0L, 0L, 2L, 2L, 2L, 0L, 0L, 1L, 0L, 1L, 0L), AB8 = c(0L, 0L, 0L, 3L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L), AB9 = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L), AB10 = c(0L, 0L, 0L, 1L, 0L, 2L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AB11 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 0L), AB12 = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L ), AB13 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), total = c(6, 65, 10, 31, 2, 55, 3, 36, 2, 2, 46, 19, 17, 12, 6, 2, 1, 1, 1), perc = c(1.9, 20.5, 3.2, 9.8, 0.6, 17.4, 0.9, 11.4, 0.6, 0.6, 14.5, 6, 5.4, 3.8, 1.9, 0.6, 0.3, 0.3, 0.3)), class = "data.frame", .Names = c("id", "AB1", "AB2", "AB3", "AB4", "AB5", "AB6", "AB7", "AB8", "AB9", "AB10", "AB11", "AB12", "AB13", "total", "perc"), row.names = c(NA, -19L))
Hospital No. 2 data.frame:
structure(list(id = structure(1:18, .Label = c("3rd gen Cephalosporin", "Carbapenem", "Fluoroquinolone", "Glycopeptide", "Penicillin", "Ureidopenicillin", "Lipopeptide", "Macrolid", "Aminoglycoside", "Polypeptide", "Rifamycin", "Tetracycline", "Lincosamide", "Quinolone", "Sulfonamides", "Nitroimidazole", "Polymyxine", "Colistin"), class = "factor"), AB1 = c(9L, 3L, 1L, 7L, 16L, 22L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AB2 = c(2L, 17L, 5L, 8L, 2L, 9L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), AB3 = c(1L, 9L, 4L, 5L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), AB4 = c(1L, 3L, 3L, 7L, 4L, 3L, 0L, 0L, 2L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L), AB5 = c(3L, 1L, 4L, 1L, 4L, 1L, 2L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L, 0L), AB6 = c(3L, 2L, 4L, 1L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L), AB7 = c(0L, 2L, 3L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), total = c(19, 37, 24, 32, 29, 36, 4, 2, 9, 1, 1, 1, 1, 2, 2, 1, 1, 1), perc = c(9.4, 18.2, 11.8, 15.8, 14.3, 17.7, 2, 1, 4.4, 0.5, 0.5, 0.5, 0.5, 1, 1, 0.5, 0.5, 0.5)), class = "data.frame", .Names = c("id", "AB1", "AB2", "AB3", "AB4", "AB5", "AB6", "AB7", "total", "perc" ), row.names = c(NA, -18L))