Determining the number of non-zero cells and calculating prevalence using a stratifying variable

Question

I have spent a lot of time searching and cannot find a solution to my specific question, so I would really appreciate any help.

I have a large data.frame (1258 observations of 298 variables), where each row is a sample from a participant and each column is a specific bacterial genus found in the sample. There are several samples for each participant, taken at different time points, and the time point is also recorded in a column variable.

Here is an example of what a data frame might look like.

Corynebacterium <- c(0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.5, 0.7, 0.1, 0.0)
Paenibacillus <- c(0.0, 0.1, 0.7, 0.3, 0.5, 0.7, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0)
Psychrobacter <- c(0.1, 0.1, 0.5, 0.0, 0.0, 0.0, 0.3, 0.6, 0.0, 0.6, 0.7, 0.0)
Staphylocccus <- c(0.5, 0.0, 0.3, 0.0, 0.3, 0.2, 0.5, 0.0, 0.4, 0.1, 0.1, 0.5)
TimePoint <- c("A", "B", "C", "D", "E", "F", "A", "B", "C", "D", "E", "F")
SampleDF <- data.frame(Corynebacterium, Paenibacillus, Psychrobacter, 
Staphylocccus, TimePoint)

I would like to know the number of non-zero cells as a proportion of the total number of cells for each time point.

For example: for Corynebacterium at TimePoint A, # NonZeroCells / Total # Cells = 1/2 = 0.5, so the prevalence of Corynebacterium at TimePoint A is 50%.
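
The arithmetic in this example can be checked by hand in base R. This is only a minimal sketch: it copies the Corynebacterium values and time points from the example data, and the names cory, tp, a_vals, n_nonzero and prevalence are introduced here purely for illustration.

```r
# Corynebacterium abundances and time points copied from the example data
cory <- c(0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.5, 0.7, 0.1, 0.0)
tp   <- c("A", "B", "C", "D", "E", "F", "A", "B", "C", "D", "E", "F")

a_vals     <- cory[tp == "A"]             # the samples taken at time point A
n_nonzero  <- sum(a_vals != 0)            # number of non-zero cells
prevalence <- n_nonzero / length(a_vals)  # non-zero cells / total cells

print(c(nonzero = n_nonzero, total = length(a_vals), prevalence = prevalence))
```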

+4

r

EpiBlake, Mar 17 '15 18:27

4

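One way to do this is with aggregate().

You give aggregate the columns to summarise, the variable to group by, and a function to apply to each group. A built-in function such as sum would add up the abundances, so a small custom function is used instead.

The first function below counts the non-zero cells per time point; the second divides that count by the number of cells in the group to give the prevalence.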

func.simple_count <- function(data.vector) {

    return(sum(data.vector!=0))
}
aggregate(x = SampleDF[c("Corynebacterium","Paenibacillus","Psychrobacter","Staphylocccus")],
          by = list(SampleDF$TimePoint),
          FUN = func.simple_count)

Output:

  Group.1 Corynebacterium Paenibacillus Psychrobacter Staphylocccus
1       A               1             0             2             2
2       B               1             1             2             0
3       C               1             1             1             2
4       D               1             2             1             1
5       E               1             2             1             2
6       F               0             1             0             2

func.percent_nonzero <- function(data.vector) {

    return(sum(data.vector!=0)/length(data.vector))
}
aggregate(x = SampleDF[c("Corynebacterium","Paenibacillus","Psychrobacter","Staphylocccus")],
          by = list(SampleDF$TimePoint),
          FUN = func.percent_nonzero)

Output:

  Group.1 Corynebacterium Paenibacillus Psychrobacter Staphylocccus
1       A             0.5           0.0           1.0           1.0
2       B             0.5           0.5           1.0           0.0
3       C             0.5           0.5           0.5           1.0
4       D             0.5           1.0           0.5           0.5
5       E             0.5           1.0           0.5           1.0
6       F             0.0           0.5           0.0           1.0

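If you have many genus columns, you do not need to list each one in the call to aggregate: you can select every column except the grouping variable using names() and !=.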
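
A minimal sketch of selecting the columns with names() and != instead of typing them out. It assumes a two-genus subset of the example data so that it runs on its own; genus_cols and res are names made up for this illustration.

```r
# Two-genus subset of the example data, so the sketch is self-contained
SampleDF <- data.frame(
  Corynebacterium = c(0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.5, 0.7, 0.1, 0.0),
  Paenibacillus   = c(0.0, 0.1, 0.7, 0.3, 0.5, 0.7, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0),
  TimePoint       = rep(c("A", "B", "C", "D", "E", "F"), times = 2)
)

# Logical vector: TRUE for every column that is not the grouping variable
genus_cols <- names(SampleDF) != "TimePoint"

res <- aggregate(x   = SampleDF[genus_cols],
                 by  = list(TimePoint = SampleDF$TimePoint),
                 FUN = function(v) sum(v != 0) / length(v))
print(res)
```

Naming the element of the by list (TimePoint = ...) labels the grouping column TimePoint rather than the default Group.1.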

+3

TARehman, Mar 17 '15 18:46

There are several ways to do this. See ?table, as well as xtabs and CrossTable from the gmodels package. For example:

library(reshape2)
df <- melt(SampleDF)
ftable(df)
#                           value 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
# TimePoint variable                                           
# A         Corynebacterium       1   0   0   0   0   1   0   0
#           Paenibacillus         2   0   0   0   0   0   0   0
#           Psychrobacter         0   1   0   1   0   0   0   0
#           Staphylocccus         0   0   0   0   0   2   0   0
# B         Corynebacterium       1   1   0   0   0   0   0   0
#           Paenibacillus         1   1   0   0   0   0   0   0
#           Psychrobacter         0   1   0   0   0   0   1   0
# ... more ...
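
The frequency table above lists every distinct abundance value. To get prevalence from a table-based approach, zero versus non-zero can be tabulated instead. The following is a sketch only, in base R: it builds the long format with stack() rather than reshape2::melt so that no packages are needed, uses a two-genus subset of the example data, and the names wide, long, nz, tot and prev are assumptions made for illustration.

```r
# Two-genus subset of the example data in wide format
wide <- data.frame(
  Corynebacterium = c(0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.5, 0.7, 0.1, 0.0),
  Paenibacillus   = c(0.0, 0.1, 0.7, 0.3, 0.5, 0.7, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0),
  TimePoint       = rep(c("A", "B", "C", "D", "E", "F"), times = 2)
)

# Long format: stack() puts abundances in 'values' and the genus name in 'ind'
long <- stack(wide[c("Corynebacterium", "Paenibacillus")])
long$TimePoint <- rep(wide$TimePoint, times = 2)  # one copy per stacked column
long$nonzero   <- as.integer(long$values != 0)    # 1 if non-zero, else 0

nz   <- xtabs(nonzero ~ TimePoint + ind, data = long)  # non-zero cells per group
tot  <- xtabs(~ TimePoint + ind, data = long)          # total cells per group
prev <- nz / tot                                       # prevalence per group
print(prev)
```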

0

JasonAizkalns, Mar 17 '15 18:40

Using data.table:

library(data.table)
setDT(SampleDF)[, lapply(.SD, function(x) sum(x!=0)/.N) , by= TimePoint]
#   TimePoint Corynebacterium Paenibacillus Psychrobacter Staphylocccus
#1:         A             0.5           0.0           1.0           1.0
#2:         B             0.5           0.5           1.0           0.0
#3:         C             0.5           0.5           0.5           1.0
#4:         D             0.5           1.0           0.5           0.5
#5:         E             0.5           1.0           0.5           1.0
#6:         F             0.0           0.5           0.0           1.0

0

akrun, Mar 18 '15 2:32

Matthew Plourde · Accepted Answer · 2015-03-17T18:41:32+0000

Using dplyr:

library(dplyr)

SampleDF %>%
    group_by(TimePoint) %>%
    # across() supersedes the deprecated summarise_each()/funs()
    summarise(across(everything(), ~ sum(.x != 0) / length(.x)))

#   TimePoint Corynebacterium Paenibacillus Psychrobacter Staphylocccus
# 1         A             0.5           0.0           1.0           1.0
# 2         B             0.5           0.5           1.0           0.0
# 3         C             0.5           0.5           0.5           1.0
# 4         D             0.5           1.0           0.5           0.5
# 5         E             0.5           1.0           0.5           1.0
# 6         F             0.0           0.5           0.0           1.0

Or in base R:

aggregate(. ~ TimePoint, data=SampleDF, function(x) sum(x != 0) / length(x))
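
Because the mean of a logical vector is the proportion of TRUE values, sum(x != 0) / length(x) can also be written as mean(x != 0). A short check of that equivalence, sketched on a two-genus subset of the example data (df, by_ratio and by_mean are illustrative names):

```r
# Two-genus subset of the example data
df <- data.frame(
  Corynebacterium = c(0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.5, 0.7, 0.1, 0.0),
  Paenibacillus   = c(0.0, 0.1, 0.7, 0.3, 0.5, 0.7, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0),
  TimePoint       = rep(c("A", "B", "C", "D", "E", "F"), times = 2)
)

# Prevalence written as an explicit ratio
by_ratio <- aggregate(. ~ TimePoint, data = df,
                      FUN = function(x) sum(x != 0) / length(x))

# The same quantity written as the mean of a logical vector
by_mean <- aggregate(. ~ TimePoint, data = df,
                     FUN = function(x) mean(x != 0))

print(all.equal(by_ratio, by_mean))
print(by_mean)
```

Note that the formula interface of aggregate drops rows containing NA by default (na.action = na.omit), which matters if the real data has missing abundances.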
