R - counting occurrences of a certain value within bins

I have a data frame (df) as shown below:

    Value <- c(1,1,0,2,1,3,4,0,0,1,2,0,3,0,4,5,2,3,0,6)
    Sl <- c(1:20)
    df <- data.frame(Sl,Value)
    > df
       Sl Value
    1   1     1
    2   2     1
    3   3     0
    4   4     2
    5   5     1
    6   6     3
    7   7     4
    8   8     0
    9   9     0
    10 10     1
    11 11     2
    12 12     0
    13 13     3
    14 14     0
    15 15     4
    16 16     5
    17 17     2
    18 18     3
    19 19     0
    20 20     6

I would like to split df into 4 bins, grouped by Sl values, and count the occurrences of Value == 0 in each bin, with the result in a separate data frame, as shown below:

    Bin Count
      1     1
      2     2
      3     2
      4     1

I tried to use table and cut to create the desired data frame, but it is unclear to me how to bring in df$Value and the logic for finding the zeros:

 df.4.cut <- as.data.frame(table(cut(df$Sl, breaks=seq(1,20, by=5)))) 
4 answers

Using df

 tapply(df$Value, cut(df$Sl, 4), function(x) sum(x==0)) 

gives

    > tapply(df$Value, cut(df$Sl, 4), function(x) sum(x==0))
    (0.981,5.75]  (5.75,10.5]  (10.5,15.2]    (15.2,20]
               1            2            2            1

In cut you can specify either the number of intervals or, if you prefer, the break points themselves; the counting logic goes in the function passed to tapply.
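As a minimal sketch of the second option, explicit break points can be passed to cut and the result reshaped into the Bin/Count layout from the question. The names bin_factor, counts and result are illustrative and not part of the original answer; the break points seq(0, 20, by = 5) are an assumption chosen to give the bins (0,5], (5,10], (10,15], (15,20].

```r
# Data from the question
Value <- c(1,1,0,2,1,3,4,0,0,1,2,0,3,0,4,5,2,3,0,6)
Sl <- 1:20
df <- data.frame(Sl, Value)

# Explicit break points instead of a number of intervals
bin_factor <- cut(df$Sl, breaks = seq(0, 20, by = 5))

# Count the zeros in each bin
counts <- tapply(df$Value, bin_factor, function(x) sum(x == 0))

# Reshape into a data frame with a bin index and a count column
result <- data.frame(Bin = seq_along(counts), Count = as.vector(counts))
print(result)
```

With explicit breaks the bin labels are round numbers, which is usually easier to read than the automatically chosen boundaries such as (0.981,5.75].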


Or use data.table: convert the data.frame to a data.table with setDT(df), use the output of cut as the grouping variable, and take the sum of !Value to count the zeros. Negating the column with ! coerces it to a logical vector that is TRUE where Value is 0 and FALSE for every other value.

    library(data.table)
    setDT(df)[, sum(!Value), .(gr = cut(Sl, breaks = seq(0, 20, 5)))]
    #         gr V1
    #1:   (0,5]  1
    #2:  (5,10]  2
    #3: (10,15]  2
    #4: (15,20]  1
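The negation step can be checked in plain base R, without data.table. This is only a sketch using the Value vector from the question; the names is_zero, n_zero_negation and n_zero_compare are illustrative.

```r
# Value vector from the question
Value <- c(1,1,0,2,1,3,4,0,0,1,2,0,3,0,4,5,2,3,0,6)

# Negating a numeric vector coerces it to logical: 0 becomes TRUE, anything else FALSE
is_zero <- !Value

# Summing a logical vector counts the TRUE entries
n_zero_negation <- sum(is_zero)

# The explicit comparison gives the same count
n_zero_compare <- sum(Value == 0)

print(c(n_zero_negation, n_zero_compare))
```

Note that !Value only matches Value == 0 when the column has no missing values; with NA entries both sums would return NA unless na.rm = TRUE is passed.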

Your question uses table(), but the call lacks a second argument: you need a contingency table of bin against Value. You can get the counts for each bin with:

    table(cut(df$Sl,4),df$Value)

                   0 1 2 3 4 5 6
      (0.981,5.75] 1 3 1 0 0 0 0
      (5.75,10.5]  2 1 0 1 1 0 0
      (10.5,15.2]  2 0 1 1 1 0 0
      (15.2,20]    1 0 1 1 0 1 1

And the number of Value == 0 for each bin:

    table(cut(df$Sl,4),df$Value)[,"0"]
    (0.981,5.75]  (5.75,10.5]  (10.5,15.2]    (15.2,20]
               1            2            2            1

A more complicated way, using sqldf:

First, we create a table that defines the bins and their ranges (min and max):

    bins <- data.frame(id = c(1, 2, 3, 4),
                       bins = c("(0,5]", "(5,10]", "(10,15]", "(15,20]"),
                       min = c(0, 6, 11, 16),
                       max = c(5, 10, 15, 20))

      id    bins min max
    1  1   (0,5]   0   5
    2  2  (5,10]   6  10
    3  3 (10,15]  11  15
    4  4 (15,20]  16  20

Then we run the following query over both tables. It uses BETWEEN to assign each Sl to its bin, keeps only the rows where Value is 0, and counts them per bin.

    library(sqldf)
    sqldf("SELECT bins, COUNT(Value) AS freq
           FROM df, bins
           WHERE (((sl) BETWEEN [min] AND [max]) AND Value = 0)
           GROUP BY bins
           ORDER BY id")

Output:

         bins freq
    1   (0,5]    1
    2  (5,10]    2
    3 (10,15]    2
    4 (15,20]    1

Alternatively, the construction of the bins table can be simplified, as proposed by mts, by using cut and extracting the levels of the resulting factor:

 bins <- data.frame(id = 1:4, bins = levels(cut(Sl, breaks = seq(0, 20, 5))), min = seq(1, 20, 5), max = seq(5, 20, 5)) 