How to group a column by value ranges using dplyr?

I want to group a data table by ranges of the values in a column. How can I do this using the dplyr library?

For example, my data table is as follows:

    library(data.table)
    library(dplyr)
    DT <- data.table(A=1:100, B=runif(100), Amount=runif(100, 0, 100))

Now I want to split DT into 20 groups by column B, using intervals of width 0.05, and count the rows in each group. For example, rows with a column B value in the range [0, 0.05) form one group, rows with a column B value in the range [0.05, 0.1) form another group, and so on. Is there an efficient way to do this grouping?

Many thanks.

-----------------------------
Follow-up question about akrun's answer: Thanks, akrun, for your answer. I have a new question about the "cut" function. If my DT looks like this:

 DT <- data.table(A=1:10, B=c(0.01, 0.04, 0.06, 0.09, 0.1, 0.13, 0.14, 0.15, 0.17, 0.71)) 

using the following code:

    DT %>%
      group_by(gr=cut(B, breaks=seq(0, 1, by=0.05), right=FALSE)) %>%
      summarise(n=n()) %>%
      arrange(as.numeric(gr))

I expect to see these results:

              gr n
    1   [0,0.05) 2
    2 [0.05,0.1) 2
    3 [0.1,0.15) 3
    4 [0.15,0.2) 2
    5 [0.7,0.75) 1

but the result I got is as follows:

              gr n
    1   [0,0.05) 2
    2 [0.05,0.1) 2
    3 [0.1,0.15) 4
    4 [0.15,0.2) 1
    5 [0.7,0.75) 1

It appears that a value of 0.15 is misallocated. Any thoughts on this?

r dplyr grouping
1 answer

We can use cut to define the groups. We create a "gr" column inside group_by, use summarise to count the number of rows in each group (n()), and then order the output with arrange based on "gr".

    library(dplyr)
    DT %>%
      group_by(gr=cut(B, breaks=seq(0, 1, by=0.05))) %>%
      summarise(n=n()) %>%
      arrange(as.numeric(gr))
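To see what cut returns before wiring it into group_by, a minimal base R sketch may help. The vector b is made-up illustration data (not the OP's DT), chosen to stay away from interval boundaries, and the variable names are hypothetical.

```r
# Hypothetical toy values; none of them sits on an interval boundary.
b <- c(0.01, 0.04, 0.06, 0.32, 0.99)

# Default right = TRUE: intervals are closed on the right, (a, b].
gr_right <- cut(b, breaks = seq(0, 1, by = 0.05))

# right = FALSE: intervals are closed on the left, [a, b).
gr_left <- cut(b, breaks = seq(0, 1, by = 0.05), right = FALSE)

# table() counts the values per interval, including empty intervals;
# keep only the non-empty ones for display.
counts <- table(gr_left)
counts[counts > 0]
```

cut returns a factor with one level per interval, which is why arrange(as.numeric(gr)) orders the groups by interval position rather than alphabetically by label.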

Alternatively, since the source object is a data.table, this can be done with data.table methods (including @Frank's suggestion to use keyby):

    library(data.table)
    DT[, .N, keyby = .(gr=cut(B, breaks=seq(0, 1, by=0.05)))]

EDIT:

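Based on the update in the OP's post, we could subtract a small number from the breaks generated by seq, so that a value such as 0.15 is not pushed into the lower interval by floating-point rounding of the break: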

    lvls <- levels(cut(DT$B, seq(0, 1, by=0.05)))
    DT %>%
      group_by(gr=cut(B, breaks=seq(0, 1, by=0.05) - .Machine$double.eps,
                      right=FALSE, labels=lvls)) %>%
      summarise(n=n()) %>%
      arrange(as.numeric(gr))
    #           gr n
    # 1   (0,0.05] 2
    # 2 (0.05,0.1] 2
    # 3 (0.1,0.15] 3
    # 4 (0.15,0.2] 2
    # 5 (0.7,0.75] 1
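The reason the unshifted breaks misplace 0.15 appears to be floating-point rounding: seq builds each break as a multiple of 0.05, and 3 * 0.05 is not the same double as the literal 0.15. The base R check below sketches this for a single value. The names brks, before, after and lbls are illustrative only. Here the labels are taken from the left-closed version of the breaks, which is an assumption about what the OP wants to see, not part of the answer above.

```r
brks <- seq(0, 1, by = 0.05)

# The fourth break prints as 0.15 but is slightly larger than the literal 0.15.
brks[4] == 0.15   # FALSE
brks[4] > 0.15    # TRUE

# With left-closed intervals, 0.15 therefore falls below the "0.15" break.
before <- cut(0.15, breaks = brks, right = FALSE)

# Shifting every break down by machine epsilon moves 0.15 into the next bin.
# Labels are reused from the unshifted breaks so they stay readable.
lbls  <- levels(before)
after <- cut(0.15, breaks = brks - .Machine$double.eps,
             right = FALSE, labels = lbls)

as.character(before)  # "[0.1,0.15)"
as.character(after)   # "[0.15,0.2)"
```

One caveat with this approach: a value that really is a hair below a boundary (by less than the shift) would also be moved into the upper interval, so the shift is best suited to data that is meant to lie on a coarse decimal grid.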