How to use dplyr to group elements by x, count x frequency for interval y?

Question

    x <- c('a','v','c','a','d','e','g','f','h','y','u','r','s','w','s','d','g','j',
           'u','r','s','s','s','v','b','g','e','w','s','d','g','h','j','i','t','e',
           'w','w','q','q','d','v','b','m','m','k','l','u','p','o','r','t','n','e',
           'w','w','j','f','c','g','h','t','r','d','e','w','w','w','z','f','g','f',
           'h','h','y','r','f','f','l')
    y <- sample(1:40, 79, replace = TRUE)
    y
     [1] 38 18 19 19 37 38 26  4 32 23 11 24 36 15 22 19  6 24 13 36  2 26 35 39  8 33 20 19 23 28  5 17 40 26 18 21
    [37] 35 23 27 12  3 33 16 32 11 19  4  5  8 19  5 19 33 33 33 13 12 32 21  4 14  8 28 34 33 22 34 19 39 23  6  8
    [73] 37 17 21 16 38 15 36

I have two variables, x and y. Each value of x occurs more than once, and every observation in x has a corresponding value in y.

I would like to group by x and, at the same time, split the y values into intervals.

In other words, for each letter I want to count how many times it occurs within each interval, based on the y value attached to each of its occurrences.

Example:

[Image: example of the desired output table]

I couldn't present the table correctly because I could not find a better way to enter it here.

I hope this is clear. I will try to rephrase it if necessary. I would appreciate any help in this regard.

+7

r dplyr

user3563667 Nov 01 '14 at 10:46

2 answers

Following Ananda Mahto's suggestion, here is an implementation using by, cut and table.

    x = c('a','v','c','a','d','e','g','f','h','y','u','r','s','w','s','d','g','j',
          'u','r','s','s','s','v','b','g','e','w','s','d','g','h','j','i','t','e',
          'w','w','q','q','d','v','b','m','m','k','l','u','p','o','r','t','n','e',
          'w','w','j','f','c','g','h','t','r','d','e','w','w','w','z','f','g','f',
          'h','h','y','r','f','f','l')
    y = sample(1:40, 79, replace = TRUE)
    dfX = data.frame(x, y)

    # bin y within each level of x, tabulate the bins, and put one letter per row
    t(sapply(
      by(dfX$y, list(dfX$x), cut, breaks = c(0, 10, 20, 30, 40)),
      table
    ))

Here is the result:

    > t(sapply(by(dfX$y, list(dfX$x), cut, breaks = c(0, 10, 20, 30, 40)), table))
      (0,10] (10,20] (20,30] (30,40]
    a      0       0       0       2
    b      0       0       2       0
    c      0       1       0       1
    d      0       2       2       1
    e      2       1       1       1
    f      0       4       1       1
    g      3       0       1       2
    h      2       0       2       1
    i      0       0       0       1
    j      1       2       0       0
    k      1       0       0       0
    l      0       1       1       0
    m      0       1       0       1
    n      0       0       0       1
    o      0       1       0       0
    p      1       0       0       0
    q      0       1       1       0
    r      2       1       0       2
    s      0       2       0       4
    t      1       1       0       1
    u      1       0       1       1
    v      2       0       0       1
    w      6       0       3       0
    y      0       1       0       1
    z      1       0       0       0
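As a rough illustration of the binning step, the same kind of letter-by-interval count can probably also be obtained by cross-tabulating x against the binned y directly. This is only a sketch on made-up data: `toy`, `bins` and `counts` are hypothetical names, not part of the answer above.

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(
  x = c("a", "a", "b", "b", "b"),
  y = c(5, 10, 11, 35, 40)
)

# cut() builds right-closed intervals by default, so 10 lands in (0,10]
bins <- cut(toy$y, breaks = c(0, 10, 20, 30, 40))

# Two-way table: one row per letter, one column per interval
counts <- table(toy$x, bins)
print(counts)
```

Because `table()` keeps every factor level, intervals with no observations still appear as columns of zeros.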

+2

tchakravarty Nov 01 '14 at 11:09

akrun · Accepted Answer · 2014-11-01T11:32:45+0000

Using dplyr

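    library(dplyr)
    library(tidyr)

    # df is the data frame given in the "data" section of this answer
    # count rows per (letter, interval) pair, then spread the intervals into columns
    res <- tally(group_by(df, x, y = cut(y, breaks = seq(0, 40, by = 10)))) %>%
      ungroup() %>%
      spread(y, n, fill = 0)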
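In more recent tidyverse releases, `spread()` is superseded by `pivot_wider()`, and `count()` combines the grouping and tallying. A minimal sketch of that variant, assuming tidyr 1.0 or later and using hypothetical toy data (`toy`, `bin` and `res_wide` are illustrative names, not from the answer):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(
  x = c("a", "a", "b", "b", "b"),
  y = c(5, 10, 11, 35, 40)
)

res_wide <- toy %>%
  # one row per (letter, interval) combination that actually occurs
  count(x, bin = cut(y, breaks = seq(0, 40, by = 10))) %>%
  # one column per observed interval; missing combinations become 0
  pivot_wider(names_from = bin, values_from = n,
              values_fill = list(n = 0L))

print(res_wide)
```

Note that, with the default settings, intervals that never occur in the data do not get a column here, and the columns follow order of first appearance rather than factor-level order.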

Or using data.table

    library(data.table)

    # count rows per (x, interval) with .N, then cast the intervals to columns
    res1 <- dcast.data.table(
      setDT(df)[, list(.N), by = list(x, y1 = cut(y, breaks = seq(0, 40, by = 10)))],
      x ~ y1, value.var = "N", fill = 0L
    )
    all.equal(as.data.frame(res), as.data.frame(res1))
    #[1] TRUE

Note: cut has a labels argument, which you can use if you want column headers such as freq0-10, etc.

    tally(group_by(df, x, y = cut(y, breaks = seq(0, 40, by = 10),
                                  labels = paste0("freq", c("0-10", "10-20", "20-30", "30-40"))))) %>%
      ungroup() %>%
      spread(y, n, fill = 0) %>%
      head(2)
    #  x freq0-10 freq10-20 freq20-30 freq30-40
    #1 a        0         1         1         0
    #2 b        1         1         0         0
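If the break points change often, the labels could be derived from the breaks instead of being typed by hand. The following is only a sketch of that idea; `brks`, `labs` and `binned` are hypothetical names introduced for illustration.

```r
# Break points for the intervals (same as seq(0, 40, by = 10) above)
brks <- seq(0, 40, by = 10)

# Pair each lower bound with the next upper bound: "freq0-10", "freq10-20", ...
labs <- paste0("freq", head(brks, -1), "-", brks[-1])

# Use the generated labels in cut(); there must be one label per interval
binned <- cut(c(5, 15, 40), breaks = brks, labels = labs)
print(binned)
```

The labels only rename the intervals; the binning itself stays right-closed, so 40 still falls in the last interval.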

data

  df <- structure(list(x = structure(c(1L, 22L, 3L, 1L, 4L, 5L, 7L, 6L, 8L, 24L, 21L, 18L, 19L, 23L, 19L, 4L, 7L, 10L, 21L, 18L, 19L, 19L, 19L, 22L, 2L, 7L, 5L, 23L, 19L, 4L, 7L, 8L, 10L, 9L, 20L, 5L, 23L, 23L, 17L, 17L, 4L, 22L, 2L, 13L, 13L, 11L, 12L, 21L, 16L, 15L, 18L, 20L, 14L, 5L, 23L, 23L, 10L, 6L, 3L, 7L, 8L, 20L, 18L, 4L, 5L, 23L, 23L, 23L, 25L, 6L, 7L, 6L, 8L, 8L, 24L, 18L, 6L, 6L, 12L), .Label = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "y", "z"), class = "factor"), y = c(12L, 9L, 29L, 21L, 27L, 37L, 12L, 31L, 33L, 11L, 25L, 15L, 27L, 27L, 13L, 37L, 8L, 2L, 21L, 6L, 4L, 23L, 30L, 6L, 9L, 28L, 4L, 24L, 26L, 2L, 13L, 10L, 15L, 6L, 38L, 9L, 30L, 26L, 28L, 39L, 19L, 16L, 11L, 9L, 2L, 4L, 16L, 15L, 11L, 14L, 19L, 35L, 19L, 29L, 22L, 40L, 19L, 12L, 7L, 6L, 20L, 10L, 12L, 6L, 30L, 13L, 38L, 39L, 30L, 20L, 6L, 9L, 1L, 40L, 26L, 14L, 23L, 33L, 2L)), .Names = c("x", "y" ), row.names = c(NA, -79L), class = "data.frame")
