How to sum weighted data

Is it possible to use weights with dplyr's summarise function?

Suppose I want to calculate a weighted table:

 dta = structure(list(PHHWT14 = c(530, 457, 416, 497, 395, 480, 383, 420, 499, 424, 504, 497, 449, 406, 492, 470, 418, 407, 403, 362, 393, 368, 423, 448, 511, 511, 423, 470, 453, 429, 439, 425, 431, 443, 480, 452, 472, 406, 460, 436, 574, 456, 399, 476, 423, 501, 399, 459, 396, 409, 423, 399, 383, 433, 436, 413, 403, 414, 410, 337, 472, 448, 487, 442, 475, 410, 478, 483, 374, 414, 514, 422, 409, 455, 464, 362, 461, 356, 464, 456, 494, 348, 464, 432, 398, 426, 418, 429, 516, 363, 455, 413, 388, 508, 381, 439, 330, 385, 393, 454), SEX = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("Female", "Male"), class = "factor")), row.names = c(NA, 100L), class = "data.frame", .Names = c("PHHWT14", "SEX")) 

Using xtabs:

 xtabs(PHHWT14 ~ SEX, dta) 

I will get:

 SEX
 Female   Male
  10115  33490

Is there a way to get the same weighted sums with dplyr? Counting rows with n() ignores the weights:

 dta %>% group_by(SEX) %>% summarise(n()) 
3 answers

You can also use summarise_each. For your example it gives the same result as the summarise version, but it is very useful if you have additional columns that you would like to summarise.

 dta %>% group_by(SEX) %>% summarise_each(funs(sum))
 ## Source: local data frame [2 x 2]
 ##
 ##      SEX PHHWT14
 ## 1 Female   10115
 ## 2   Male   33490
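Note that summarise_each() and funs() are deprecated in current dplyr releases, and across() (available from dplyr 1.0.0) is the documented replacement. A minimal sketch of the same column-wise sum follows. The `toy` data frame, its values, and the extra `OTHERWT` column are made up for illustration and are not part of the dta in the question:

```r
library(dplyr)

# Hypothetical toy data standing in for dta (values are made up)
toy <- data.frame(
  SEX     = factor(c("Female", "Male", "Male", "Female")),
  PHHWT14 = c(100, 200, 300, 450),
  OTHERWT = c(1, 2, 3, 4)  # hypothetical extra numeric column
)

# Sum every non-grouping column within each level of SEX
res_across <- toy %>%
  group_by(SEX) %>%
  summarise(across(everything(), sum))

res_across
```

Inside a grouped summarise(), across(everything(), ...) skips the grouping column, so only PHHWT14 and OTHERWT are summed.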
Sum the weight column within each group:

 dta %>% group_by(SEX) %>% summarise(sum(PHHWT14))
 #      SEX sum(PHHWT14)
 # 1 Female        10115
 # 2   Male        33490
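dplyr's count() also accepts a wt argument, which sums the given column per group instead of counting rows, so it reads much like a weighted tabulation. A small sketch on made-up data (the `toy` data frame and its values are hypothetical, not the dta from the question):

```r
library(dplyr)

# Hypothetical toy data standing in for dta (values are made up)
toy <- data.frame(
  SEX     = factor(c("Female", "Male", "Male", "Female")),
  PHHWT14 = c(100, 200, 300, 450)
)

# Weighted count: sums PHHWT14 within each level of SEX, result column is n
weighted_n <- toy %>% count(SEX, wt = PHHWT14)

# Equivalent explicit form with a named result column
weighted_sum <- toy %>%
  group_by(SEX) %>%
  summarise(n = sum(PHHWT14))

weighted_n
```

Both forms should give the same totals; count() is just shorter when all you need is one weighted total per group.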

What you have in mind is grouping by a variable, but you can also apply weights.

In general, if you have a numeric weight variable or a validation coefficient, you can pass additional arguments to sum() and refer to each summarised column with a dot (.). Try this with the iris data frame using dplyr:

 library(dplyr)
 set.seed(1234)
 df <- iris
 df[, "weights"] <- rnorm(nrow(df), 1, 0.1)  # generate randomized weights
 head(df)
 df %>%
   group_by(Species) %>%
   summarise_each(funs(sum(. * weights, na.rm = TRUE),                          # weighted sum
                       weighted.mean(., w = weights, na.rm = TRUE))) -> agg.df  # weighted mean
 agg.df
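With current dplyr, where summarise_each() and funs() are deprecated, the same weighted sum and weighted mean can be sketched with across(). This is one possible rewrite under that assumption, not the answer's original code. The result-column suffixes `wsum` and `wmean` and the name `agg_df` are chosen here for illustration:

```r
library(dplyr)

set.seed(1234)
df <- iris
df$weights <- rnorm(nrow(df), 1, 0.1)  # random weights, as in the example above

agg_df <- df %>%
  group_by(Species) %>%
  summarise(across(
    -weights,  # every measurement column except the weights themselves
    list(
      wsum  = ~ sum(.x * weights, na.rm = TRUE),                # weighted sum
      wmean = ~ weighted.mean(.x, w = weights, na.rm = TRUE)    # weighted mean
    )
  ))

agg_df
```

Each measurement column yields two result columns named with the default `{.col}_{.fn}` pattern, for example `Sepal.Length_wsum` and `Sepal.Length_wmean`.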