Dplyr: Combine multiple `count` + `mutate` statements for each variable into a single statement

Question

Given the following data:

DT = structure(list(
    PE_RATIO = c(NA, 18.3468544431322, 21.8536295107188, NA, NA, NA),
    DIVIDEND_YIELD = c(NA, NA, 0.5283019, 1.06737822831035, NA, 0.55751900359546),
    DollarExposure = c(6765.12578958248, 95958.3106724681, 96328.1628155842,
                       291638.734002894, 170983.200676477, 185115.042371833)),
  .Names = c("PE_RATIO", "DIVIDEND_YIELD", "DollarExposure"),
  row.names = c(NA, -6L),
  class = c("data.table", "data.frame"))
DT
#    PE_RATIO DIVIDEND_YIELD DollarExposure
# 1:       NA             NA       6765.126
# 2: 18.34685             NA      95958.311
# 3: 21.85363      0.5283019      96328.163
# 4:       NA      1.0673782     291638.734
# 5:       NA             NA     170983.201
# 6:       NA      0.5575190     185115.042

I would like to calculate the weighted fraction of available (non-missing) values, called "Capture", for several variables (here PE_RATIO and DIVIDEND_YIELD), using abs(DollarExposure) as the weight. I can do this in separate statements, one for each variable:

DT %>% count(is.na(PE_RATIO), wt=abs(DollarExposure)) %>%
  mutate(PE_RATIO.Capture = prop.table(n))

# Source: local data table [2 x 3]
# 
# is.na(PE_RATIO)          n   PE_RATIO.Capture
# 1           FALSE 192286.5          0.2270773
# 2            TRUE 654502.1          0.7729227


DT %>% count(is.na(DIVIDEND_YIELD), wt=abs(DollarExposure)) %>%
  mutate(DIVIDEND_YIELD.Capture = prop.table(n))

# Source: local data table [2 x 3]
# 
# is.na(DIVIDEND_YIELD)          n   DIVIDEND_YIELD.Capture
# 1                 FALSE 573081.9                 0.676771
# 2                  TRUE 273706.6                 0.323229
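
For reference, the FALSE row in each output above is simply the exposure-weighted share of non-missing rows. A minimal base R sketch of that arithmetic follows; it assumes a plain data frame `df` holding the rounded values from the printed table, and `capture` is a hypothetical helper name used only for illustration.

```r
# Same data as DT above, using the rounded values from the printed table
# (assumption: a plain data.frame is enough for this check).
df <- data.frame(
  PE_RATIO       = c(NA, 18.34685, 21.85363, NA, NA, NA),
  DIVIDEND_YIELD = c(NA, NA, 0.5283019, 1.0673782, NA, 0.5575190),
  DollarExposure = c(6765.126, 95958.311, 96328.163,
                     291638.734, 170983.201, 185115.042)
)

# Hypothetical helper: share of total absolute weight carried by
# the rows where x is not NA.
capture <- function(x, w) sum(abs(w)[!is.na(x)]) / sum(abs(w))

pe_capture <- capture(df$PE_RATIO, df$DollarExposure)
dy_capture <- capture(df$DIVIDEND_YIELD, df$DollarExposure)
```

The TRUE row is the complement, `1 - capture(...)`.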

Question

How can I combine these statements and get the summary for all variables in a single dplyr expression? The desired result is as follows:

#         is.na(variable)  DIVIDEND_YIELD.Capture   PE_RATIO.Capture
# 1                 FALSE                0.676771          0.2270773
# 2                  TRUE                0.323229          0.7729227

There may be half a dozen variables for which Capture needs to be calculated.

r dplyr

Daniel Krizian Oct 24 '14 at 9:45

1 answer

konvas · Accepted Answer · 2014-10-24T10:01:02+0000

Try something like this:

library(tidyr)
library(dplyr)
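DT %>% gather(variable, value, -DollarExposure) %>%   # long format: one row per variable/value
    group_by(variable, isna = is.na(value)) %>%
    summarise(total = sum(abs(DollarExposure))) %>%   # weighted total per variable and NA status
    group_by(variable) %>%
    mutate(prop = prop.table(total)) %>%              # share within each variable
    ungroup() %>%
    select(-total) %>%
    spread(variable, prop)                            # back to one column per variable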
# Source: local data frame [2 x 3]
# 
#    isna  PE_RATIO DIVIDEND_YIELD
# 1 FALSE 0.2270773       0.676771
# 2  TRUE 0.7729227       0.323229
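
If only the available (FALSE) share is needed, a shorter route may be to take a weighted mean of the non-missing indicator for each column. This is a sketch rather than a drop-in replacement: it assumes dplyr 1.0.0 or later (for `across()`), uses a plain data frame `df` with the rounded values from the question, and returns one row instead of the two-row table above.

```r
library(dplyr)

# Assumption: same data as in the question, rounded, as a plain data.frame.
df <- data.frame(
  PE_RATIO       = c(NA, 18.34685, 21.85363, NA, NA, NA),
  DIVIDEND_YIELD = c(NA, NA, 0.5283019, 1.0673782, NA, 0.5575190),
  DollarExposure = c(6765.126, 95958.311, 96328.163,
                     291638.734, 170983.201, 185115.042)
)

# For every column except the weight, compute the weighted mean of
# "value is present", weighting each row by abs(DollarExposure).
result <- df %>%
  summarise(across(-DollarExposure,
                   ~ weighted.mean(!is.na(.x), abs(DollarExposure)),
                   .names = "{.col}.Capture"))
```

The missing share for each variable is then `1 - result`, and adding more variables requires no change as long as `DollarExposure` is the only column excluded.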
