Conditional cumulative sum in R

I have a time series data frame and want to calculate cumulative returns for intraday stock symbols over a date range. Whenever the symbol and/or the date changes, the cumulative return must reset. A small sample of my data frame is below; the Cumulative.Sum column shows what the result should be. Any help would be greatly appreciated. Thank you.

          Date Symbol  Time   Last Return Cumulative.Sum
    1 1/2/2013     AA  9:30  42.00    n/a            n/a
    2 1/2/2013     AA 12:00  42.50  1.19%          1.19%
    3 1/2/2013     AA 16:00  42.88  0.89%          2.08%
    4 1/2/2013   AAPL  9:30 387.00    n/a            n/a
    5 1/2/2013   AAPL 12:00 387.87  0.22%          0.22%
    6 1/2/2013   AAPL 16:00 388.69  0.21%          0.44%
    7 1/3/2013     AA  9:30  42.88    n/a            n/a
    8 1/3/2013     AA 12:00  42.11 -1.80%         -1.80%
    9 1/3/2013     AA 16:00  41.89 -0.52%         -2.32%
3 answers

Using the data.table package, this is trivial. If your data is in a data.frame called dat:

    library(data.table)
    DT <- data.table(dat)
    DT[, your_cumsum_function(.SD), by = c('Date', 'Symbol')]

Here .SD is the subset of the data.table for each group defined by the by argument. See ?data.table for more details.

You can also pass column names directly:

 DT[, your_cumsum_function(Last), by=c('Date', 'Symbol')] 

In your specific example, do:

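    # Strip the trailing '%' and convert to numeric ("n/a" becomes NA)
    DT[, Return := as.numeric(sub('%$', '', Return))]
    # Cumulative sum within each Date/Symbol group, skipping NA rows
    DT[!is.na(Return), Cumulative.Sum := cumsum(Return), by = c('Date', 'Symbol')]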

This is a typical example of the split-apply-combine strategy: you split your data.frame by the unique combinations of specific columns (Date and Symbol), apply cumsum to Return within each subset, and combine the subsets back into one data.frame. This can be done easily using ddply from the plyr package:

    # Strip the '%' sign and convert to numeric; "n/a" becomes NA, then 0
    mdf$Return <- as.numeric(sub("(\\d+\\.\\d+)\\%", "\\1", mdf$Return))
    mdf$Return[is.na(mdf$Return)] <- 0

    library(plyr)
    ddply(mdf, .(Date, Symbol), transform, Cumulative.Sum = cumsum(Return))

          Date Symbol  Time   Last Return Cumulative.Sum
    1 1/2/2013     AA  9:30  42.00   0.00           0.00
    2 1/2/2013     AA 12:00  42.50   1.19           1.19
    3 1/2/2013     AA 16:00  42.88   0.89           2.08
    4 1/2/2013   AAPL  9:30 387.00   0.00           0.00
    5 1/2/2013   AAPL 12:00 387.87   0.22           0.22
    6 1/2/2013   AAPL 16:00 388.69   0.21           0.43
    7 1/3/2013     AA  9:30  42.88   0.00           0.00
    8 1/3/2013     AA 12:00  42.11  -1.80          -1.80
    9 1/3/2013     AA 16:00  41.89  -0.52          -2.32
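For comparison, here is a minimal base R sketch of the same grouped cumulative sum that needs no extra packages. It relies on ave(), which splits a vector by the grouping variables, applies FUN to each piece, and puts the results back in the original row order. The small mdf built here is only an illustrative stand-in for the question's data, with Return assumed to be already numeric and the n/a rows set to 0.

```r
# Illustrative stand-in for the question's data (an assumption, not the
# asker's real data frame); Return is numeric, in percent, with n/a as 0
mdf <- data.frame(
  Date   = rep(c("1/2/2013", "1/3/2013"), times = c(6, 3)),
  Symbol = rep(c("AA", "AAPL", "AA"), each = 3),
  Return = c(0, 1.19, 0.89, 0, 0.22, 0.21, 0, -1.80, -0.52),
  stringsAsFactors = FALSE
)

# ave() applies cumsum within each Date/Symbol group and returns a vector
# aligned with the original rows, so the sum resets whenever either changes
mdf$Cumulative.Sum <- ave(mdf$Return, mdf$Date, mdf$Symbol, FUN = cumsum)

print(mdf)
```

Because ave() returns a vector of the same length as its input, the result can be assigned straight back as a new column without reordering the data.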

Sample data (note: I used the lubridate library only for date processing)

    library(lubridate)

    df <- data.frame(
      # mdy() assumes the dates are month/day/year (2 and 3 January 2013)
      Date = mdy(c("1/2/2013", "1/2/2013", "1/2/2013",
                   "1/2/2013", "1/2/2013", "1/2/2013",
                   "1/3/2013", "1/3/2013", "1/3/2013")),
      Symbol = c("AA", "AA", "AA", "AAPL", "AAPL", "AAPL", "AA", "AA", "AA"),
      Return = c(NA, 1.19, 0.89, NA, 0.22, 0.21, NA, -1.80, -0.52)
    )

Now, using dplyr, you can group the rows by Date and Symbol with group_by and create the desired column with mutate:

    library(dplyr)

    df %>%
      group_by(Date, Symbol) %>%
      mutate(
        # Treat missing returns as 0 so cumsum does not propagate NA
        Return_aux = ifelse(is.na(Return), 0, Return),
        Cum_Sum = cumsum(Return_aux)
      )

Hooray!
