Trying to convert the following R .frame file:
structure(list( Time=c("09:30:01" ,"09:30:29" ,"09:35:56", "09:37:17" ,"09:37:21" ,"09:37:28" ,"09:37:35" ,"09:37:51" ,"09:42:11" ,"10:00:31"), Price=c(1,2,3,4,5,6,7,8,9,10), Volume=c(100,200,300,100,200,300,100,200,600,100)), .Names = c("Time", "Price", "Volume"), row.names = c(NA,10L), class = "data.frame") Time Price Volume 1 09:30:01 1 100 2 09:30:29 2 200 3 09:35:56 3 300 4 09:37:17 4 100 5 09:37:21 5 200 6 09:37:28 6 300 7 09:37:35 7 100 8 09:37:51 8 200 9 09:42:11 9 600 10 10:00:31 10 100
in that
Time Price Volume Bin 1 09:30:01 1 100 1 2 09:30:29 2 200 1 3 09:35:56 3 200 1 4 09:35:56 3 100 2 5 09:37:17 4 100 2 6 09:37:21 5 200 2 7 09:37:28 6 100 2 8 09:37:28 6 200 3 9 09:37:35 7 100 3 10 09:37:51 8 200 3 11 09:42:11 9 500 4 12 09:42:11 9 100 5 13 10:00:31 10 100 5
Essentially, it calculates cumulative amounts by volume and binning each time 500 is violated. Thus, bit 1 is 100 + 200 + 200 with a volume at 09:35:56, split into 200/100, and a new line inserted, and Bean counter increased.
It is relatively simple with an R base, but I was wondering if there is a more elegant and hopefully faster way with dplyr.
Greetings
Update:
Thanks @Frank and @AntoniosK.
To solve your question, the range of volume values ββis integer positive values ββfrom 1 to 10k.
I microbenchmarked both approaches and dplyr was a little faster, but not so much, in a data set similar to the above, with ~ 200 thousand rows.
Really rate quick answers and help