What is the R way to do the following group-by operation?

I have a data set, for example:

          date    value class
    1984-04-01 95.32384     A
    1984-04-01 39.86818     B
    1984-07-01 43.57983     A
    1984-07-01 10.83754     B

Now I would like to group the data by date and subtract the value of class B from the value of class A. I looked at ddply, summarise, melt and cast, but I cannot get what I want. Is there an easy way to do this? Note that I have exactly two values per date, one of class A and one of class B. I could split the data into two data frames, sort them by date and class, and merge them again, but I feel there is a more R-like way to do this.
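For comparison, the split-and-merge approach mentioned above can be sketched in base R as follows. This is a minimal sketch using the four example rows; the object names a, b and m are purely illustrative.

```r
# The four example rows from the question
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  class = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Split into one data frame per class, keeping only date and value
a <- df[df$class == "A", c("date", "value")]
b <- df[df$class == "B", c("date", "value")]

# Merge back on date; the suffixes tell the two value columns apart
m <- merge(a, b, by = "date", suffixes = c(".A", ".B"))

# Difference A - B for each date
m$diff <- m$value.A - m$value.B
print(m)
```

It works, but it needs several intermediate objects and relies on there being exactly one A and one B row per date.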

4 answers

The easiest way is to use dcast from the reshape2 package to create a data frame with one row per date and columns A and B, and then use transform to compute A - B:

    df <- data.frame(
      date  = rep(seq(as.Date('1984-04-01'), as.Date('1984-04-01') + 3, by = 1), 1, each = 2),
      class = rep(c('A', 'B'), 4),
      value = sample(1:8))

    library(reshape2)
    # value.var (with a dot) names the column whose values fill the new A and B columns
    df_wide <- dcast(df, date ~ class, value.var = 'value')

    > df_wide
            date A B
    1 1984-04-01 8 7
    2 1984-04-02 6 1
    3 1984-04-03 3 4
    4 1984-04-04 5 2

    > transform(df_wide, A_B = A - B)
            date A B A_B
    1 1984-04-01 8 7   1
    2 1984-04-02 6 1   5
    3 1984-04-03 3 4  -1
    4 1984-04-04 5 2   3
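If you would rather not add the reshape2 dependency, base R's reshape() can do the same long-to-wide step. A minimal sketch on the question's four rows; note that base R names the wide columns value.A and value.B rather than A and B:

```r
# The four example rows from the question
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  class = c("A", "B", "A", "B"),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per date, one value column per class
wide <- reshape(df, idvar = "date", timevar = "class", direction = "wide")

# Then subtract, as with the dcast result
wide <- transform(wide, A_B = value.A - value.B)
print(wide)
```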

Assuming this data frame (generated as in Prasad's answer, but with set.seed for reproducibility):

    set.seed(123)
    DF <- data.frame(
      date  = rep(seq(as.Date('1984-04-01'), as.Date('1984-04-01') + 3, by = 1), 1, each = 2),
      class = rep(c('A', 'B'), 4),
      value = sample(1:8))

then here are seven solutions:

1) zoo can give a one-line solution (not counting the library statement):

    library(zoo)
    z <- with(read.zoo(DF, split = 2), A - B)

giving this zoo series:

    > z
    1984-04-01 1984-04-02 1984-04-03 1984-04-04
            -3          3          3         -5

Also note that as.data.frame(z) or data.frame(time = time(z), value = coredata(z)) gives a data frame; however, you may want to leave it as a zoo object, since it is a time series and other operations, e.g. plot(z), are more convenient in that form.

2) sqldf can also give a one-line solution (apart from the library call):

    > library(sqldf)
    > sqldf("select date, sum(((class = 'A') - (class = 'B')) * value) as value
    + from DF group by date")
            date value
    1 1984-04-01    -3
    2 1984-04-02     3
    3 1984-04-03     3
    4 1984-04-04    -5

3) tapply can be used for a solution based on the same idea as the sqldf solution:

    > with(DF, tapply(((class == "A") - (class == "B")) * value, date, sum))
    1984-04-01 1984-04-02 1984-04-03 1984-04-04
            -3          3          3         -5

4) aggregate can be used in the same way as sqldf and tapply above (a slightly different solution, also based on aggregate, has already been posted):

    > aggregate(((DF$class == "A") - (DF$class == "B")) * DF["value"], DF["date"], sum)
            date value
    1 1984-04-01    -3
    2 1984-04-02     3
    3 1984-04-03     3
    4 1984-04-04    -5

5) summaryBy from the doBy package can provide another solution, although it needs transform to help it:

    > library(doBy)
    > summaryBy(value ~ date, transform(DF, value = ((class == "A") - (class == "B")) * value), FUN = sum, keep.names = TRUE)
            date value
    1 1984-04-01    -3
    2 1984-04-02     3
    3 1984-04-03     3
    4 1984-04-04    -5

6) remix from the remix package can do it too, again with the help of transform, and gives particularly nice output:

    > library(remix)
    > remix(value ~ date, transform(DF, value = ((class == "A") - (class == "B")) * value), sum)
    value ~ date
    ============

    +------+------------+-------+-----+
    |      |                    | sum |
    +======+============+=======+=====+
    | date | 1984-04-01 | value | -3  |
    +      +------------+-------+-----+
    |      | 1984-04-02 | value | 3   |
    +      +------------+-------+-----+
    |      | 1984-04-03 | value | 3   |
    +      +------------+-------+-----+
    |      | 1984-04-04 | value | -5  |
    +------+------------+-------+-----+

7) summary.formula in the Hmisc package also gives nicely formatted output:

    > library(Hmisc)
    > summary(value ~ date, data = transform(DF, value = ((class == "A") - (class == "B")) * value), fun = sum, overall = FALSE)
    value    N=8

    +----+----------+-+-----+
    |    |          |N|value|
    +----+----------+-+-----+
    |date|1984-04-01|2|-3   |
    |    |1984-04-02|2| 3   |
    |    |1984-04-03|2| 3   |
    |    |1984-04-04|2|-5   |
    +----+----------+-+-----+
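Solutions 2 to 7 all rely on the same trick: (class == "A") - (class == "B") subtracts two logical vectors, which R coerces to integers, giving +1 for class A rows and -1 for class B rows (0 for any other class). Summing sign * value within each date therefore yields A - B. The small sketch below shows this with explicit, hypothetical values instead of sample(), because the numbers that set.seed(123) followed by sample() produce changed in R 3.6.0 (its default sampling algorithm changed), so newer R versions may not reproduce the outputs shown above.

```r
# Hypothetical fixed values chosen for illustration (not the output of sample())
DF <- data.frame(
  date  = as.Date(rep(c("1984-04-01", "1984-04-02"), each = 2)),
  class = c("A", "B", "A", "B"),
  value = c(3, 6, 8, 5),
  stringsAsFactors = FALSE
)

# Logical minus logical gives an integer vector: +1 for A rows, -1 for B rows
signs <- (DF$class == "A") - (DF$class == "B")
print(signs)

# Summing the signed values within each date gives A - B per date
res <- tapply(signs * DF$value, DF$date, sum)
print(res)
```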

In base R, I would approach the problem using aggregate and sum. This works by first converting each value of class B to its negative:

(Using data provided by @PrasadChalasani)

    df <- within(df, value[class == "B"] <- -value[class == "B"])
    aggregate(df$value, by = list(date = df$date), sum)
            date x
    1 1984-04-01 3
    2 1984-04-02 2
    3 1984-04-03 2
    4 1984-04-04 1
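The within() call above overwrites the value column of df. If you want to keep df intact, a minimal variant of the same idea flips the sign of class B only in a temporary copy (here with ifelse() and aggregate()'s formula interface), sketched on the question's four rows:

```r
# The four example rows from the question
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  class = c("A", "B", "A", "B"),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  stringsAsFactors = FALSE
)

# Negate class B values in a transformed copy; df itself is left unchanged
res <- aggregate(value ~ date,
                 data = transform(df, value = ifelse(class == "B", -value, value)),
                 FUN = sum)
print(res)
```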

For the record, I prefer the reshape option. Here is a plyr option:

    library(plyr)
    ddply(df, "date", summarise,
          A   = value[class == "A"],
          B   = value[class == "B"],
          A_B = value[class == "A"] - value[class == "B"])
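ddply follows the split-apply-combine pattern: split df by date, compute A, B and A_B within each piece, and bind the pieces back together. A rough base R analogue of that pattern (not how plyr is implemented internally), assuming exactly one A row and one B row per date as in the question; the names pieces and rows are illustrative:

```r
# The four example rows from the question
df <- data.frame(
  date  = as.Date(c("1984-04-01", "1984-04-01", "1984-07-01", "1984-07-01")),
  class = c("A", "B", "A", "B"),
  value = c(95.32384, 39.86818, 43.57983, 10.83754),
  stringsAsFactors = FALSE
)

# Split: one data frame per date
pieces <- split(df, df$date)

# Apply: summarise each piece into a one-row data frame
rows <- lapply(pieces, function(d) {
  a <- d$value[d$class == "A"]
  b <- d$value[d$class == "B"]
  data.frame(date = d$date[1], A = a, B = b, A_B = a - b)
})

# Combine: stack the one-row results and reset the row names
res <- do.call(rbind, rows)
rownames(res) <- NULL
print(res)
```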
