R-programming: plyr how to read values from a column with ddply

Question


I would like to summarize the pass/fail status of my data as shown below. In other words, I would like to count the number of passes and failures for each product/type combination.

    library(ggplot2)
    library(plyr)
    product = c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1",
                "p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
    type = c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2",
             "t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
    skew = c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2",
             "s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
    color = c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3",
              "c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
    result = c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass",
               "fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
    df = data.frame(product, type, skew, color, result)

The following command returns the total number of cases (pass + fail) per group, but I need separate columns for pass and fail:

    dfSummary <- ddply(df, c("product", "type"), summarise, N = length(result))

Result:

      product type N
    1      p1   t1 6
    2      p1   t2 6
    3      p2   t1 6
    4      p2   t2 6

The desired result would be

      product type Pass Fail
    1      p1   t1    5    1
    2      p1   t2    3    3
    3      p2   t1    4    2
    4      p2   t2    3    3

I tried something like:

    dfSummary <- ddply(df, c("product", "type"), summarise,
                       Pass = length(df$product[df$result == "pass"]),
                       Fail = length(df$product[df$result == "fail"]))

but obviously this is wrong, as every row shows the overall totals of passes and failures instead of the per-group counts.

Thanks in advance for your advice! Regards, Riad.

+7

r plyr

Riad Nov 20 '13 at 17:54


2 answers

You can also use reshape2::dcast.

    library(reshape2)
    dcast(product + type ~ result, data = df, fun.aggregate = length, value.var = 'result')
    ##   product type fail pass
    ## 1      p1   t1    1    5
    ## 2      p1   t2    3    3
    ## 3      p2   t1    2    4
    ## 4      p2   t2    3    3
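If reshape2 is not installed, the same counts can be cross-checked with base R. This is only a sketch and not part of the answer above: it rebuilds the three relevant columns of the question's data, and the variable name `counts` is made up for illustration.

```r
# Rebuild only the columns needed from the question's data
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# Three-way contingency table: one cell per product/type/result combination
counts <- xtabs(~ product + type + result, data = df)

# Flatten it so that product and type label the rows and result the columns
ftable(counts)

# Individual cells can be looked up by name
counts["p1", "t1", "pass"]
```

The result is a table object rather than a data frame, so the dcast or ddply versions are probably more convenient if you want to keep working with the summary.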

+3

mnel Nov 21 '13 at 0:51


ialm · Accepted Answer · 2013-11-20T18:06:52+0000

Try:

    dfSummary <- ddply(df, c("product", "type"), summarise,
                       Pass = sum(result == "pass"),
                       Fail = sum(result == "fail"))

Which gives me the result:

      product type Pass Fail
    1      p1   t1    5    1
    2      p1   t2    3    3
    3      p2   t1    4    2
    4      p2   t2    3    3
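The counting itself relies on the fact that a comparison such as result == "pass" yields a logical vector, and sum() treats each TRUE as 1. A minimal base-R sketch, using the result values of the first group (product p1, type t1) from the question; the name `g1` is made up for illustration:

```r
# result values for product p1 / type t1, copied from the question's data
g1 <- c("pass", "pass", "fail", "pass", "pass", "pass")

g1 == "pass"        # logical vector: TRUE where the test passed
sum(g1 == "pass")   # TRUE counts as 1, so this is the number of passes
sum(g1 == "fail")   # number of failures

# The attempt in the question used df$result, which is the whole column
# rather than the current group's slice, so every group saw the same totals.
```

Inside summarise, a bare column name like result refers to the current piece only, which is why dropping the df$ prefix gives per-group counts.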

Explanation:

You pass the df dataset to the ddply function.
ddply splits df by the variables "product" and "type".
- This results in length(unique(product)) * length(unique(type)) pieces (here 2 * 2 = 4 subsets of df), one for each combination of the two variables.
To each piece, ddply applies the function you provide. In this case, you count the number of rows with result=="pass" and with result=="fail".
ddply is now left with one set of results per piece, namely the variables you split on (product and type) and the results you requested (Pass and Fail).
It combines the results from all pieces and returns them.
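The steps above can be imitated by hand in base R, which may make the split / apply / combine idea easier to see. This is a rough sketch of the concept, not how ddply is implemented internally; the names `pieces`, `summaries` and `out` are made up for illustration.

```r
# Rebuild only the columns needed from the question's data
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# 1. Split: one data frame per product/type combination
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# 2. Apply: summarise each piece on its own
summaries <- lapply(pieces, function(piece) {
  data.frame(product = piece$product[1],
             type    = piece$type[1],
             Pass    = sum(piece$result == "pass"),
             Fail    = sum(piece$result == "fail"))
})

# 3. Combine: stack the per-piece summaries into one data frame,
#    then sort by product and type and reset the row names
out <- do.call(rbind, summaries)
out <- out[order(out$product, out$type), ]
rownames(out) <- NULL
out
```

Note that each piece only contains its own rows, so piece$result plays the role that the bare result plays inside summarise.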
