R programming: how to count values from a column with plyr's ddply

I would like to summarize the pass/fail status for my data as shown below. In other words, I would like to count the number of passes and failures for each product/type combination.

    library(ggplot2)
    library(plyr)

    product = c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1",
                "p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
    type    = c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2",
                "t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
    skew    = c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2",
                "s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
    color   = c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3",
                "c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
    result  = c("pass","pass","fail","pass","pass","pass",
                "fail","pass","fail","pass","fail","pass",
                "fail","pass","fail","pass","pass","pass",
                "pass","fail","fail","pass","pass","fail")

    df = data.frame(product, type, skew, color, result)

The following command returns the total number of cases (pass and fail combined), but I need separate columns for pass and fail:

 dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result)) 

Result:

      product type N
    1      p1   t1 6
    2      p1   t2 6
    3      p2   t1 6
    4      p2   t2 6

The desired result would be

      product type Pass Fail
    1      p1   t1    5    1
    2      p1   t2    3    3
    3      p2   t1    4    2
    4      p2   t2    3    3

I tried something like:

    dfSummary <- ddply(df, c("product", "type"), summarise,
                       Pass = length(df$product[df$result == "pass"]),
                       Fail = length(df$product[df$result == "fail"]))

but this is obviously wrong: every row shows the overall totals of passes and failures instead of the counts for that group.

Thanks in advance for your advice! Regards, Riad.

2 answers

Try:

    dfSummary <- ddply(df, c("product", "type"), summarise,
                       Pass = sum(result == "pass"),
                       Fail = sum(result == "fail"))

Which gives me the result:

      product type Pass Fail
    1      p1   t1    5    1
    2      p1   t2    3    3
    3      p2   t1    4    2
    4      p2   t2    3    3

Explanation:

  • You pass the df dataset to the ddply function.
  • ddply splits df by the variables "product" and "type".
    • This results in up to length(unique(product)) * length(unique(type)) pieces (i.e., subsets of df), one for each combination of the two variables that occurs in the data.
  • To each of these pieces, ddply applies the function that you provide. In this case, you count the rows where result=="pass" and the rows where result=="fail". Because result is evaluated inside the piece, it refers only to that piece's rows.
  • ddply is now left with one result per piece, consisting of the variables you split by (product and type) and the values you requested (Pass and Fail).
  • It combines the results from all pieces into a single data frame and returns it.
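
For intuition, the split/apply/combine steps above can be sketched by hand in base R. This is only an illustration of the idea, not how plyr is implemented internally. The names pieces, counts and manual are made up for the example, and the data frame is rebuilt compactly with rep(), using the same values as in the question.

```r
# Same product/type/result values as in the question, built with rep().
df <- data.frame(
  product = rep(c("p1", "p2"), each = 12),
  type    = rep(rep(c("t1", "t2"), each = 6), times = 2),
  result  = c("pass","pass","fail","pass","pass","pass",
              "fail","pass","fail","pass","fail","pass",
              "fail","pass","fail","pass","pass","pass",
              "pass","fail","fail","pass","pass","fail")
)

# Split: one data frame per product/type combination present in the data.
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Apply: count passes and failures within each piece only.
counts <- lapply(pieces, function(piece) {
  data.frame(
    product = piece$product[1],
    type    = piece$type[1],
    Pass    = sum(piece$result == "pass"),
    Fail    = sum(piece$result == "fail")
  )
})

# Combine: stack the per-piece results and sort them like the ddply output.
manual <- do.call(rbind, counts)
manual <- manual[order(manual$product, manual$type), ]
rownames(manual) <- NULL
print(manual)
```

Note that the function sees only piece$result. Writing df$result inside it would count over the whole data frame again, which is why the attempt in the question returned the same totals in every row.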

You can also use reshape2::dcast.

    library(reshape2)
    dcast(product + type ~ result, data = df,
          fun.aggregate = length, value.var = 'result')
    ##   product type fail pass
    ## 1      p1   t1    1    5
    ## 2      p1   t2    3    3
    ## 3      p2   t1    2    4
    ## 4      p2   t2    3    3