I often have data where I want to compare the value of one level of a variable with all other levels of a variable. Every time I write code for this, I would like it to be easier. Here is an example of a problem:
Suppose I want to compare the average price of diamonds of any cut with the average cost of the best diamonds. For everything to be fair, I want to do this for each clarity separately.
Let's check that we have enough data:
> with(diamonds,table(cut,clarity)) clarity cut I1 SI2 SI1 VS2 VS1 VVS2 VVS1 IF Fair 210 466 408 261 170 69 17 9 Good 96 1081 1560 978 648 286 186 71 Very Good 84 2100 3240 2591 1775 1235 789 268 Premium 205 2949 3575 3357 1989 870 616 230 Ideal 146 2598 4282 5071 3589 2606 2047 1212
okay no zeroes in Idea, so let's calculate the average.
> claritycut<-ddply(diamonds,.(clarity,cut),summarize,price=mean(price)) > claritycut clarity cut price 1 I1 Fair 3703.533 2 I1 Good 3596.635 3 I1 Very Good 4078.226 4 I1 Premium 3947.332 5 I1 Ideal 4335.726 6 SI2 Fair 5173.916 7 SI2 Good 4580.261 8 SI2 Very Good 4988.688 9 SI2 Premium 5545.937 10 SI2 Ideal 4755.953 ...
Final result:
clarity variable ratio 1 I1 Fair 0.8541899 2 I1 Good 0.8295348 3 I1 Very Good 0.9406098 4 I1 Premium 0.9104200 5 I1 Ideal 1.0000000 6 SI2 Fair 1.0878822 7 SI2 Good 0.9630586 8 SI2 Very Good 1.0489356 9 SI2 Premium 1.1661043 10 SI2 Ideal 1.0000000 ...
But I'm not sure how to do it neatly. Most of the rest of this question relates to the intermediate step of computation — division.
Now I want to calculate the relative price of all contractions against Ideals. Here is a data frame that I expect to see in part through computation - retrieving only one section level:
> claritycutideal <- join(subset(claritycut,cut!="Ideal"),summarize(subset(claritycut,cut=="Ideal"),Ideal=price,clarity)) > print(claritycutideal) Joining by: clarity clarity cut price Ideal 1 I1 Fair 3703.533 4335.726 2 I1 Good 3596.635 4335.726 3 I1 Very Good 4078.226 4335.726 4 I1 Premium 3947.332 4335.726 5 SI2 Fair 5173.916 4755.953 6 SI2 Good 4580.261 4755.953 7 SI2 Very Good 4988.688 4755.953 8 SI2 Premium 5545.937 4755.953 ...
Which works, but it's hard to write the above statement, and I still need to finish the calculation by mentioning the name Ideal again.
> mutate(claritycutideal,ratio=price/Ideal)
It seems to me that I want something like
> cast(claritycut,clarity~cut) Using clarity, cut as id variables clarity Fair Good Very Good Premium Ideal 1 I1 3703.533 3596.635 4078.226 3947.332 4335.726 2 SI2 5173.916 4580.261 4988.688 5545.937 4755.953 3 SI1 4208.279 3689.533 3932.391 4455.269 3752.118 4 VS2 4174.724 4262.236 4215.760 4550.331 3284.550 ...
This is completely unsuitable for average calculation, since I need to know the names of all levels of allocation in the calculation:
I would like to redo it, but with the ability to filter selected levels and leave the rest untouched, for example:
> cast(claritycut,clarity~cut,subset=cut=="Ideal")
which exists but does not preserve unfiltered levels.
Then I needed to melt it again, and although there is a rework, there is no remelting.
Does anyone have a neat trick to do this?
Or maybe I'm looking at it completely wrong - do the ultimate calculations for me?
The following works correctly, but is inconvenient:
> valuevars=function(x)x[!names(x)%in%attr(x,"idvars")] > melt(ddply(cast(claritycut,clarity~cut),.(clarity), function(x)valuevars(x)/x$Ideal))