How can I use functions that return vectors (like fivenum) with ddply or aggregate?

Question

I would like to split my data frame, which has multiple columns, into groups and call, say, fivenum for each group.

 aggregate(Petal.Width ~ Species, iris, function(x) summary(fivenum(x)))

The return value is a data.frame with two columns, and the second one is a matrix. How can I turn it into regular data.frame columns?
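The structure described here can be checked directly. The sketch below uses only base R and the built-in iris data; `res` is just an illustrative name, and `fivenum` is called without the `summary` wrapper to keep the example small.

```r
# aggregate() keeps a vector-valued FUN result as a single matrix column
res <- aggregate(Petal.Width ~ Species, iris, function(x) fivenum(x))

ncol(res)                    # number of top-level columns in the result
is.matrix(res$Petal.Width)   # the second column is itself a matrix
dim(res$Petal.Width)         # one row per species, one column per statistic
```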

Update

I want to do something like the following, with less code, using fivenum:

 ddply(iris, .(Species), summarise,
       Min = min(Petal.Width),
       Q1  = quantile(Petal.Width, .25),
       Med = median(Petal.Width),
       Q3  = quantile(Petal.Width, .75),
       Max = max(Petal.Width))

+7

r aggregate plyr

mlt Feb 07 '13 at 18:42

4 answers

Here is a solution using data.table (while not specifically requested, it is an obvious complement to, or replacement for, aggregate or ddply). Also, besides being a bit long to code, calling quantile multiple times is inefficient, since each call sorts the data again.

 library(data.table)
 Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")
 IRIS <- data.table(iris)
 # this will create the wide data.table
 lengthBySpecies <- IRIS[, as.list(fivenum(Sepal.Length)), by = Species]
 # and you can rename the columns from V1, ..., V5 to something nicer
 setnames(lengthBySpecies, paste0('V', 1:5), Tukeys_five)
 lengthBySpecies
 #       Species Min  Q1 Med  Q3 Max
 # 1:     setosa 4.3 4.8 5.0 5.2 5.8
 # 2: versicolor 4.9 5.6 5.9 6.3 7.0
 # 3:  virginica 4.9 6.2 6.5 6.9 7.9

Or, using a single quantile call with the appropriate probs argument:

 IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25))), by = Species]
 #       Species  0%   25% 50% 75% 100%
 # 1:     setosa 4.3 4.800 5.0 5.2  5.8
 # 2: versicolor 4.9 5.600 5.9 6.3  7.0
 # 3:  virginica 4.9 6.225 6.5 6.9  7.9

Note that the names of the generated columns (0%, 25%, ...) are not syntactically valid R names, although you can rename them with setnames in the same way as above.
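As a rough sketch of that renaming step (assuming the data.table package is installed; `quartiles` is an illustrative name, not one used in the answer):

```r
library(data.table)

IRIS <- data.table(iris)
Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")

# One quantile call per group; the columns come back named "0%", "25%", ...
quartiles <- IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25))),
                  by = Species]

# Non-syntactic names have to be quoted with backticks when used directly
quartiles[, `25%`]

# Rename by reference: old names first, new names second
setnames(quartiles, c("0%", "25%", "50%", "75%", "100%"), Tukeys_five)
names(quartiles)
```

After the rename, the columns can be referred to without backticks, for example `quartiles[, Q1]`.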

EDIT

Interestingly, quantile sets the names of the resulting vector when names = TRUE (the default), and this involves a copy, which slows down the number crunching and consumes memory. The help page even warns about this.

So you should probably use

 IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25), names = FALSE)), by = Species]
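The effect of the names argument can be seen in isolation with base R. This is only a small illustration on one column of iris; `named` and `unnamed` are made-up variable names.

```r
x <- iris$Sepal.Length
probs <- seq(0, 1, by = 0.25)

named   <- quantile(x, probs = probs)                 # names = TRUE is the default
unnamed <- quantile(x, probs = probs, names = FALSE)  # skips building the names

names(named)     # percentage labels such as "25%"
names(unnamed)   # NULL

# The numeric values are the same either way
all.equal(unname(named), unnamed)
```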

Or, if you want to return a named list without R making a copy internally:

 IRIS[, {
     quant <- as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25), names = FALSE))
     setattr(quant, 'names', Tukeys_five)  # set the names by reference
     quant
 }, by = Species]

+9

mnel Feb 11 '13 at 1:46

As far as I know, there is no direct way to do what you ask, because the function you are using (fivenum) does not return data in a form that can easily be bound to columns from within ddply. It is easy to clean up programmatically, though.

Step 1: Run fivenum for each Species value using ddply.

 library(plyr)
 data <- ddply(iris, .(Species), summarize, value = fivenum(Petal.Width))
 #       Species value
 # 1      setosa   0.1
 # 2      setosa   0.2
 # 3      setosa   0.2
 # 4      setosa   0.3
 # 5      setosa   0.6
 # 6  versicolor   1.0
 # 7  versicolor   1.2
 # 8  versicolor   1.3
 # 9  versicolor   1.5
 # 10 versicolor   1.8
 # 11  virginica   1.4
 # 12  virginica   1.8
 # 13  virginica   2.0
 # 14  virginica   2.3
 # 15  virginica   2.5

Now, fivenum returns a vector of five values, so we end up with 5 rows for each species. This is the part where fivenum works against us.

Step 2: Add a label column. We know what Tukey's five numbers are, so we simply name them in the order in which fivenum returns them. The label vector is recycled until it reaches the end of the data.

 Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")
 data$label <- Tukeys_five
 #       Species value label
 # 1      setosa   0.1   Min
 # 2      setosa   0.2    Q1
 # 3      setosa   0.2   Med
 # 4      setosa   0.3    Q3
 # 5      setosa   0.6   Max
 # 6  versicolor   1.0   Min
 # 7  versicolor   1.2    Q1
 # 8  versicolor   1.3   Med
 # 9  versicolor   1.5    Q3
 # 10 versicolor   1.8   Max
 # 11  virginica   1.4   Min
 # 12  virginica   1.8    Q1
 # 13  virginica   2.0   Med
 # 14  virginica   2.3    Q3
 # 15  virginica   2.5   Max
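The recycling that Step 2 relies on can be reproduced without plyr. The sketch below rebuilds the same long layout with base R (`d` is an illustrative stand-in for the ddply result) and works because the number of rows, 15, is an exact multiple of the label length, 5.

```r
Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")

# Long layout: five fivenum values per species, in level order
d <- data.frame(
  Species = rep(levels(iris$Species), each = 5),
  value   = unlist(tapply(iris$Petal.Width, iris$Species, fivenum),
                   use.names = FALSE)
)

# Assigning a length-5 vector to a 15-row data frame recycles it three times
d$label <- Tukeys_five
d[6:10, ]
```

If the row count were not a multiple of five, the assignment would fail with an error rather than recycle silently, which is a useful safety check here.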

Step 3: Using the labels, we can quickly reshape this data into wide form with the dcast function from the reshape2 package.

 library(reshape2)
 dcast(data, Species ~ label)[, c("Species", Tukeys_five)]
 #      Species Min  Q1 Med  Q3 Max
 # 1     setosa 0.1 0.2 0.2 0.3 0.6
 # 2 versicolor 1.0 1.2 1.3 1.5 1.8
 # 3  virginica 1.4 1.8 2.0 2.3 2.5

The column indexing at the end simply restores the intended column order, since dcast puts the new columns in alphabetical order.

Hope this helps.

Update: I came back to this because I realized there is another option available to you. A matrix can be passed to data.frame() as part of the definition, where it is split into separate columns, so you can fix up the result of your aggregate call as follows:

 data <- aggregate(Petal.Width ~ Species, iris, function(x) summary(fivenum(x)))
 result <- data.frame(Species = data[, 1], data[, 2])
 #      Species Min. X1st.Qu. Median Mean X3rd.Qu. Max.
 # 1     setosa  0.1      0.2    0.2 0.28      0.3  0.6
 # 2 versicolor  1.0      1.2    1.3 1.36      1.5  1.8
 # 3  virginica  1.4      1.8    2.0 2.00      2.3  2.5
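One thing to be aware of in the output above: the Mean column comes from applying summary to the five numbers, so it is the mean of those five values rather than the group mean. A possible variation, sketched here as an assumption rather than as part of the original answer, is to drop summary and name the fivenum result directly (`agg` is an illustrative name):

```r
Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")

# Name the five values inside FUN so the matrix column gets usable column names
agg <- aggregate(Petal.Width ~ Species, iris,
                 function(x) setNames(fivenum(x), Tukeys_five))

# data.frame() splits the matrix into one ordinary column per statistic
result <- data.frame(Species = agg$Species, agg$Petal.Width)
names(result)
```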

+4

Dinre Feb 07 '13 at 19:22

This is my solution:

 ddply(iris, .(Species), summarize, value=t(fivenum(Petal.Width)))

0

pmjn6 Oct 4 '15 at 10:55

James · Accepted Answer · 2013-02-07T19:38:44+0000

You can use do.call to call data.frame on each of the columns, including the matrix column, which yields a data.frame whose columns are all plain vectors:

 # assumption: dfr holds the aggregate() result from the question
 dfr <- aggregate(Petal.Width ~ Species, iris, function(x) summary(fivenum(x)))
 dim(do.call("data.frame", dfr))
 # [1] 3 7
 str(do.call("data.frame", dfr))
 # 'data.frame': 3 obs. of 7 variables:
 #  $ Species            : Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
 #  $ Petal.Width.Min.   : num 0.1 1 1.4
 #  $ Petal.Width.1st.Qu.: num 0.2 1.2 1.8
 #  $ Petal.Width.Median : num 0.2 1.3 2
 #  $ Petal.Width.Mean   : num 0.28 1.36 2
 #  $ Petal.Width.3rd.Qu.: num 0.3 1.5 2.3
 #  $ Petal.Width.Max.   : num 0.6 1.8 2.5
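The same do.call idiom also handles several matrix columns at once, which is where it saves the most typing. The following base R sketch is an extension for illustration, not part of the accepted answer; it summarises two variables per species, and `dfr2` and `flat` are hypothetical names.

```r
# Two response variables, each summarised by fivenum: two matrix columns
dfr2 <- aggregate(cbind(Petal.Width, Sepal.Length) ~ Species, iris, fivenum)
ncol(dfr2)   # Species plus two matrix columns

# data.frame() is called with every column as an argument and
# expands each matrix into separate vector columns
flat <- do.call("data.frame", dfr2)
dim(flat)
names(flat)
```

Because fivenum returns an unnamed vector, the flattened columns are numbered rather than labelled; naming the result inside the function, for example with setNames, would give more descriptive column names.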
