Subsetting a data frame to the top n rows of each group, sorted by a variable

I would like to subset a data frame to n rows per group, where the groups are defined by one variable and the rows are sorted in descending order by another variable. An example makes this clearer:

d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"), Age = c(15, 38, 17, 35, 26, 24, 20, 26)) 

I would like to get the 2 rows with the highest Age for each Gender, sorted in descending order by Age. Desired result:

 Gender Age
 F      35
 F      26
 M      38
 M      26

I looked at order, sort, and other solutions here, but could not find one that fits this problem. I appreciate your help.

+7
5 answers

One solution using ddply() from plyr

 require(plyr)
 ddply(d1, "Gender", function(x) head(x[order(x$Age, decreasing = TRUE), ], 2))
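
For readers who prefer not to load plyr, the same split-apply-combine idea can be sketched in base R. This is only an illustration under stated assumptions: `top_n_by_group` is a hypothetical helper name introduced here, not a function from any package or from the answer above.

```r
# Example data from the question
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# top_n_by_group is a hypothetical helper, not part of any package
top_n_by_group <- function(df, group, value, n) {
  # One data frame per group level
  pieces <- split(df, df[[group]])
  # Sort each piece by `value` (descending) and keep at most n rows
  top <- lapply(pieces, function(x) {
    head(x[order(x[[value]], decreasing = TRUE), , drop = FALSE], n)
  })
  # Stack the pieces back into a single data frame
  out <- do.call(rbind, top)
  rownames(out) <- NULL
  out
}

res <- top_n_by_group(d1, "Gender", "Age", 2)
print(res)
```

Because `head()` returns at most `n` rows, groups with fewer than `n` rows are returned whole rather than padded.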
+13

With data.table package

 require(data.table)
 dt1 <- data.table(d1)
 # to speed this up you can add setkey(dt1, Gender)
 dt1[, .SD[order(Age, decreasing = TRUE)[1:2]], by = Gender]
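
One thing worth checking with the `[1:2]` idiom is what happens when a group has fewer than 2 rows. The sketch below shows the effect on a plain data frame only; `d_small` is made-up illustration data, and how much this matters in practice depends on your group sizes.

```r
# Hypothetical data in which the "F" group has a single row
d_small <- data.frame(Gender = c("M", "M", "F"), Age = c(15, 38, 17))
f_rows <- d_small[d_small$Gender == "F", ]

# Asking for positions 1:2 of a one-row group yields an extra all-NA row
padded <- f_rows[order(f_rows$Age, decreasing = TRUE)[1:2], ]

# head() stops at the rows that actually exist
trimmed <- head(f_rows[order(f_rows$Age, decreasing = TRUE), ], 2)

print(nrow(padded))
print(nrow(trimmed))
```

If small groups are possible, wrapping the index in `head(order(...), 2)` instead of `order(...)[1:2]` may be a safer choice.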
+5

I am sure there is a better answer, but here is one way:

 require(plyr)
 ddply(d1, c("Gender", "-Age"))[c(1:2, 5:6), -1]

If you have a larger data frame than the one you specified here and don't want to visually check which rows to select, just use this:

 new.d1 <- ddply(d1, c("Gender", "-Age"))[, -1]
 pos <- match('M', new.d1$Gender)  # pos will show the index of the first entry of M
 new.d1[c(1:2, pos:(pos + 1)), ]
+1

This is even easier if you just want to sort:

 d1 <- transform(d1[order(d1$Age, decreasing=TRUE), ], Gender=as.factor(Gender)) 

Then you may call:

 require(plyr)
 d1 <- ddply(d1, .(Gender), head, n = 2)

to keep only the first two rows of each Gender subgroup.
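
The same "sort first, then keep the first two per group" idea can also be sketched without plyr, using `ave()` to number the rows within each group. The variable names `sorted`, `rank_in_group`, and `top2` are chosen here for illustration only.

```r
# Example data from the question
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Sort by Gender, then by Age in descending order
sorted <- d1[order(d1$Gender, -d1$Age), ]

# Number the rows 1, 2, 3, ... within each Gender
rank_in_group <- ave(seq_along(sorted$Age), sorted$Gender, FUN = seq_along)

# Keep the first two rows of every group
top2 <- sorted[rank_in_group <= 2, ]
print(top2)
```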

0

I have a suggestion if you need, for example, the first two women and the first 3 men:

 library(plyr)
 m <- d1[order(d1$Age, decreasing = TRUE), ]
 h <- mapply(function(x, y) head(x, y), split(m$Age, m$Gender), y = c(2, 3))
 ldply(h, data.frame)

You then just need to rename the columns of the final data frame.
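
A variation that keeps the original column names is to split the whole data frame rather than only the Age column. This is a minimal base R sketch; the named vector `n_per_group` is an assumption introduced here to hold the number of rows wanted for each group.

```r
# Example data from the question
d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
                 Age = c(15, 38, 17, 35, 26, 24, 20, 26))

# Assumed quota: 2 rows for "F", 3 rows for "M"
n_per_group <- c(F = 2, M = 3)

# Sort once, then split the full data frame by Gender
sorted <- d1[order(d1$Age, decreasing = TRUE), ]
pieces <- split(sorted, sorted$Gender)

# Take the first n rows of each piece, matching quotas by group name
top <- Map(function(x, n) head(x, n), pieces, n_per_group[names(pieces)])

res <- do.call(rbind, top)
rownames(res) <- NULL
print(res)
```

Looking up the quota by `names(pieces)` avoids relying on the order in which the groups happen to appear.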

0