Subset a data frame to the top n rows for each group, ordered by a variable

Question

I would like to subset a data frame to n rows per group, where the groups are defined by one variable and the rows are sorted in descending order by another variable. An example makes this clearer:

d1 <- data.frame(Gender = c("M", "M", "F", "F", "M", "M", "F", "F"), Age = c(15, 38, 17, 35, 26, 24, 20, 26))

I would like to get the 2 rows with the highest Age, sorted in descending order, for each Gender. Desired result:

 Gender Age
 F       35
 F       26
 M       38
 M       26

I looked through questions here on order, sort, and related functions, but could not find a suitable solution to this problem. I appreciate your help.

+7

r group-by order data.table plyr

karlos May 20, '11 at 17:38

5 answers

With data.table package

 require(data.table)
 dt1 <- data.table(d1)
 # to speed this up you can add setkey(dt1, Gender)
 dt1[, .SD[order(Age, decreasing = TRUE)[1:2]], by = Gender]
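One caveat with indexing by `[1:2]`: if a group has fewer than two rows, the missing positions come back as rows of NA. A minimal sketch of a variant that avoids this, assuming the same `d1` as in the question, sorts first and then takes `head()` of each group. The name `top2` is only illustrative.

```r
library(data.table)

# Example data from the question
d1 <- data.frame(
  Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
  Age    = c(15, 38, 17, 35, 26, 24, 20, 26)
)
dt1 <- as.data.table(d1)

# Sort by descending Age, then keep at most 2 rows per Gender.
# head() returns fewer rows for small groups instead of padding with NA.
top2 <- dt1[order(-Age), head(.SD, 2), by = Gender]
print(top2)
```

Groups appear in the order they are first encountered after sorting, so the group order may differ from the desired result in the question; the rows within each group are the same.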

+5

Wojciech sobala May 20, '11 at 18:34

I am sure there is a better answer, but here is one way:

 require(plyr)
 ddply(d1, c("Gender", "-Age"))[c(1:2, 5:6),-1]

If your data frame is larger than the one shown here and you don't want to work out by eye which rows to select, use this:

 new.d1=ddply(d1, c("Gender", "-Age"))[,-1]
 pos=match('M',new.d1$Gender) # pos will show the index of the first entry of M
 new.d1[c(1:2,pos:(pos+1)),]

+1

Manoel galdino May 20, '11 at 18:08

It is even easier if you sort the data first:

 d1 <- transform(d1[order(d1$Age, decreasing=TRUE), ], Gender=as.factor(Gender))

Then you can call:

 require(plyr)
 d1 <- ddply(d1, .(Gender), head, n=2)

to keep the first two rows of each Gender subgroup.

0

alphaG77 Sep 25 '11 at 16:56

I have a suggestion in case you need a different number of rows per group, for example the top two females and the top three males:

 library(plyr)
 m <- d1[order(d1$Age, decreasing = TRUE), ]
 h <- mapply(function(x, y) head(x, y), split(m$Age, m$Gender), y = c(2, 3))
 ldply(h, data.frame)

You just need to rename the columns of the final data frame.
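The code above pairs `c(2, 3)` with the groups by position, which relies on `split()` returning F before M. A minimal base R sketch of the same idea, assuming the `d1` from the question, looks the count up by group name and splits whole rows so the column names are kept. The names `n_per_group`, `pieces`, and `result` are only illustrative.

```r
# Example data from the question
d1 <- data.frame(
  Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
  Age    = c(15, 38, 17, 35, 26, 24, 20, 26)
)

# Rows to keep per group, looked up by name rather than by position
n_per_group <- c(F = 2, M = 3)

# Sort by descending Age, then split the whole data frame by Gender
sorted <- d1[order(d1$Age, decreasing = TRUE), ]
pieces <- split(sorted, sorted$Gender)

# Take the first n rows of each piece and stack the pieces back together
top <- lapply(names(pieces), function(g) head(pieces[[g]], n_per_group[[g]]))
result <- do.call(rbind, top)
rownames(result) <- NULL
print(result)
```

Because whole rows are split rather than just the Age vector, the result keeps the Gender and Age columns and needs no renaming.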

0

Liliana pacheco Jan 05 '17 at 19:28

Chase · Accepted Answer · 2011-05-20T18:05:30+0000

One solution using ddply() from plyr

 require(plyr)
 ddply(d1, "Gender", function(x) head(x[order(x$Age, decreasing = TRUE) , ], 2))
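If you would rather not depend on plyr, the same sort-then-take-the-head idea can be sketched in base R. This is not part of the original answer; `top_n_by_group` is a hypothetical helper name, and the sketch assumes the ordering column is numeric.

```r
# Example data from the question
d1 <- data.frame(
  Gender = c("M", "M", "F", "F", "M", "M", "F", "F"),
  Age    = c(15, 38, 17, 35, 26, 24, 20, 26)
)

# Hypothetical helper: keep the n rows with the largest `by` value per group
top_n_by_group <- function(df, group, by, n = 2) {
  # Sort by group, then by descending value within each group
  sorted <- df[order(df[[group]], -df[[by]]), ]
  # Number the rows within each group: 1, 2, 3, ...
  rank_in_group <- ave(seq_len(nrow(sorted)), sorted[[group]], FUN = seq_along)
  out <- sorted[rank_in_group <= n, ]
  rownames(out) <- NULL
  out
}

top2 <- top_n_by_group(d1, "Gender", "Age", n = 2)
print(top2)
```

Groups with fewer than n rows are returned in full, and ties are broken by the original row order.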
