R - which and which

Question

I have a simple question: how can I use which and which.max at the same time?

I would like to select the row with the maximum epnum for id == 'B13639J2'. I need to retrieve the row number because I need to make some changes to the variable manually.

So, the row with the max epnum for id == 'B13639J2':

             id epnum start
  95528 B13639J2     1     0
  95529 B13639J2     2   860
  95530 B13639J2     3  1110
  95531 B13639J2     4  1155
  95532 B13639J2     5  1440

I was wondering whether I can just do something like:

 dta[which(dta$id == 'B13639J2' & which.max(dta$epnum)), ]

Finally, I need to remove the row I found.

Thanks.

Data

 dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", "epnum", "start"), row.names = 95520:95532, class = "data.frame")

+4

r

giacomo Jul 28 '15 at 21:59

3 answers

A base R workaround does this. Temporarily set a copy of all epnum values outside your desired group to NA, then run which.max and drop the resulting row:

 dta[-which.max(replace(dta$epnum, dta$id != "B13639J2", NA)), ]
 #            id epnum start
 #95520 B13639J1     4   420
 #95521 B13639J1     5   425
 #95522 B13639J1     6   435
 #95523 B13639J1     7   540
 #95524 B13639J1     8   570
 #95525 B13639J1     9  1000
 #95526 B13639J1    10  1310
 #95527 B13639J1    11  1325
 #95528 B13639J2     1     0
 #95529 B13639J2     2   860
 #95530 B13639J2     3  1110
 #95531 B13639J2     4  1155

This works because which.max automatically skips NA and NaN values:

 which.max(c(NA,1,NaN,2,3))
 #[1] 5

This neither changes the order of the rows in the data set nor drops any row name information, and it runs fairly quickly (about 3 seconds to process a 10M-row file).
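Since the question also asks for the row number, the index returned by which.max can be stored before the row is dropped. A minimal sketch, assuming the dta data frame from the question (rebuilt here so the snippet runs on its own); the names idx and dta_clean are only illustrative:

```r
# Rebuild the example data from the question
dta <- data.frame(
  id = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325, 0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

# Position of the max epnum within the target id (other ids masked with NA)
idx <- which.max(replace(dta$epnum, dta$id != "B13639J2", NA))

idx                 # position within dta: 13
rownames(dta)[idx]  # original row name: "95532"
dta[idx, ]          # inspect the row before changing anything

# Drop the row; the remaining row names are left untouched
dta_clean <- dta[-idx, ]
```

Keeping idx in a variable means the same position can be used both to edit the row by hand and to remove it later.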

+2

thelatemail Jul 28 '15 at 23:41

Let me propose another possible solution. Let me know what you think.

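First, I create the max of epnum for each id. (Note that n() would only count the rows per id, which is not the maximum in general, so max(epnum) is used here.)

 library(dplyr)
 dta = dta %>% group_by(id) %>% mutate(max = max(epnum)) %>% ungroup()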

Then I just negate (!) the condition:

 dta[ !(dta$id == 'B13639J2' & (dta$epnum == dta$max)) , ]
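The same per-group comparison can also be sketched in base R with ave(), which returns the group maximum aligned to every row. This assumes the dta data frame from the question (rebuilt below); grp_max and keep are illustrative names. Note that if several rows tie for the maximum, all of them are dropped.

```r
# Rebuild the example data from the question
dta <- data.frame(
  id = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325, 0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

# Maximum epnum of each id, repeated for every row of that id
grp_max <- ave(dta$epnum, dta$id, FUN = max)

# Keep everything except the max-epnum row(s) of the target id
keep <- !(dta$id == "B13639J2" & dta$epnum == grp_max)
dta_clean <- dta[keep, ]
```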

0

giacomo Jul 29 '15 at 10:14

akrun · Accepted Answer · 2015-07-28T22:05:00+0000

One option, if we are using a numeric index (which/which.max), is slice from dplyr. We need to slice twice here: first we subset by 'id', i.e. 'B13639J2', and then subset again for the max 'epnum' value.

  library(dplyr)
  slice(dta, which(id=='B13639J2')) %>% slice(which.max(epnum))
  #        id epnum start
  #1 B13639J2     5  1440
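The two-step subsetting can also be sketched in base R by chaining which and which.max, which keeps the position in the original data frame. This is only an illustration, assuming the dta data frame from the question (rebuilt below); rows and target are hypothetical names.

```r
# Rebuild the example data from the question
dta <- data.frame(
  id = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325, 0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

# Step 1: positions of the rows belonging to the target id
rows <- which(dta$id == "B13639J2")

# Step 2: among those rows, pick the one with the largest epnum
target <- rows[which.max(dta$epnum[rows])]

dta[target, ]    # the selected row, with its original row name
dta[-target, ]   # the data without that row
```

Indexing rows by the result of which.max maps the position inside the subset back to a position in dta, which is what the single which(... & which.max(...)) attempt in the question cannot do.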

Or we group by "id", arrange "epnum" in descending order, and filter the first row for the specified "id".

  dta1 <- dta %>% group_by(id) %>% arrange(desc(epnum)) %>% filter(id=='B13639J2', row_number()==1L)

If we want to remove this row from the dataset, one option is an anti_join with the original dataset.

  anti_join(dta, dta1)

Or this can be done by changing the filter condition:

  dta %>% group_by(id) %>% arrange(desc(epnum)) %>% filter(!(id=='B13639J2' & row_number()==1L))
