R - which and which.max

I have a simple question: how can I use which and which.max at the same time?

I would like to select the row with the maximum epnum for id == 'B13639J2'. I need to retrieve the row number because I need to make some manual changes to the variable.

So I want the row with the max epnum where id == 'B13639J2':

                id epnum start
    95528 B13639J2     1     0
    95529 B13639J2     2   860
    95530 B13639J2     3  1110
    95531 B13639J2     4  1155
    95532 B13639J2     5  1440

I was wondering whether I can do something like:

 dta[which(dta$id == 'B13639J2' & which.max(dta$epnum)), ] 

Finally, I need to remove the row I found.

Thanks.

Data

    dta = structure(list(
      id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J1",
             "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2",
             "B13639J2", "B13639J2", "B13639J2"),
      epnum = c(4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5),
      start = c(420, 425, 435, 540, 570, 1000, 1310, 1325, 0, 860, 1110,
                1155, 1440)),
      .Names = c("id", "epnum", "start"),
      row.names = 95520:95532,
      class = "data.frame")
+4
3 answers

One option, if we are using a numeric index (which / which.max), is slice from dplyr. We need slice twice here: first we subset the rows for the 'id', i.e. 'B13639J2', and then we subset again for the max 'epnum' value.

    library(dplyr)
    slice(dta, which(id == 'B13639J2')) %>%
      slice(which.max(epnum))
    #         id epnum start
    # 1 B13639J2     5  1440
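To see why the two steps are needed, here is a small base R sketch of what the combined condition from the question evaluates to. The data frame is rebuilt compactly from the question's data (with epnum as integers), and the names pos_max and attempt are only illustrative.

```r
# Same values as the question's data, built compactly
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532
)

# which.max() returns a single position: the overall maximum of epnum,
# which belongs to the other id (position 8, epnum == 11)
pos_max <- which.max(dta$epnum)

# Inside `&`, that non-zero number is coerced to TRUE and recycled,
# so the combined condition reduces to the id test alone
attempt <- which(dta$id == "B13639J2" & pos_max)
attempt  # positions 9 to 13, i.e. every row of B13639J2
```

So the attempt returns all five rows for that id rather than the single row with the largest epnum, which is why the subset by id has to happen before which.max is applied.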

Or we group by 'id', arrange 'epnum' in descending order, and filter the first row for the specified 'id'.

    dta1 <- dta %>%
      group_by(id) %>%
      arrange(desc(epnum)) %>%
      filter(id == 'B13639J2', row_number() == 1L)

If we want to remove this row from the dataset, one option is anti_join with the original dataset.

  anti_join(dta, dta1) 

Or this can be done by negating the filter condition:

    dta %>%
      group_by(id) %>%
      arrange(desc(epnum)) %>%
      filter(!(id == 'B13639J2' & row_number() == 1L))
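Because arrange(desc(epnum)) reorders the rows, a variant that leaves the original order alone may be preferable when the row numbers matter. The following is only a sketch under a few assumptions: the original row names are first copied into a hypothetical column called row (so nothing depends on how dplyr treats row names), and target and kept are illustrative names.

```r
library(dplyr)

# Same values as the question's data, built compactly
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532
)

# Keep the original row labels in an ordinary column (hypothetical name)
dta$row <- rownames(dta)

target <- "B13639J2"

# Drop the row(s) of the target id whose epnum equals that id's maximum;
# no grouping or sorting, so the remaining rows keep their order
kept <- dta %>%
  filter(!(id == target & epnum == max(epnum[id == target])))
```

Note that if several rows of the target id tied for the maximum, this version would drop all of them, whereas the row_number() == 1L version drops only one.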
+8

Here is a base R way of doing this. Temporarily set a copy of all epnum values not in your desired group to NA, then run which.max and drop the resulting row:

    dta[-which.max(replace(dta$epnum, dta$id != "B13639J2", NA)), ]
    #            id epnum start
    # 95520 B13639J1     4   420
    # 95521 B13639J1     5   425
    # 95522 B13639J1     6   435
    # 95523 B13639J1     7   540
    # 95524 B13639J1     8   570
    # 95525 B13639J1     9  1000
    # 95526 B13639J1    10  1310
    # 95527 B13639J1    11  1325
    # 95528 B13639J2     1     0
    # 95529 B13639J2     2   860
    # 95530 B13639J2     3  1110
    # 95531 B13639J2     4  1155

This works because which.max automatically skips all NA or NaN values:

    which.max(c(NA, 1, NaN, 2, 3))
    # [1] 5

This neither changes the order of the rows in the data set nor drops any row name information, and it works pretty quickly (about 3 seconds to process a 10M row file).
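Since the question also asks for the row number, the same idea can be split into steps so that the position is available before anything is removed. This is a sketch using the question's data rebuilt compactly; pos and dta_removed are illustrative names.

```r
# Same values as the question's data, built compactly
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532
)

# Position (1-based, within dta) of the max epnum for the chosen id
pos <- which.max(replace(dta$epnum, dta$id != "B13639J2", NA))
pos                 # 13

# Original row label of that position
rownames(dta)[pos]  # "95532"

# Inspect or edit the row by position, then drop it when done
dta[pos, ]
dta_removed <- dta[-pos, ]
```

One caveat: if the id does not occur in the data at all, which.max returns integer(0), and indexing with -integer(0) selects zero rows instead of leaving the data unchanged, so it may be worth checking length(pos) first.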

+2

Let me add another possible solution. Let me know what you think.

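First I create, for each id, the max of epnum. Note that max(epnum) is needed here rather than n(): the row count per id only equals the maximum when epnum runs from 1 to n, which is not the case for B13639J1.

    dta = dta %>% group_by(id) %>% mutate(max = max(epnum))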

Then I just negate (!) the condition:

 dta[ !(dta$id == 'B13639J2' & (dta$epnum == dta$max)) , ] 
0