R equivalent of SAS FIRST. or LAST.

Does anyone know the best R alternative to the SAS FIRST. and LAST. operators? I did not find anything.

SAS has FIRST. and LAST. automatic variables that identify the first and last record within a group of records sharing the same value of a specific variable; so, in the following dataset, First.Model and Last.Model would be defined as:

    Model,SaleID,First.Model,Last.Model
    Explorer,1,1,0
    Explorer,2,0,0
    Explorer,3,0,0
    Explorer,4,0,1
    Civic,5,1,0
    Civic,6,0,0
    Civic,7,0,1
5 answers

It looks like you are looking for !duplicated, with the fromLast argument set to FALSE (for the first record) or TRUE (for the last record).

    d <- datasets::Puromycin
    d$state
    #  [1] treated   treated   treated   treated   treated   treated   treated
    #  [8] treated   treated   treated   treated   treated   untreated untreated
    # [15] untreated untreated untreated untreated untreated untreated untreated
    # [22] untreated untreated
    # Levels: treated untreated
    !duplicated(d$state)
    #  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    # [13]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    !duplicated(d$state, fromLast = TRUE)
    #  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
    # [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

This function has some caveats and edge cases, which are described in its help file (?duplicated).
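Applied to the table from the question, the idea can be sketched as follows. This is only a minimal illustration: the data frame `sales` is a hypothetical reconstruction of the question's data, and the approach assumes that each Model value occupies one contiguous block of rows, since duplicated() looks at values across the whole vector rather than at runs.

```r
# Hypothetical reconstruction of the table in the question
sales <- data.frame(
  Model  = c(rep("Explorer", 4), rep("Civic", 3)),
  SaleID = 1:7
)

# First row of each Model value: not a duplicate when scanning from the top
sales$First.Model <- as.integer(!duplicated(sales$Model))
# Last row of each Model value: not a duplicate when scanning from the bottom
sales$Last.Model  <- as.integer(!duplicated(sales$Model, fromLast = TRUE))

print(sales)
```

If the same value can reappear later in the data, only its very first and very last occurrences are flagged, so sort by the grouping variable first if that is not what you want.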


The head and tail functions with the option n = 1, used in combination with by, are a good way to do this. See R for SAS and SPSS Users (Robert Muenchen). Create a data frame from the grouping variables of interest; for example, for the last record:

    dfby <- data.frame(df$var1, df$var2)
    mylastList <- by(df, dfby, tail, n = 1)
    # turn into a data frame
    mylastDF <- do.call(rbind, mylastList)
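For a concrete picture, here is a small sketch of the same by() pattern on the data from the question. The data frame `sales` and the result names are hypothetical, and a single grouping variable is used instead of two.

```r
# Hypothetical reconstruction of the table in the question
sales <- data.frame(
  Model  = c(rep("Explorer", 4), rep("Civic", 3)),
  SaleID = 1:7
)

# by() splits the rows by Model and applies head()/tail() to each piece
firstList <- by(sales, sales$Model, head, n = 1)
lastList  <- by(sales, sales$Model, tail, n = 1)

# Bind the one-row pieces back into ordinary data frames
firstDF <- do.call(rbind, firstList)
lastDF  <- do.call(rbind, lastList)

firstDF
lastDF
```

Note that by() orders the groups by the levels of the grouping variable (here alphabetically, Civic before Explorer), not by their order of appearance in the data.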

Update (read this first)

If you are really only interested in the row indices, it may be useful to use split and range directly. The following assumes that the row names in your dataset are sequentially numbered, but it can most likely be adapted to other cases as well.

    irisFirstLast <- sapply(split(iris, iris$Species),
                            function(x) range(as.numeric(rownames(x))))
    irisFirstLast ## Just the indices
    #      setosa versicolor virginica
    # [1,]      1         51       101
    # [2,]     50        100       150
    iris[irisFirstLast[1, ], ] ## `1` would represent "first"
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1            5.1         3.5          1.4         0.2     setosa
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 101          6.3         3.3          6.0         2.5  virginica
    iris[irisFirstLast, ] ## nothing would represent both first and last
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1            5.1         3.5          1.4         0.2     setosa
    # 50           5.0         3.3          1.4         0.2     setosa
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 100          5.7         2.8          4.1         1.3 versicolor
    # 101          6.3         3.3          6.0         2.5  virginica
    # 150          5.9         3.0          5.1         1.8  virginica

    d <- datasets::Puromycin
    dFirstLast <- sapply(split(d, d$state),
                         function(x) range(as.numeric(rownames(x))))
    dFirstLast
    #      treated untreated
    # [1,]       1        13
    # [2,]      12        23
    d[dFirstLast[2, ], ] ## `2` would represent `last`
    #    conc rate     state
    # 12  1.1  200   treated
    # 23  1.1  160 untreated

If you work with named rows, the general approach is the same, but you must work out the range yourself. Here is a generic pattern:

    datasetFirstLast <- sapply(split(dataset, dataset$groupingvariable),
                               function(x) c(rownames(x)[1],
                                             rownames(x)[length(rownames(x))]))
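To make the generic pattern concrete, here is a small sketch on a made-up data frame with character row names. The data, the column `value`, and the row names `r1` to `r5` are all hypothetical, and nrow(x) is used as a shorter equivalent of length(rownames(x)).

```r
# Hypothetical data frame with character (non-numeric) row names
dataset <- data.frame(
  groupingvariable = c("a", "a", "a", "b", "b"),
  value = c(10, 20, 30, 40, 50),
  row.names = c("r1", "r2", "r3", "r4", "r5")
)

# For each group, keep the row name of its first and of its last row
datasetFirstLast <- sapply(
  split(dataset, dataset$groupingvariable),
  function(x) c(rownames(x)[1], rownames(x)[nrow(x)])
)

# Row 1 of the result holds the "first" row names, row 2 the "last" ones
dataset[datasetFirstLast[1, ], ]
dataset[datasetFirstLast[2, ], ]
```

Because the result stores row names rather than positions, it can be used directly for indexing the original data frame.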

Initial Answer (edited)

If you are interested in extracting the rows themselves, rather than in getting the row numbers for other purposes, you can also look at data.table. Here are some examples:

    library(data.table)
    DT <- data.table(iris, key="Species")
    DT[J(unique(Species)), mult = "first"]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa          5.1         3.5          1.4         0.2
    # 2: versicolor          7.0         3.2          4.7         1.4
    # 3:  virginica          6.3         3.3          6.0         2.5
    DT[J(unique(Species)), mult = "last"]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa          5.0         3.3          1.4         0.2
    # 2: versicolor          5.7         2.8          4.1         1.3
    # 3:  virginica          5.9         3.0          5.1         1.8
    DT[, .SD[c(1,.N)], by=Species]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa          5.1         3.5          1.4         0.2
    # 2:     setosa          5.0         3.3          1.4         0.2
    # 3: versicolor          7.0         3.2          4.7         1.4
    # 4: versicolor          5.7         2.8          4.1         1.3
    # 5:  virginica          6.3         3.3          6.0         2.5
    # 6:  virginica          5.9         3.0          5.1         1.8

This last approach is quite convenient. For example, if you need the first three rows and the last three rows of each group, you can use: DT[, .SD[c(1:3, (.N-2):.N)], by=Species] (For reference: .N represents the number of rows in each group.)

Other useful approaches include:

    DT[, tail(.SD, 2), by = Species] ## last two rows of each group
    DT[, head(.SD, 4), by = Species] ## first four rows of each group

Here is the dplyr solution:

    # input
    dataset <- structure(list(Model = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L),
        .Label = c("Civic", "Explorer"), class = "factor"), SaleID = 1:7),
        .Names = c("Model", "SaleID"), class = "data.frame",
        row.names = c(NA, -7L))

    # code
    library(dplyr)
    dataset %>%
      group_by(Model) %>%
      mutate(
        "First" = row_number() == min( row_number() ),
        "Last"  = row_number() == max( row_number() )
      )

    # output:
    #      Model SaleID First  Last
    #     <fctr>  <int> <lgl> <lgl>
    # 1 Explorer      1  TRUE FALSE
    # 2 Explorer      2 FALSE FALSE
    # 3 Explorer      3 FALSE FALSE
    # 4 Explorer      4 FALSE  TRUE
    # 5    Civic      5  TRUE FALSE
    # 6    Civic      6 FALSE FALSE
    # 7    Civic      7 FALSE  TRUE

PS: If you do not have dplyr installed, run:

    install.packages("dplyr")
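If installing a package is not an option, roughly the same First and Last flags can be sketched in base R with ave(). This is a swapped-in base R technique rather than part of the dplyr answer, and the data frame `sales` and the helper vectors `pos` and `size` are hypothetical names.

```r
# Hypothetical reconstruction of the table in the question
sales <- data.frame(
  Model  = c(rep("Explorer", 4), rep("Civic", 3)),
  SaleID = 1:7
)

# Position of each row within its Model group, and the size of that group
pos  <- ave(seq_len(nrow(sales)), sales$Model, FUN = seq_along)
size <- ave(seq_len(nrow(sales)), sales$Model, FUN = length)

# A row is first when its position is 1, last when it equals the group size
sales$First <- pos == 1
sales$Last  <- pos == size

print(sales)
```

Unlike the !duplicated approach, this does not depend on the rows of a group being adjacent, because ave() computes positions within each group wherever its rows occur.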

The function below is based on @Joe's description of FIRST. / LAST.
The function returns a list of vectors.

Each list entry corresponds to one column of the data frame (i.e. a feature or variable of the data set).
Within each list entry, there is an index giving the first (or last) row for each category of observation.

EXAMPLE OF USE:

    # Pass in your data frame, and indicate whether or not you want to find Last or find First.
    # Assign to the appropriate variable
    first <- findFirstLast(myDF)
    last <- findFirstLast(myDF, findFirst=FALSE)

Usage example with data(iris):

    data(iris)
    first <- findFirstLast(iris)
    last <- findFirstLast(iris, findFirst=FALSE)

The first and last observation for each species:

    first$Species
    #     setosa versicolor  virginica
    #          1         51        101
    last$Species
    #     setosa versicolor  virginica
    #         50        100        150

Get the entire row for the first observation of each species:

    iris[first$Species, ]
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1            5.1         3.5          1.4         0.2     setosa
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 101          6.3         3.3          6.0         2.5  virginica




FUNCTION CODE FOR findFirstLast():

    findFirstLast <- function(myDF, findFirst=TRUE) {
      # myDF should be a data frame or matrix

      # By default, this function finds the first occurrence of each unique value in a column.
      # If instead we want to find last, set findFirst to FALSE. This will give `maxOrMin` a value of -1;
      # finding the min of the negative indices is the same as finding the max of the positive indices.
      maxOrMin <- ifelse(findFirst, 1, -1)

      # For each column in myDF, make a list of all unique values (`levs`) and iterate over that list,
      # finding the min (or max) of all the indices of where that given value appears within the column
      apply(myDF, 2, function(colm) {
        levs <- unique(colm)
        sapply(levs, function(lev) {
          inds <- which(colm==lev)
          ifelse(length(inds)==0, NA, maxOrMin*min(inds*maxOrMin))
        })
      })
    }
