R equivalent of SAS FIRST. and LAST.

Question


Does anyone know the best R alternative to the SAS first. and last. operators? I did not find anything.

SAS has the FIRST. and LAST. automatic variables, which identify the first and last record within a group of records that share the same value of a specific variable. In the following dataset, FIRST.model and LAST.model are defined as:

    Model,SaleID,First.Model,Last.Model
    Explorer,1,1,0
    Explorer,2,0,0
    Explorer,3,0,0
    Explorer,4,0,1
    Civic,5,1,0
    Civic,6,0,0
    Civic,7,0,1

+6

r sas

Giorgio spedicato Dec 7 '12 at 15:03


5 answers

Using the head and tail functions with the option n = 1, in combination with by, is a good way. See R for SAS and SPSS Users (Robert Muenchen). Create a data frame from the grouping variables of interest, e.g. for the last record:

    dfby <- data.frame(df$var1, df$var2)
    mylastList <- by(df, dfby, tail, n = 1)
    # turn into a data frame
    mylastDF <- do.call(rbind, mylastList)
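As a minimal sketch of the by/tail pattern above, here it is applied to the data set from the question. The names sales, lastList, lastDF, and firstDF are made up for illustration.

```r
# Data from the question; `sales` is a hypothetical name.
sales <- data.frame(
  Model  = c(rep("Explorer", 4), rep("Civic", 3)),
  SaleID = 1:7
)

# by() splits the rows by Model and applies tail(n = 1) to each piece,
# which returns the last row of every group.
lastList <- by(sales, sales$Model, tail, n = 1)

# Stack the one-row data frames back into a single data frame.
lastDF <- do.call(rbind, lastList)
lastDF

# The same pattern with head() gives the first row of every group.
firstDF <- do.call(rbind, by(sales, sales$Model, head, n = 1))
firstDF
```

Note that by() orders the groups by the sorted group labels (Civic before Explorer here), not by their order of appearance in the data.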

+4

Georgette asherman Dec 12 '12 at 13:05


Update (read this first)

If you are really only interested in row indices, a straightforward use of split and range might be useful. The following assumes that the row names in your dataset are sequentially numbered, but it could probably be adapted to other cases as well.

    irisFirstLast <- sapply(split(iris, iris$Species),
                            function(x) range(as.numeric(rownames(x))))
    irisFirstLast ## Just the indices
    #      setosa versicolor virginica
    # [1,]      1         51       101
    # [2,]     50        100       150
    iris[irisFirstLast[1, ], ] ## `1` would represent "first"
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1            5.1         3.5          1.4         0.2     setosa
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 101          6.3         3.3          6.0         2.5  virginica
    iris[irisFirstLast, ] ## nothing would represent both first and last
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1            5.1         3.5          1.4         0.2     setosa
    # 50           5.0         3.3          1.4         0.2     setosa
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 100          5.7         2.8          4.1         1.3 versicolor
    # 101          6.3         3.3          6.0         2.5  virginica
    # 150          5.9         3.0          5.1         1.8  virginica

    d <- datasets::Puromycin
    dFirstLast <- sapply(split(d, d$state),
                         function(x) range(as.numeric(rownames(x))))
    dFirstLast
    #      treated untreated
    # [1,]       1        13
    # [2,]      12        23
    d[dFirstLast[2, ], ] ## `2` would represent `last`
    #    conc rate     state
    # 12  1.1  200   treated
    # 23  1.1  160 untreated

If you are working with named rows, the general approach is the same, but you have to specify the range yourself. Here is the general pattern:

    datasetFirstLast <- sapply(split(dataset, dataset$groupingvariable),
                               function(x) c(rownames(x)[1],
                                             rownames(x)[length(rownames(x))]))
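To make the general pattern concrete, here is a small sketch with a made-up data frame whose row names are labels rather than numbers. The data, the column value, and the row names r1 to r5 are assumptions for illustration only.

```r
# Hypothetical data frame with character row names.
dataset <- data.frame(
  groupingvariable = c("a", "a", "a", "b", "b"),
  value = c(10, 20, 30, 40, 50),
  row.names = c("r1", "r2", "r3", "r4", "r5")
)

# For each group, keep the first and the last row name.
datasetFirstLast <- sapply(
  split(dataset, dataset$groupingvariable),
  function(x) c(rownames(x)[1], rownames(x)[nrow(x)])
)
datasetFirstLast

# Row 1 of the result holds the "first" row names and row 2 the "last"
# ones; row names can be used directly as a row index.
lastRows <- dataset[datasetFirstLast[2, ], ]
lastRows
```

Here nrow(x) is used in place of length(rownames(x)); the two give the same number for a data frame.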

Initial Answer (edited)

If you are interested in extracting the rows themselves, rather than needing the row numbers for other purposes, you can also look at data.table. Here are some examples:

    library(data.table)
    DT <- data.table(iris, key="Species")
    DT[J(unique(Species)), mult = "first"]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa          5.1         3.5          1.4         0.2
    # 2: versicolor          7.0         3.2          4.7         1.4
    # 3:  virginica          6.3         3.3          6.0         2.5
    DT[J(unique(Species)), mult = "last"]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa          5.0         3.3          1.4         0.2
    # 2: versicolor          5.7         2.8          4.1         1.3
    # 3:  virginica          5.9         3.0          5.1         1.8
    DT[, .SD[c(1,.N)], by=Species]
    #       Species Sepal.Length Sepal.Width Petal.Length Petal.Width
    # 1:     setosa          5.1         3.5          1.4         0.2
    # 2:     setosa          5.0         3.3          1.4         0.2
    # 3: versicolor          7.0         3.2          4.7         1.4
    # 4: versicolor          5.7         2.8          4.1         1.3
    # 5:  virginica          6.3         3.3          6.0         2.5
    # 6:  virginica          5.9         3.0          5.1         1.8

This last approach is quite convenient. For example, if you need the first three rows and the last three rows of each group, you can use: DT[, .SD[c(1:3, (.N-2):.N)], by=Species] (For reference: .N is the number of rows in each group.)

Other useful approaches include:

    DT[, tail(.SD, 2), by = Species] ## last two rows of each group
    DT[, head(.SD, 4), by = Species] ## first four rows of each group

+3

A5C1D2H2I1M1N2O1R2T1 Dec 9 '12 at 12:05


Here is the dplyr solution:

    # input
    dataset <- structure(list(
        Model = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L),
                          .Label = c("Civic", "Explorer"), class = "factor"),
        SaleID = 1:7),
      .Names = c("Model", "SaleID"), class = "data.frame",
      row.names = c(NA, -7L))

    # code
    library(dplyr)
    dataset %>%
      group_by(Model) %>%
      mutate(
        "First" = row_number() == min( row_number() ),
        "Last"  = row_number() == max( row_number() )
      )

    # output:
    #      Model SaleID First  Last
    #     <fctr>  <int> <lgl> <lgl>
    # 1 Explorer      1  TRUE FALSE
    # 2 Explorer      2 FALSE FALSE
    # 3 Explorer      3 FALSE FALSE
    # 4 Explorer      4 FALSE  TRUE
    # 5    Civic      5  TRUE FALSE
    # 6    Civic      6 FALSE FALSE
    # 7    Civic      7 FALSE  TRUE

PS: If you do not have dplyr installed, run:

 install.packages("dplyr")

+2

Rasmus larsen Jun 12 '17 at 13:19


The function below is based on @Joe's description of first./last.
The function returns a list of vectors.

Each list element corresponds to a column of the data frame (i.e. a variable of the data set).
Within a given list element, there are indices that refer to the first (or last) element for each category of observations.

EXAMPLE OF USE:

    # Pass in your data frame, and indicate whether or not you want to find Last or find First.
    # Assign to the appropriate variable
    first <- findFirstLast(myDF)
    last <- findFirstLast(myDF, findFirst=FALSE)

Example using `data(iris)`:

    data(iris)
    first <- findFirstLast(iris)
    last <- findFirstLast(iris, findFirst=FALSE)

Which observation is first and last for each species:

    first$Species
    #     setosa versicolor  virginica
    #          1         51        101
    last$Species
    #     setosa versicolor  virginica
    #         50        100        150

Get the entire row for the first observation of each species:

    iris[first$Species, ]
    #     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1            5.1         3.5          1.4         0.2     setosa
    # 51           7.0         3.2          4.7         1.4 versicolor
    # 101          6.3         3.3          6.0         2.5  virginica

CODE FOR THE FUNCTION findFirstLast():

    findFirstLast <- function(myDF, findFirst=TRUE) {
      # myDF should be a data frame or matrix

      # By default, this function finds the first occurrence of each unique value in a column.
      # If instead we want to find last, set findFirst to FALSE. This will give `maxOrMin` a value of -1;
      # finding the min of the negative indices is the same as finding the max of the positive indices.
      maxOrMin <- ifelse(findFirst, 1, -1)

      # For each column in myDF, make a list of all unique values (`levs`) and iterate over that list,
      # finding the min (or max) of all the indices of where that given value appears within the column
      apply(myDF, 2, function(colm) {
        levs <- unique(colm)
        sapply(levs, function(lev) {
          inds <- which(colm==lev)
          ifelse(length(inds)==0, NA, maxOrMin*min(inds*maxOrMin) )
        })
      })
    }

+1

Ricardo saporta Dec 7 '12 at 21:28


Blue magister · Accepted Answer · 2012-12-07T21:44:51+0000

It looks like you are looking for !duplicated, with the fromLast argument set to FALSE or TRUE.

    d <- datasets::Puromycin

    d$state
    # [1] treated treated treated treated treated treated treated
    # [8] treated treated treated treated treated untreated untreated
    #[15] untreated untreated untreated untreated untreated untreated untreated
    #[22] untreated untreated
    #Levels: treated untreated
    !duplicated(d$state)
    # [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    #[13] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    !duplicated(d$state,fromLast=TRUE)
    # [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
    #[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE

There are some caveats and edge-case behaviors for this function, which you can find in the help file ( ?duplicated ).
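As a rough sketch of how !duplicated could reproduce the First.Model and Last.Model columns from the question, the following builds the question's data (the data frame name sales is made up here) and adds both flags as 0/1 integers:

```r
# Data from the question; `sales` is a hypothetical name.
sales <- data.frame(
  Model  = c(rep("Explorer", 4), rep("Civic", 3)),
  SaleID = 1:7
)

# A value is "first" if it has not been seen earlier in the vector,
# and "last" if it is not seen again later (fromLast = TRUE).
sales$First.Model <- as.integer(!duplicated(sales$Model))
sales$Last.Model  <- as.integer(!duplicated(sales$Model, fromLast = TRUE))
sales
```

One thing to keep in mind: duplicated looks at the whole vector, so this only lines up with the table in the question when the rows of each group sit next to each other, as they do here. If the same Model value appeared again in a later, separate run of rows, only its overall first and last occurrences would be flagged.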
