Use lapply on a subset of list elements and return a list of the same length as the original in R

Question

I want to apply a regex operation to a subset of list elements (character strings) using lapply, and return a list of the same length as the original. The list elements are long lines (obtained by reading in long text files and collapsing each paragraph into one line). The regex operation is only valid for a subset of the list elements. The remaining elements should be returned in their original state.

The regex operation is str_extract from the package stringr, i.e. I want to extract a substring from a longer string. I subset the list elements based on a regex pattern in the file name.

Simplified data example:

library(stringr)
texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
filenames <- c("AB1997R.txt", "BG2000S.txt", "MN1999R.txt", "DC1997S.txt")
names(texts) <- filenames
regexp <- "abcdef"

I know in advance which lines I want to apply the regex operation to, so I want to subset those lines. That is, I do not want to run the regex on all elements of the list, as this would return some invalid results (which is not apparent in this simplified example).

I made a few naive efforts, for example:

x <- lapply(texts[str_detect(names(texts), "1997")], str_extract, regexp)
> x
$AB1997R.txt
[1] "abcdef"

$DC1997S.txt
[1] "abcdef"

which returns a shortened list containing only the extracted substrings. But the result I want is:

> x
$AB1997R.txt
[1] "abcdef"

$BG2000S.txt
[1] "mnopqrstuvwxyz"

$MN1999R.txt
[1] "ghijklmnopqrs"

$DC1997S.txt
[1] "abcdef"

where strings that do not contain a regular expression pattern are returned in their original state.

Tools considered: stringr, lapply, llply (from plyr), and for loops.

+4

regex r plyr lapply

Brigitte 31 '15 20:03

2

You can use sub:

  sub(paste0('.*(', regexp, ').*'), '\\1', texts)
  # AB1997R.txt      BG2000S.txt      MN1999R.txt      DC1997S.txt 
  #  "abcdef" "mnopqrstuvwxyz"  "ghijklmnopqrs"         "abcdef"

Or, to apply it only to the elements whose names contain "1997", use grep:

  indx <- grep('1997', names(texts))
  texts[indx] <- sub(paste0('.*(', regexp, ').*'), '\\1', texts[indx])
  as.list(texts)
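
One difference between the sub approach and an extraction is worth keeping in mind: sub returns the input unchanged when the pattern does not match, whereas an extraction such as str_extract returns NA. The sketch below illustrates this with base R only. The third element and its name (XY1997T.txt) are made up for illustration and are not part of the question's data, and regmatches/regexpr are used here as a stand-in for str_extract.

```r
# Illustrative data (an assumption, not from the question): the third element
# has a matching file name but does not contain the pattern.
texts <- list(
  AB1997R.txt = "abcdefghijkl",
  DC1997S.txt = "uvwxyzabcdef",
  XY1997T.txt = "no pattern here"
)
regexp <- "abcdef"

# sub() leaves a string unchanged when the pattern does not match.
via_sub <- sub(paste0(".*(", regexp, ").*"), "\\1", unlist(texts))

# Extraction with regmatches()/regexpr(): a missing match is mapped to NA here,
# mirroring what str_extract would return.
via_extract <- vapply(texts, function(s) {
  m <- regmatches(s, regexpr(regexp, s))
  if (length(m) == 0) NA_character_ else m
}, character(1))

print(via_sub)
print(via_extract)
```

Which behaviour is preferable depends on whether a selected line without a match should keep its text or be flagged as missing.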

+3

akrun 31 '15 20:09

sgibb · Accepted Answer · 2015-05-31T20:09:47+0000

You can use the subset assignment operator [<-:

x <- texts
is1997 <- str_detect(names(texts), "1997")
x[is1997] <- lapply(texts[is1997], str_extract, regexp)
x
# $AB1997R.txt
# [1] "abcdef"
#
# $BG2000S.txt
# [1] "mnopqrstuvwxyz"
#
# $MN1999R.txt
# [1] "ghijklmnopqrs"
#
# $DC1997S.txt
# [1] "abcdef"
#
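
The same idea can be wrapped in a small reusable function. This is only a sketch: apply_where and extract_first are hypothetical names introduced here, and extract_first is a base R stand-in for str_extract so that the example runs without stringr.

```r
# Hypothetical helper: apply `f` only to the elements selected by the logical
# vector `keep`, leaving all other elements untouched.
apply_where <- function(x, keep, f, ...) {
  stopifnot(is.logical(keep), length(keep) == length(x))
  x[keep] <- lapply(x[keep], f, ...)
  x
}

# Base R stand-in for str_extract on a single string:
# returns the first match, or NA if there is none.
extract_first <- function(s, pattern) {
  m <- regmatches(s, regexpr(pattern, s))
  if (length(m) == 0) NA_character_ else m
}

# Same data as in the question, written as a named list.
texts <- list(
  AB1997R.txt = "abcdefghijkl",
  BG2000S.txt = "mnopqrstuvwxyz",
  MN1999R.txt = "ghijklmnopqrs",
  DC1997S.txt = "uvwxyzabcdef"
)

is1997 <- grepl("1997", names(texts), fixed = TRUE)
x <- apply_where(texts, is1997, extract_first, pattern = "abcdef")
str(x)
```

The result has the same length and names as texts, because only the selected positions are overwritten.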
