Maybe this function is a little more straightforward? Or at least more compact.
    bracketXtract <-
        function(txt, br = c("(", "[", "{", "all"), with=FALSE)
    {
        br <- match.arg(br)
        left <-        # what pattern are we looking for on the left?
            if ("all" == br) "\\(|\\{|\\["
            else sprintf("\\%s", br)
        map <-         # what's the corresponding pattern on the right?
            c(`\\(`="\\)", `\\[`="\\]", `\\{`="\\}",
              `\\(|\\{|\\[`="\\)|\\}|\\]")
        fmt <-         # create the appropriate regular expression
            if (with) "(%s).*?(%s)"
            else "(?<=%s).*?(?=%s)"
        re <- sprintf(fmt, left, map[left])
        regmatches(txt, gregexpr(re, txt, perl=TRUE))    # do it!
    }
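To make the with argument concrete, here is a minimal sketch of the two patterns that the sprintf formats above produce for br = "[". The patterns are written out by hand rather than taken from the function, and the variable names are only illustrative.

```r
txt <- "I love chicken [unintelligible]!"

# with = FALSE: lookbehind/lookahead require the brackets
# but keep them out of the matched text
re_inside <- "(?<=\\[).*?(?=\\])"

# with = TRUE: the brackets are matched too, so they appear in the result
re_with <- "(\\[).*?(\\])"

inside  <- regmatches(txt, gregexpr(re_inside, txt, perl = TRUE))[[1]]
with_br <- regmatches(txt, gregexpr(re_with,   txt, perl = TRUE))[[1]]

print(inside)
print(with_br)
```

The lookaround version needs perl=TRUE, which is why the function passes it to gregexpr.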
There is no need for lapply; the regular expression functions are already vectorized over txt. This approach fails with nested brackets; regular expressions are likely not a good solution if that is important. Here it is in action:
    > txt <- c("I love chicken [unintelligible]!",
    +          "Me too! (laughter) It so good.[interupting]",
    +          "Yep it awesome {reading}.",
    +          "Agreed.")
    > bracketXtract(txt, "all")
    [[1]]
    [1] "unintelligible"

    [[2]]
    [1] "laughter"    "interupting"

    [[3]]
    [1] "reading"

    [[4]]
    character(0)
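On the nested-bracket caveat: the following sketch applies the same kind of lazy lookaround pattern to a made-up string with nested parentheses. The input string is an assumption chosen for illustration, and the pattern is written by hand to mirror what the function builds for br = "(".

```r
nested <- "a (b (c) d) e"

# Same shape of pattern as for br = "(", written out by hand
re <- "(?<=\\().*?(?=\\))"

hits <- regmatches(nested, gregexpr(re, nested, perl = TRUE))[[1]]
print(hits)
```

The lazy match starts after the first opening parenthesis and stops before the first closing one, so it returns "b (c" rather than either "b (c) d" or "c". Matching balanced brackets needs a small parser, not a pattern of this kind.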
The result of bracketXtract fits without trouble into a data.frame.
    > examp2 <- data.frame(var1=1:4)
    > examp2$text <- c("I love chicken [unintelligible]!",
    +                  "Me too! (laughter) It so good.[interupting]",
    +                  "Yep it awesome {reading}.", "Agreed.")
    > examp2$text2<-bracketXtract(examp2$text, 'all')
    > examp2
      var1                                        text                 text2
    1    1            I love chicken [unintelligible]!        unintelligible
    2    2 Me too! (laughter) It so good.[interupting] laughter, interupting
    3    3                   Yep it awesome {reading}.               reading
    4    4                                     Agreed.
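Note that text2 is a list column, even though it prints like text. If a plain character column is preferred, one possible approach (an assumption about what might be wanted, not part of the function above) is to collapse each element into a single string. The matches list below is typed in by hand to mirror the output shown earlier.

```r
# Hand-written stand-in for the list returned by bracketXtract(txt, "all")
matches <- list("unintelligible",
                c("laughter", "interupting"),
                "reading",
                character(0))

# Collapse each element to one string; empty matches become ""
flat <- vapply(matches, paste, character(1), collapse = ", ")
print(flat)
```

The result is an ordinary character vector with one entry per row, so it can be assigned to a data frame column in the usual way.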
The warning you saw comes from trying to put a matrix into a data frame column. I think the answer is "don't do that."
    > df = data.frame(x=1:2)
    > df$y = matrix(list(), 2, 2)
    > df
      x    y
    1 1 NULL
    2 2 NULL
    Warning message:
    In format.data.frame(x, digits = digits, na.encode = FALSE) :
      corrupt data frame: columns will be truncated or padded with NAs
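For contrast, here is a minimal sketch of a column that holds one list element per row and no matrix, which is the shape bracketXtract returns. The values are made up for illustration.

```r
df <- data.frame(x = 1:2)

# A plain list with one element per row, no dim attribute
df$y <- list(character(0), c("a", "b"))

print(df)
```

Because the list has exactly nrow(df) elements and no dimensions, each row gets its own element and the column lengths stay consistent.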