Remove text inside brackets, parentheses, and/or curly braces

I need a function that extracts any type of bracket, i.e. (), [], {}, together with the information between them. I wrote one that does what I want, but I get an annoying warning that I don't understand. I want the warning to go away, either by fixing the faulty code or by suppressing the warning. I tried suppressWarnings(), but that didn't work, probably because I didn't use it correctly.

This function uses regmatches and requires R version 2.14 or higher.

Here is the function, followed by an example that reproduces the warning. Thanks for the help.

    ################
    # THE FUNCTION #
    ################
    bracketXtract <- function(text, bracket = "all", include.bracket = TRUE) {
        bracketExtract <- if (include.bracket == FALSE) {
            function(Text, bracket) {
                switch(bracket,
                    square = lapply(Text, function(j) gsub("[\\[\\]]", "",
                        regmatches(j, gregexpr("\\[.*?\\]", j))[[1]], perl = TRUE)),
                    round = lapply(Text, function(j) gsub("[\\(\\)]", "",
                        regmatches(j, gregexpr("\\(.*?\\)", j))[[1]])),
                    curly = lapply(Text, function(j) gsub("[\\{\\}]", "",
                        regmatches(j, gregexpr("\\{.*?\\}", j))[[1]])),
                    all = {
                        P1 <- lapply(Text, function(j) gsub("[\\[\\]]", "",
                            regmatches(j, gregexpr("\\[.*?\\]", j))[[1]], perl = TRUE))
                        P2 <- lapply(Text, function(j) gsub("[\\(\\)]", "",
                            regmatches(j, gregexpr("\\(.*?\\)", j))[[1]]))
                        P3 <- lapply(Text, function(j) gsub("[\\{\\}]", "",
                            regmatches(j, gregexpr("\\{.*?\\}", j))[[1]]))
                        apply(cbind(P1, P2, P3), 1,
                            function(x) rbind(as.vector(unlist(x))))
                    })
            }
        } else {
            function(Text, bracket) {
                switch(bracket,
                    square = lapply(Text, function(j)
                        regmatches(j, gregexpr("\\[.*?\\]", j))[[1]]),
                    round = lapply(Text, function(j)
                        regmatches(j, gregexpr("\\(.*?\\)", j))[[1]]),
                    curly = lapply(Text, function(j)
                        regmatches(j, gregexpr("\\{.*?\\}", j))[[1]]),
                    all = {
                        P1 <- lapply(Text, function(j)
                            regmatches(j, gregexpr("\\[.*?\\]", j))[[1]])
                        P2 <- lapply(Text, function(j)
                            regmatches(j, gregexpr("\\(.*?\\)", j))[[1]])
                        P3 <- lapply(Text, function(j)
                            regmatches(j, gregexpr("\\{.*?\\}", j))[[1]])
                        apply(cbind(P1, P2, P3), 1,
                            function(x) rbind(as.vector(unlist(x))))
                    })
            }
        }
        if (length(text) == 1) {
            unlist(lapply(text, function(x) bracketExtract(Text = text, bracket = bracket)))
        } else {
            sapply(text, function(x) bracketExtract(Text = text, bracket = bracket))
        }
    }

    ##################
    # TESTING IT OUT #
    ##################
    j <- "What kind of cheese isn't your cheese? {wonder} Nacho cheese! [groan] (Laugh)"
    bracketXtract(j, 'round')
    bracketXtract(j, 'round', include.bracket = FALSE)

    examp2<-data.frame(var1=1:4)
    examp2$text<-as.character(c("I love chicken [unintelligible]!",
        "Me too! (laughter) It so good.[interupting]",
        "Yep it awesome {reading}.", "Agreed."))

    #=================================#
    # HERE'S WHERE THE WARNINGS COME: #
    #=================================#
    examp2$text2<-bracketXtract(examp2$text, 'round')
    examp2
    examp2$text2<-bracketXtract(examp2$text, 'all')
    examp2
4 answers

Maybe this function is a little more straightforward? Or at least more compact.

    bracketXtract <-
        function(txt, br = c("(", "[", "{", "all"), with=FALSE)
    {
        br <- match.arg(br)
        left <-        # what pattern are we looking for on the left?
            if ("all" == br) "\\(|\\{|\\["
            else sprintf("\\%s", br)
        map <-         # what's the corresponding pattern on the right?
            c(`\\(`="\\)", `\\[`="\\]", `\\{`="\\}",
              `\\(|\\{|\\[`="\\)|\\}|\\]")
        fmt <-         # create the appropriate regular expression
            if (with) "(%s).*?(%s)"
            else "(?<=%s).*?(?=%s)"
        re <- sprintf(fmt, left, map[left])
        regmatches(txt, gregexpr(re, txt, perl=TRUE))    # do it!
    }

There is no need for lapply; the regular expression functions are already vectorized over their input. This does not handle nested brackets; regular expressions are probably not a good solution if that matters. Here it is in action:

    > txt <- c("I love chicken [unintelligible]!",
    +          "Me too! (laughter) It so good.[interupting]",
    +          "Yep it awesome {reading}.",
    +          "Agreed.")
    > bracketXtract(txt, "all")
    [[1]]
    [1] "unintelligible"

    [[2]]
    [1] "laughter"    "interupting"

    [[3]]
    [1] "reading"

    [[4]]
    character(0)
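To illustrate the two points above, here is a minimal base R sketch. The sample strings and the names `demo`, `hits`, and `nested` are invented for illustration and are not part of the answer's function. It shows that `gregexpr()` and `regmatches()` accept a whole character vector at once, and that a lazy pattern stops at the first closing bracket when brackets are nested.

```r
# Three inputs handled in one call, no lapply() needed
demo <- c("x (a) y", "no brackets", "(b)(c)")
hits <- regmatches(demo, gregexpr("\\(.*?\\)", demo, perl = TRUE))
# hits is a list with one character vector per input string

# Nested brackets: the lazy match ends at the first ")",
# not at the ")" that actually closes the outer "("
nested_txt <- "a (b (c) d) e"
nested <- regmatches(nested_txt,
                     gregexpr("\\(.*?\\)", nested_txt, perl = TRUE))[[1]]
print(hits)
print(nested)
```

If nested brackets matter for your data, a hand-written scanner that counts opening and closing brackets is likely a safer route than a single regular expression.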

The result fits into a data.frame without any trouble.

    > examp2 <- data.frame(var1=1:4)
    > examp2$text <- c("I love chicken [unintelligible]!",
    +                  "Me too! (laughter) It so good.[interupting]",
    +                  "Yep it awesome {reading}.", "Agreed.")
    > examp2$text2<-bracketXtract(examp2$text, 'all')
    > examp2
      var1                                        text                 text2
    1    1            I love chicken [unintelligible]!        unintelligible
    2    2 Me too! (laughter) It so good.[interupting] laughter, interupting
    3    3                   Yep it awesome {reading}.               reading
    4    4                                     Agreed.

The warning you saw comes from trying to stick a matrix into a data frame. I think the answer is "don't do that."

    > df = data.frame(x=1:2)
    > df$y = matrix(list(), 2, 2)
    > df
      x    y
    1 1 NULL
    2 2 NULL
    Warning message:
    In format.data.frame(x, digits = digits, na.encode = FALSE) :
      corrupt data frame: columns will be truncated or padded with NAs
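As a rough sketch of how such a matrix can arise and one way around it: `sapply()` simplifies equal-length list results into a matrix of mode list, whereas a plain list with one element per row can be assigned as a column, or collapsed to a character vector first. The toy values and the names `m`, `pieces`, and `df2` are made up for illustration; whether this is exactly what happens inside the question's function is an assumption.

```r
# sapply() turns equal-length list results into a list-matrix
m <- sapply(1:2, function(i) list("a", "b"))
is.matrix(m)   # TRUE

# Hypothetical per-row extraction results of differing lengths
pieces <- list("laughter", c("groan", "sigh"))

df2 <- data.frame(x = 1:2)
df2$as_list <- pieces   # a plain list becomes a list column
# Or flatten each element to one string so the column is plain character
df2$as_text <- vapply(pieces, paste, character(1), collapse = ", ")
print(df2)
```

The character column is usually the easier one to print and export; the list column keeps the individual pieces separate.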

My thought was to make 6 (implicitly vectorized) helper functions, but I will study Martin's code instead, since he is far better at this than I am:

    rm.curlybkt.no <- function(x) gsub("(\\{).*(\\})", "\\1\\2", x, perl=TRUE)
    rm.rndbkt.no   <- function(x) gsub("(\\().*(\\))", "\\1\\2", x, perl=TRUE)
    rm.sqrbkt.no   <- function(x) gsub("(\\[).*(\\])", "\\1\\2", x, perl=TRUE)
    rm.rndbkt.in   <- function(x) gsub("\\(.*\\)", "", x)
    rm.curlybkt.in <- function(x) gsub("\\{.*\\}", "", x)
    rm.sqrbkt.in   <- function(x) gsub("\\[.*\\]", "", x)
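One thing to keep in mind with helpers like these is that `.*` is greedy, so a string with two bracket pairs is matched from the first opening bracket to the last closing one. A small sketch of the difference, using an invented sample string and the illustrative names `greedy` and `lazy`:

```r
s <- "a (b) c (d) e"

# Greedy: "(b) c (d)" is removed as a single match
greedy <- gsub("\\(.*\\)", "", s)

# Lazy: "(b)" and "(d)" are removed separately, " c " survives
lazy <- gsub("\\(.*?\\)", "", s, perl = TRUE)

print(greedy)
print(lazy)
```

Which behavior is wanted depends on the data; for transcripts with several bracketed notes per line, the lazy form is probably the safer default.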

Suppose that the brackets are not nested and that we have this test data:

 x <- c("a (bb) [ccc]{d}e", "x[a]y") 

Then, using strapply in gsubfn, we have this two-line solution, which first translates all round and square brackets to curly braces and then processes the result:

    library(gsubfn)
    xx <- chartr("[]()", "{}{}", x)
    s <- strapply(xx, "{([^}]*)}", c)

The result of the above is the following list:

    > s
    [[1]]
    [1] "bb"  "ccc" "d"

    [[2]]
    [1] "a"
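For readers without gsubfn installed, the same translate-then-extract idea can be sketched in base R. This is an alternative under the same no-nesting assumption, not the code from this answer; the name `s_base` and the lookaround pattern are choices made here for illustration.

```r
x <- c("a (bb) [ccc]{d}e", "x[a]y")

# Step 1: map every bracket type onto curly braces
xx <- chartr("[]()", "{}{}", x)

# Step 2: pull out whatever sits between "{" and the next "}"
# (lookbehind and lookahead keep the braces out of the match)
s_base <- regmatches(xx, gregexpr("(?<=\\{)[^}]*(?=\\})", xx, perl = TRUE))
print(s_base)
```

Translating first means only one pattern has to be written and tested, at the cost of no longer knowing which bracket type each piece came from.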

Here is my shot at it. I prefer the stringr package! :)

    bracketXtract <- function(string, bracket = "all", include.bracket = TRUE){
        # Load stringr package
        require(stringr)

        # Regular expressions for your brackets
        rgx = list(square = "\\[\\w*\\]", curly = "\\{\\w*\\}", round = "\\(\\w*\\)")
        rgx['all'] = sprintf('(%s)|(%s)|(%s)', rgx$square, rgx$curly, rgx$round)

        # Ensure you have the correct bracket name
        stopifnot(bracket %in% names(rgx))

        # Find your matches
        matches = str_extract_all(string, pattern = rgx[[bracket]])[[1]]

        # Remove brackets from results if needed
        if(!include.bracket)
            matches = sapply(matches, function(m) substr(m, 2, nchar(m)-1))

        unname(matches)
    }

    j <- "What kind of cheese isn't your cheese? {wonder} Nacho cheese! [groan] (Laugh)"

    bracketXtract(j)
    # [1] "{wonder}" "[groan]" "(Laugh)"

    bracketXtract(j, bracket = "square")
    # [1] "[groan]"

    bracketXtract(j, include.bracket = F)
    # [1] "wonder" "groan" "Laugh"