How to find the strings that match the longest possible prefix of a given string

I have a string and a character vector. I would like to find all elements of the character vector that match as many characters as possible of the string, counting from the beginning. For instance:

    s <- "abs"
    vc <- c("ab","bb","abc","acbd","dert")
    result <- c("ab","abc")

The first K characters of s must exactly match the first K characters of the element, and I want K to be as large as possible (K <= nchar(s)). There are no matches for "abs" (grep("abs", vc)), but for "ab" there are two matches (result <- grep("ab", vc, value = TRUE)).
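A minimal base-R sketch of the requirement as stated, assuming "match" means the longest common prefix of s and each element. The function names `common_prefix_len` and `longest_prefix_matches` are hypothetical, not from any package. It relies on the fact that if the first j characters agree, every shorter prefix agrees too, so counting the values of j that agree gives K directly:

```r
# Hypothetical helper: length of the common prefix of s and each element of vc.
# If the first j characters agree, all shorter prefixes agree as well,
# so the number of agreeing j values is exactly that prefix length (K).
common_prefix_len <- function(s, vc) {
  k <- integer(length(vc))
  for (j in seq_len(nchar(s))) {
    k <- k + (substr(vc, 1, j) == substr(s, 1, j))
  }
  k
}

# Hypothetical wrapper: keep the elements with the largest K.
# Assumption: if no element shares even the first character, return nothing.
longest_prefix_matches <- function(s, vc) {
  k <- common_prefix_len(s, vc)
  if (length(k) == 0L || max(k) == 0L) return(character(0))
  vc[k == max(k)]
}

s <- "abs"
vc <- c("ab", "bb", "abc", "acbd", "dert")
common_prefix_len(s, vc)
#[1] 2 0 2 1 0
longest_prefix_matches(s, vc)
#[1] "ab"  "abc"
```

Because the comparison is on substr(vc, 1, j), it is anchored at the start of each element, unlike a plain grep("ab", vc).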

3 answers

Another interpretation:

    s <- "abs"
    # Updated vc
    vc <- c("ab","bb","abc","acbd","dert","abwabsabs")
    st <- strsplit(s, "")[[1]]
    mtc <- sapply(strsplit(substr(vc, 1, nchar(s)), ""), function(i) {
      m <- i == st[1:length(i)]
      sum(m * cumsum(m))
    })
    vc[mtc == max(mtc)]
    #[1] "ab" "abc" "abwabsabs"

    # Another vector vc
    vc <- c("ab","bb","abc","acbd","dert","absq","abab")
    # .... (recompute st and mtc as above)
    vc[mtc == max(mtc)]
    #[1] "absq"

Since only the beginning of each string is considered, the longest match in the first case is "ab", even though "abwabsabs" contains "abs" (just not at the start).
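One caveat on the score used above: sum(m * cumsum(m)) still gives credit for characters that match after the first mismatch, so it is not strictly the common-prefix length. For s = "abc" it scores both "axc" and "abz" as 3. If a strict prefix length is wanted, one option is to use sum(cumprod(m)) instead, since cumprod() turns everything after the first FALSE into 0. A small sketch comparing the two (the variable names are my own):

```r
s <- "abc"
vc <- c("axc", "abz")
st <- strsplit(s, "")[[1]]
chars <- strsplit(substr(vc, 1, nchar(s)), "")

# Score from the answer above: also rewards matches after a mismatch
weighted <- sapply(chars, function(i) {
  m <- i == st[seq_along(i)]
  sum(m * cumsum(m))
})

# Strict common-prefix length: cumprod() zeroes everything after the first mismatch
strict <- sapply(chars, function(i) {
  m <- i == st[seq_along(i)]
  sum(cumprod(m))
})

weighted
#[1] 3 3
strict
#[1] 1 2
vc[strict == max(strict)]
#[1] "abz"
```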

Edit: Here is a "single pattern" solution. Maybe it can be made more concise, but here we go:

    vc <- c("ab","bb","abc","acbd","dert","abwabsabs")
    (auxOne <- sapply((nchar(s)-1):1, function(i) substr(s, 1, i)))
    #[1] "ab" "a"
    (auxTwo <- sapply(nchar(s):2, function(i) substring(s, i)))
    #[1] "s" "bs"
    l <- attr(regexpr(paste0("^((", s, ")|",
                             paste0("(", auxOne, "(?!", auxTwo, "))", collapse = "|"),
                             ")"),
                      vc, perl = TRUE),
              "match.length")
    vc[l == max(l)]
    #[1] "ab" "abc" "abwabsabs"
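As far as I can tell, the lookaheads are not strictly needed here: PCRE tries the alternatives of an anchored group from left to right, so listing the prefixes of s from longest to shortest already makes the longest fitting prefix win. A simpler variant (my own sketch, not the pattern above; like the original, it assumes s contains no regex metacharacters):

```r
s <- "abs"
vc <- c("ab", "bb", "abc", "acbd", "dert", "abwabsabs")

# All prefixes of s, longest first: "abs" "ab" "a"
prefixes <- substring(s, 1, nchar(s):1)

# Anchored alternation; alternatives are tried left to right,
# so the first (longest) prefix that fits is the one that matches
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")
pattern
#[1] "^(abs|ab|a)"

# match.length is -1 for elements with no match at all
l <- attr(regexpr(pattern, vc, perl = TRUE), "match.length")
l
#[1]  2 -1  2  1 -1  2
vc[l == max(l)]
#[1] "ab"        "abc"       "abwabsabs"
```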

Here's a function that uses grep to check whether a given string s matches the beginning of any string in vc, removing one character at a time from the end of s until a match is found:

    myfun <- function(s, vc) {
      ans <- character(0)  # returned as-is if the loop never runs
      notDone <- TRUE
      # EDIT: these two lines truncate s to
      # the maximum number of chars in vc
      maxChar <- max(nchar(vc))
      s <- substr(s, 1, maxChar)
      subN <- nchar(s)
      while (notDone && subN > 0) {
        ss <- substr(s, 1, subN)
        ans <- grep(sprintf("^%s", ss), vc, value = TRUE)
        if (length(ans)) {
          notDone <- FALSE
        } else {
          subN <- subN - 1
        }
      }
      return(ans)
    }
    s <- "abs"
    # Updated vc from @Julius answer
    vc <- c("ab","bb","abc","acbd","dert","absq","abab")
    myfun(s, vc)
    #[1] "absq"
    # And there is no infinite loop if there is no match
    myfun("q", "a")
    #character(0)
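The same shrink-from-the-end loop can also be written without regular expressions, using base R's startsWith() (available since R 3.3.0). This sidesteps any regex metacharacters in s and makes the truncation to max(nchar(vc)) unnecessary. A sketch; the function name `longest_prefix_hits` is hypothetical:

```r
longest_prefix_hits <- function(s, vc) {
  # Try prefixes of s from longest to shortest; stop at the first that hits
  for (n in rev(seq_len(nchar(s)))) {
    hits <- vc[startsWith(vc, substr(s, 1, n))]
    if (length(hits) > 0L) return(hits)
  }
  character(0)  # no element shares even the first character of s
}

longest_prefix_hits("abs", c("ab","bb","abc","acbd","dert","absq","abab"))
#[1] "absq"
longest_prefix_hits("abs", c("ab","bb","abc","acbd","dert"))
#[1] "ab"  "abc"
longest_prefix_hits("q", "a")
#character(0)
```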

Just a note, long after the fact: the triebeard package now exists, and it is very efficient and convenient for finding longest or partial matches.
