How to grep just the first 10 matches in R?
I have a data file where I can grep "you":
> dtq_ml_wuv[grep("you", dtq_ml_wuv$rn), "rn"]
"you can take" "you can tell" "you can thank" "you can try" "you can turn" "you can use"
"you can visit" "you can work" "you donet know" "you donet need" "you dont know" "you get enough"
"you get see" "you go back" "you got keep" "you guys can" "you heard right" "you just go"
"you just gotta" "you just look" "you just need" "you just stay" "you know better" "you know else"
"you know got" "you know i" "you know if" "you know im" "you know it" "you know just"
"you know many" "you know means" "you know one" "you know really" "you know right" "you like see"
How can I get grep to stop searching once it has found a given number of matches, say at most 25?
I tried
> dtq_ml_wuv[grep("you{0, 25}", dtq_ml_wuv$rn), "rn"]
But this fails with an error saying the regular expression is invalid because of invalid contents of {}.
Any hints appreciated.
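A note on the attempt above: in a regular expression, a quantifier such as {0,25} bounds how many times the preceding item (here the letter "u") may repeat inside one string. It does not limit how many elements grep returns, and the space after the comma is probably what triggers the error. As a baseline, and assuming it is acceptable to scan the whole vector first, the result can simply be truncated with head(). The vector rn below is a made-up stand-in for dtq_ml_wuv$rn.

```r
# Assumed stand-in for dtq_ml_wuv$rn (hypothetical data, not from the question)
rn <- c("you can take", "we can tell", "you can thank",
        "thank you all", "no match here", "you know it")

# grep() still scans all of rn; head() keeps only the first 3 match indices
# (it returns fewer if fewer matches exist)
idx <- head(grep("you", rn), 3)
rn[idx]
```

With the data frame from the question this would read dtq_ml_wuv[head(grep("you", dtq_ml_wuv$rn), 25), "rn"]. It does not stop the search early, so it only saves work downstream, not inside grep itself.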
Here is a chunked version. Tune the chunk size to balance the speed of grep's internal compiled code against the cost of searching more elements than necessary.
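grepn <- function(pattern, x, n, chunk.size = 32) {
  N <- length(x)
  chunk <- 1:chunk.size
  k <- 1                                   # next free slot in M
  M <- vector("integer", n + chunk.size)   # room for n matches plus one chunk of overshoot
  while (k <= n && chunk[1] <= N) {        # stop once n matches are stored or x is exhausted
    i <- na.omit(grep(pattern, x[chunk]))  # match positions within the current chunk
    if (length(i)) M[k:(k + length(i) - 1)] <- i + chunk[1] - 1  # convert to positions in x
    k <- k + length(i)
    chunk <- chunk + chunk.size
  }
  return(M[seq_len(min(k - 1, n))])        # empty result if nothing matched
}
Example, with data being the character vector from the question: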
grepn("e",data,n=10,chunk.size=16)
[1] 1 2 6 9 10 12 13 15 17 21

I'm not sure if this is the fastest way to solve your problem; maybe someone will come up with a faster one.
I created a vector x and search it for "a" until 3 matches have been found:
dput(x)
c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
I used a for loop with if/else conditions:
out <- NULL
for (i in seq_along(x)) {
  if (grepl("a", x[i])) {
    out <- append(out, x[i])   # collect each matching element
  }
  if (length(out) > 2) {       # stop as soon as 3 matches are found
    print(out)
    break
  }
}
out
[1] "a" "a" "a"
I compared it with the subsetting approach, and for small vectors the timings do not differ. However, when I made length(x) = 25000000:
ptm<-proc.time();x[grep("a",x)][1:3];proc.time()-ptm
[1] "a" "a" "a"
   user  system elapsed
   2.25    0.06    2.34
versus the loop version:
proc.time()-ptm
   user  system elapsed
   0.01    0.01    0.03
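The gap in these timings comes from the loop stopping at the third match while grep scans every element. Below is a rough sketch for reproducing the comparison. The helper name first_n_loop and the smaller vector size are assumptions made for illustration, and it preallocates the result instead of growing it with append().

```r
# Hypothetical helper: return the first n elements of x that match pattern,
# stopping the scan as soon as n matches have been collected.
first_n_loop <- function(pattern, x, n) {
  out <- character(n)   # preallocated result
  found <- 0L
  for (i in seq_along(x)) {
    if (found >= n) break
    if (grepl(pattern, x[i])) {
      found <- found + 1L
      out[found] <- x[i]
    }
  }
  out[seq_len(found)]   # fewer than n elements if x runs out of matches
}

# Assumed test data: the 50-element vector from above, repeated 2000 times
x <- rep(c(letters[1:10], letters[1:20], letters[1:10], letters[1:10]),
         times = 2000)

# Both approaches should return the same values
res_full <- head(x[grep("a", x)], 3)
res_loop <- first_n_loop("a", x, 3)
identical(res_full, res_loop)

# Timings vary by machine, so none are quoted here
system.time(head(x[grep("a", x)], 3))
system.time(first_n_loop("a", x, 3))
```

The loop only wins when the matches occur early in the vector. If matches are rare or absent, calling grepl once per element will be much slower than a single vectorised grep, and a chunked approach is the safer middle ground.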