Extract patterns from text in R

Question

My data:

t <- "The data is like hi hi hi hi and hi hi end"

and my regex:

 grammer <- "[[:space:]]*(hi)+[[:space:]]"

After running these two lines:

 res <- gregexpr(grammer, t)
 regmatches(t, res)

I got this result:

  [[1]]
  [1] " hi " "hi " "hi " "hi " " hi " "hi "

However, I want each run of consecutive hi's returned as a single match, something like " hi hi hi hi " and " hi hi ".
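A small check in base R shows why the original pattern splits each run: the `+` in `(hi)+` repeats the group `hi` with nothing in between, so it can match `hihi` but never `hi hi`. Each match therefore holds only one `hi` plus the surrounding spaces. The string `"hihi hi"` is made-up test data, not from the question.

```r
# (hi)+ repeats "hi" back to back, with no separator between repetitions
x <- "hihi hi"
runs <- regmatches(x, gregexpr("(hi)+", x))[[1]]
print(runs)  # "hihi" "hi"

# The question's pattern on the question's string: one "hi" per match
t <- "The data is like hi hi hi hi and hi hi end"
grammer <- "[[:space:]]*(hi)+[[:space:]]"
per_hi <- regmatches(t, gregexpr(grammer, t))[[1]]
print(length(per_hi))  # 6
```

To get whole runs, the repeated group has to include the separating space, which is what the answer below does.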


Tags: regex, r

jay_phate Oct 15 '14 at 9:31


Avinash raj · Answer 1 · 2014-10-15T09:37:43+0000

You can do it like this:

 > t <- "The data is like hi hi hi hi and hi hi end"
 > grammer <- "[[:space:]]*(hi[[:space:]])+[[:space:]]*"
 > res <- gregexpr(grammer, t)
 > regmatches(t, res)
 [[1]]
 [1] " hi hi hi hi " " hi hi "
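To see what `gregexpr()` actually returns for this pattern, the sketch below inspects the start positions and the `match.length` attribute, then rebuilds the same strings with `substring()`. This only illustrates what `regmatches()` does for you; in practice you would just call `regmatches()`.

```r
t <- "The data is like hi hi hi hi and hi hi end"
grammer <- "[[:space:]]*(hi[[:space:]])+[[:space:]]*"
res <- gregexpr(grammer, t)

# Start position of each match (character index into t)
starts <- as.vector(res[[1]])
# Length of each match, stored as the "match.length" attribute
lens <- as.vector(attr(res[[1]], "match.length"))

# Cutting t at those positions reproduces regmatches(t, res)
manual <- substring(t, starts, starts + lens - 1)
print(starts)  # 17 33
print(lens)    # 13 7
print(identical(manual, regmatches(t, res)[[1]]))  # TRUE
```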

OR

 > grammer <- "[[:space:]]*(hi[[:space:]])+"
 > res <- gregexpr(grammer, t)
 > regmatches(t, res)
 [[1]]
 [1] " hi hi hi hi " " hi hi "

OR

 > t <- "The data is like hi hi hi hi and hi hi end hi"
 > grammer <- "[[:space:]]*(hi\\>[[:space:]]?)+"
 > res <- gregexpr(grammer, t)
 > regmatches(t, res)
 [[1]]
 [1] " hi hi hi hi " " hi hi "      " hi"
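The `\\>` in this pattern is R's end-of-word anchor (documented in `?regex` for the default engine), which stops `hi` from matching the start of a longer word. A quick comparison on a made-up string, `"hi hide hi"` (hypothetical test data, not from the question):

```r
x <- "hi hide hi"
# Without the anchor, the "hi" at the start of "hide" also counts
no_anchor <- regmatches(x, gregexpr("hi", x))[[1]]
# With \\>, "hi" must end a word, so the "hi" in "hide" is skipped
with_anchor <- regmatches(x, gregexpr("hi\\>", x))[[1]]
print(length(no_anchor))    # 3
print(length(with_anchor))  # 2
```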

To get the runs without leading or trailing spaces:

 > t <- "The data is like hi hi hi hi and hi hi end hi"
 > grammer <- "hi\\>([[:space:]]hi)*"
 > res <- gregexpr(grammer, t)
 > regmatches(t, res)
 [[1]]
 [1] "hi hi hi hi" "hi hi"       "hi"
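If you prefer Perl-style syntax, a roughly equivalent sketch (an alternative, not the answer's method) uses `perl = TRUE` with `\\b` word boundaries and `\\s` for whitespace. Another option is to keep a space-padded pattern from the earlier variants and strip the outer spaces with `trimws()`; both give the same strings here.

```r
t <- "The data is like hi hi hi hi and hi hi end hi"

# \\b anchors the run on word boundaries; (\\s+hi)* absorbs the following repeats
runs_perl <- regmatches(t, gregexpr("\\bhi(\\s+hi)*\\b", t, perl = TRUE))[[1]]

# Alternative: use the padded pattern, then trim the outer whitespace
padded <- regmatches(t, gregexpr("[[:space:]]*(hi\\>[[:space:]]?)+", t))[[1]]
runs_trimmed <- trimws(padded)

print(runs_perl)                           # "hi hi hi hi" "hi hi" "hi"
print(identical(runs_perl, runs_trimmed))  # TRUE
```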

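Explanation:

[[:space:]]* matches a whitespace character zero or more times.
(hi[[:space:]])+ matches the string hi followed by one whitespace character, one or more times, so a whole run such as "hi hi hi hi " is consumed in a single match.
hi\\> (the R string for the regex hi\>) matches hi only at the end of a word; the optional [[:space:]]? lets the final hi at the very end of the string match as well.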
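The same idea generalizes to other words. Below is a hypothetical helper, `extract_runs()` (not part of base R or of the answer), that builds a no-spaces pattern from a word with `paste0()`, adding `\\<` so the word must also start a word. It assumes the word contains only letters or digits, since regex metacharacters are not escaped.

```r
# Hypothetical helper: return every whitespace-separated run of `word` in `text`.
# Assumes `word` has no regex metacharacters (they are not escaped here).
extract_runs <- function(text, word) {
  # \\< and \\> require whole words; the group absorbs each following repeat
  pattern <- paste0("\\<", word, "\\>([[:space:]]+", word, "\\>)*")
  regmatches(text, gregexpr(pattern, text))[[1]]
}

extract_runs("The data is like hi hi hi hi and hi hi end hi", "hi")
# "hi hi hi hi" "hi hi" "hi"
extract_runs("go go stop go", "go")
# "go go" "go"
```

Using `[[:space:]]+` instead of a single `[[:space:]]` also lets runs separated by several spaces or tabs count as one match.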
