Scala Capture multiple elements in Regex

I am trying to capture parts of a multi-line string with a regular expression in Scala. The input has the form:

    val input = """some text
                  |begin {
                  |  content to extract
                  |  content to extract
                  |}
                  |some text
                  |begin {
                  |  other content to extract
                  |}
                  |some text""".stripMargin

I tried several patterns that should extract the text inside the begin { } blocks. One of them:

    val Block = """(?s).*begin \{(.*)\}""".r

    input match {
      case Block(content) => println(content)
      case _ => println("NO MATCH")
    }

I get NO MATCH. If I omit \}, so that the regex becomes (?s).*begin \{(.*), it matches the last block, but the capture includes the unwanted } and the trailing "some text". I checked my regular expression on rubular.com using /.*begin \{(.*)\}/m, and there it matches at least one block. I thought that once my Scala expression matched the same, I could start using findAllIn to match all the blocks. What am I doing wrong?

I looked at the question "Scala Regex enable Multiline option", but I still could not capture all occurrences of the text blocks, for example as a Seq[String]. Any help is appreciated.

3 answers

As Alex said, when you use pattern matching to extract fields with a regular expression, the pattern acts as if it were anchored (i.e. as if it were wrapped in ^ and $). The usual way to avoid this problem is to use findAllIn first. Thus:

    val input = """some text
                  |begin {
                  |  content to extract
                  |  content to extract
                  |}
                  |some text
                  |begin {
                  |  other content to extract
                  |}
                  |some text""".stripMargin

    val Block = """(?s)begin \{(.*)\}""".r

    Block findAllIn input foreach (_ match {
      case Block(content) => println(content)
      case _ => println("NO MATCH")
    })

Otherwise, you can put .* at the beginning and at the end of the pattern to get around this limitation:

    val Block = """(?s).*begin \{(.*)\}.*""".r

    input match {
      case Block(content) => println(content)
      case _ => println("NO MATCH")
    }

By the way, you probably want a non-greedy match, since the greedy (.*) runs from the first { all the way to the last }:

    val Block = """(?s)begin \{(.*?)\}""".r

    Block findAllIn input foreach (_ match {
      case Block(content) => println(content)
      case _ => println("NO MATCH")
    })
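
Since the question asks for the blocks as a Seq[String], here is a minimal sketch of one way to collect them, assuming the non-greedy pattern above. The names BlockExtractor and extractBlocks are made up for illustration; findAllMatchIn and group are part of scala.util.matching.Regex.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not from the original posts.
object BlockExtractor {
  // (?s) lets . match newlines; .*? stops at the first closing brace.
  private val Block: Regex = """(?s)begin \{(.*?)\}""".r

  // Returns the trimmed content of every begin { ... } block, in order.
  def extractBlocks(text: String): Seq[String] =
    Block.findAllMatchIn(text).map(_.group(1).trim).toList
}
```

Calling BlockExtractor.extractBlocks(input) on the input from the question should give one string per block. Reading group(1) from each Match avoids running the pattern a second time on every hit, which the findAllIn plus pattern-match version does.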

When pattern matching, I believe a full match of the input is required. Your pattern is equivalent to:

 val Block = """^(?s).*begin \{(.*)\}$""".r 

It works if you add .* to the end:

 val Block = """(?s).*begin \{(.*)\}.*""".r 

I could not find documentation on this, but I have run into this problem myself.
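
For what it is worth, scala.util.matching.Regex also has an unanchored method, which returns a regex whose extractor looks for the pattern anywhere in the input instead of requiring it to cover the whole string. The sketch below compares the two behaviours. AnchoringDemo and firstBlock are names invented for this example, and the pattern is the non-greedy variant rather than the one from the question.

```scala
import scala.util.matching.Regex

// Hypothetical demo object, only to illustrate anchored vs. unanchored extraction.
object AnchoringDemo {
  val anchoredBlock: Regex = """(?s)begin \{(.*?)\}""".r
  // Same pattern, but the extractor searches for it anywhere in the input.
  val unanchoredBlock: Regex = anchoredBlock.unanchored

  // Returns the first captured block if the extractor accepts the input.
  def firstBlock(re: Regex, text: String): Option[String] = text match {
    case re(content) => Some(content.trim)
    case _           => None
  }
}
```

With text around the block, firstBlock(AnchoringDemo.anchoredBlock, text) should return None, while the unanchored version returns the first block. It still yields only one block per match expression, so collecting all blocks needs findAllIn or findAllMatchIn.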


As a complement to the other answers, I wanted to point out the existence of kantan.regex, which allows you to write the following:

    import kantan.regex.ops._

    // The type parameter is the type as which to decode results,
    // the value parameters are the regular expression to apply and the group to
    // extract data from.
    input.evalRegex[String]("""(?s)begin \{(.*?)\}""", 1).toList

This gives:

 List(Success( content to extract content to extract ), Success( other content to extract )) 
