Scala Parser Combinators for recursive bnf?

Im trying to match this syntax:

pgm ::= exprs exprs ::= expr [; exprs] expr ::= ID | expr . [0-9]+ 

My scala packrat parser combinator is as follows:

 import scala.util.parsing.combinator.PackratParsers import scala.util.parsing.combinator.syntactical._ object Dotter extends StandardTokenParsers with PackratParsers { lexical.delimiters ++= List(".",";") def pgm = repsep(expr,";") def expr :Parser[Any]= ident | expr~"."~num def num = numericLit def parse(input: String) = phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match { case Success(result, _) => println("Success!"); Some(result) case n @ _ => println(n);println("bla"); None } def main(args: Array[String]) { val prg = "x.1.2.3;" + "y.4.1.1;" + "z;" + "n.1.10.30" parse(prg); } } 

But that does not work. Or he "corresponds to the greedy," and tells me:

 [1.2] failure: end of input expected x.1.2.3;y.4.1.1;z;n.1.10.30 

or if I change | on a ||| I get stackoverflow:

 Exception in thread "main" java.lang.StackOverflowError at java.lang.Character.isLetter(Unknown Source) at java.lang.Character.isLetter(Unknown Source) at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32) at scala.util.parsing.combinator.lexical.Lexical$$anonfun$letter$1.apply(Lexical.scala:32) ... 

I already understood why I get errors; What can I do to parse the syntax as above? It seems this is not esoteric for me

EDIT: Based on the article provided at http://scala-programming-language.1934581.n4.nabble.com/Packrat-parser-guidance-td1956908.html I found out that my program did not actually use the new parrat parser.

Those. change Parser[Any] to PackratParser[Any] and use lazy val instead of def

I rewrote the above:

 import scala.util.parsing.combinator.PackratParsers import scala.util.parsing.combinator.syntactical._ object Dotter extends StandardTokenParsers with PackratParsers { lexical.delimiters ++= List(".",";") lazy val pgm : PackratParser[Any] = repsep(expr,";") lazy val expr :PackratParser[Any]= expr~"."~num | ident lazy val num = numericLit def parse(input: String) = phrase(pgm)(new PackratReader(new lexical.Scanner(input))) match { case Success(result, _) => println("Success!"); Some(result) case n @ _ => println(n);println("bla"); None } def main(args: Array[String]) { val prg = "x.1.2.3 ;" + "y.4.1.1;" + "z;" + "n.1.10.30" parse(prg); } } 
+6
scala parser-combinators ebnf
source share
2 answers

The problem is (at least in part) that you are not actually using the Packrat parser. See the documentation for Scala PackratParsers for more information .

Using PackratParsers is very similar to using parsers:

  • any class / attribute that extends the parser (directly or through a subclass) can be mixed in PackratParsers. Example: MyGrammar object extends StandardTokenParsers with PackratParsers
  • every grammar product previously declared as def without formal parameters becomes lazy, and its type changes from Parser [Elem] to PackratParser [Elem]. So, for example, def production: Parser [Int] = {...} becomes lazy val production: PackratParser [Int] = {...}
  • Important: using PackratParsers is not all or nothing. They can be mixed freely with regular parsers in one grammar.

I don’t know enough about Scala 2.8 parser compilers to fix this completely, but with the following changes, I was able to get it to parse to a semicolon, which is an improvement over what you have executed.

 object Dotter extends StandardTokenParsers with PackratParsers { lexical.delimiters ++= List(".",";") lazy val pgm:PackratParser[Any] = repsep(expr,";") lazy val expr:PackratParser[Any]= ident ||| (expr~"."~numericLit) def parse(input: String) = phrase(expr)(lex(input)) match { case Success(result, _) => println("Success!"); Some(result) case n @ _ => println(n);println("bla"); None } def lex(input:String) = new PackratReader(new lexical.Scanner(input)) } 
+10
source share

Production

 expr ::= ID | expr . [0-9]+ 

remains recursive. It expands to

 expr ::= ID expr ::= expr . [0-9]+ 

where left recursion happens on the second line. This is what makes the parser overflow the stack.

You must rewrite your grammar while avoiding left recursive productions.

 expr ::= ID {. [0-9]+} 
+1
source share

All Articles