Parsing Scheme using Scala parser combinators

I am writing a small Scheme interpreter in Scala, and I am running into problems parsing Scheme lists. My code parses lists that contain multiple numbers, identifiers, and booleans, but it chokes if I try to parse a list containing multiple strings or lists. What am I missing?

Here is my parser:

    class SchemeParsers extends RegexParsers {

      // Scheme boolean #t and #f translate to Scala true and false
      def bool : Parser[Boolean] =
        ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

      // A Scheme identifier allows alphanumeric chars, some symbols, and
      // can't start with a digit
      def id : Parser[String] =
        """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r ^^ {case s => s}

      // This interpreter only accepts numbers as integers
      def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}

      // A string can have any character except ", and is wrapped in "
      def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' ^^ {case s => s}

      // A Scheme list is a series of expressions wrapped in ()
      def list : Parser[List[Any]] =
        '(' ~> rep(expr) <~ ')' ^^ {s: List[Any] => s}

      // A Scheme expression contains any of the other constructions
      def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
    }
2 answers

As @Gabe correctly pointed out, you left some whitespace unhandled:

    scala> object SchemeParsers extends RegexParsers {
         |
         |   private def space = regex("[ \\n]*".r)
         |
         |   // Scheme boolean #t and #f translate to Scala true and false
         |   private def bool : Parser[Boolean] =
         |     ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}
         |
         |   // A Scheme identifier allows alphanumeric chars, some symbols, and
         |   // can't start with a digit
         |   private def id : Parser[String] =
         |     """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r
         |
         |   // This interpreter only accepts numbers as integers
         |   private def num : Parser[Int] = """-?\d+""".r ^^ {case s => s toInt}
         |
         |   // A string can have any character except ", and is wrapped in "
         |   private def str : Parser[String] = '"' ~> """[^""]*""".r <~ '"' <~ space ^^ {case s => s}
         |
         |   // A Scheme list is a series of expressions wrapped in ()
         |   private def list : Parser[List[Any]] =
         |     '(' ~> space ~> rep(expr) <~ ')' <~ space ^^ {s: List[Any] => s}
         |
         |   // A Scheme expression contains any of the other constructions
         |   private def expr : Parser[Any] = id | str | num | bool | list ^^ {case s => s}
         |
         |   def parseExpr(str: String) = parse(expr, str)
         | }
    defined module SchemeParsers

    scala> SchemeParsers.parseExpr("""(("a" "b") ("a" "b"))""")
    res12: SchemeParsers.ParseResult[Any] = [1.22] parsed: List(List(a, b), List(a, b))

    scala> SchemeParsers.parseExpr("""("a" "b" "c")""")
    res13: SchemeParsers.ParseResult[Any] = [1.14] parsed: List(a, b, c)

    scala> SchemeParsers.parseExpr("""((1) (1 2) (1 2 3))""")
    res14: SchemeParsers.ParseResult[Any] = [1.20] parsed: List(List(1), List(1, 2), List(1, 2, 3))

The only problem with the code is its use of characters instead of strings. Below, I have removed the redundant ^^ { case s => s } and replaced all characters with strings. I discuss the issue in more detail after the code.

    import scala.util.parsing.combinator.RegexParsers

    class SchemeParsers extends RegexParsers {

      // Scheme boolean #t and #f translate to Scala true and false
      def bool : Parser[Boolean] =
        ("#t" | "#f") ^^ {case "#t" => true; case "#f" => false}

      // A Scheme identifier allows alphanumeric chars, some symbols, and
      // can't start with a digit
      def id : Parser[String] =
        """[a-zA-Z=*+/<>!\?][a-zA-Z0-9=*+/<>!\?]*""".r

      // This interpreter only accepts numbers as integers
      def num : Parser[Int] = """-?\d+""".r ^^ {case s => s.toInt}

      // A string can have any character except ", and is wrapped in "
      def str : Parser[String] = "\"" ~> """[^""]*""".r <~ "\""

      // A Scheme list is a series of expressions wrapped in ()
      def list : Parser[List[Any]] = "(" ~> rep(expr) <~ ")"

      // A Scheme expression contains any of the other constructions
      def expr : Parser[Any] = id | str | num | bool | list
    }

All Parsers have an implicit accept for their Elem type. So if the basic element is a Char, as it is in RegexParsers, there is an implicit accept action for characters, which is what happens here for ( , ) and " , all of which are written as characters in your code.

What RegexParsers does automatically is skip whitespace at the beginning of any String or Regex (whitespace is defined as protected val whiteSpace = """\s+""".r , so you can override it). It also takes care of moving the position cursor past the whitespace for error messages. A Char matched through the implicit accept gets no such treatment, which is why your parser fails at the space between two strings or two lists.
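To see the difference in isolation, here is a minimal sketch. The object and parser names are made up for illustration, and it assumes the scala-parser-combinators library is on the classpath. The same list grammar is written once with Char delimiters and once with String delimiters:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object, not part of the original code.
object CharVsString extends RegexParsers {
  // A Regex parser: whitespace is skipped before it is tried.
  def num: Parser[Int] = """-?\d+""".r ^^ (_.toInt)

  // Char delimiters go through the implicit accept: no whitespace skipping.
  def charList: Parser[List[Int]] = '(' ~> rep(num) <~ ')'

  // String delimiters go through the implicit literal: whitespace is skipped first.
  def stringList: Parser[List[Int]] = "(" ~> rep(num) <~ ")"
}
```

With these definitions, charList accepts "(1 2)" but rejects "(1 2 )", because nothing skips the space in front of the closing ')'. stringList accepts both.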

One consequence of this that you may not have realized is that " a string beginning with a space" will have its leading space removed from the parsed output, which is unlikely to be what you want. :-)
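One possible way around the lost leading space, sketched here with hypothetical names and again assuming scala-parser-combinators is available, is to match the whole string literal, quotes included, with a single regex. Whitespace is then skipped only in front of the opening quote, never inside the literal:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object, not part of the original code.
object StringLiterals extends RegexParsers {
  // Delimiters and body as separate parsers: the body regex skips
  // whitespace before matching, so a leading space inside the quotes is lost.
  def lossyStr: Parser[String] = "\"" ~> """[^"]*""".r <~ "\""

  // One regex for the whole literal; the surrounding quotes are stripped afterwards.
  def exactStr: Parser[String] =
    "\"[^\"]*\"".r ^^ (s => s.substring(1, s.length - 1))
}
```

For the input " a" (with the quotes), lossyStr yields a without the space, while exactStr keeps the space.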

Also, since \s matches newlines, a newline will be accepted before any identifier, which may or may not be what you want.

You can disable whitespace skipping altogether by overriding skipWhitespace . On the other hand, the default skipWhitespace tests the length of whiteSpace , so you could potentially turn skipping on and off just by manipulating the value of whiteSpace during the parsing process.
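Both knobs can be sketched as follows. The object names are hypothetical, the example assumes scala-parser-combinators is on the classpath, and note that in the library source the method is spelled skipWhitespace (lowercase s) while the value is whiteSpace:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo: skip only spaces and tabs, so newlines are no longer skipped.
object SpacesOnly extends RegexParsers {
  override protected val whiteSpace = """[ \t]+""".r
  def nums: Parser[List[Int]] = rep("""-?\d+""".r ^^ (_.toInt))
}

// Hypothetical demo: no whitespace skipping at all.
object NoSkipping extends RegexParsers {
  override def skipWhitespace = false
  def nums: Parser[List[Int]] = rep("""-?\d+""".r ^^ (_.toInt))
}
```

Under these assumptions, SpacesOnly.nums accepts "1 2" but not "1\n2", and NoSkipping.nums accepts "12" but not "1 2". With skipping turned off, the grammar has to consume whitespace explicitly, much like the space parser in the first answer.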
