Difficulty getting Parsec parser to correctly skip spaces

I am new to Parsec (and to parsers in general), and I am having some problems with this parser I wrote:

    list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'

The idea is to parse lists in this format (I am working with s-expressions):

    (firstElement secondElement thirdElement and so on)

I wrote this code to test it:

    import Control.Applicative
    import Text.ParserCombinators.Parsec hiding (many)

    list :: Parser [String]
    list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'

    test :: String -> IO ()
    test s = do
      putStrLn $ "Testing " ++ show s ++ ":"
      parseTest list s
      putStrLn ""

    main :: IO ()
    main = do
      test "()"
      test "(hello)"
      test "(hello world)"
      test "( hello world)"
      test "(hello world )"
      test "( )"

This is the result I get:

    Testing "()":
    []

    Testing "(hello)":
    ["hello"]

    Testing "(hello world)":
    ["hello","world"]

    Testing "( hello world)":
    ["hello","world"]

    Testing "(hello world )":
    parse error at (line 1, column 14):
    unexpected ")"
    expecting space or letter

    Testing "( )":
    parse error at (line 1, column 3):
    unexpected ")"
    expecting space or letter

As you can see, it fails when there is a space between the last element of the list and the closing parenthesis. I don't understand why that space is not consumed by the spaces that I put immediately before <* char ')'. What silly mistake did I make?

3 answers

The problem is that the trailing spaces are consumed by the spaces inside the argument of many,

    list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'
    --                       ^^^^^^ that one

and then the parser expects some letter, but finds the closing parenthesis instead and therefore fails.

To fix this, skip spaces only after tokens:

    list = char '(' *> spaces *> many (some letter <* spaces) <* char ')'

This works as desired:

    $ runghc lisplists.hs
    Testing "()":
    []

    Testing "(hello)":
    ["hello"]

    Testing "(hello world)":
    ["hello","world"]

    Testing "( hello world)":
    ["hello","world"]

    Testing "(hello world )":
    ["hello","world"]

    Testing "( )":
    []
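The same idea can be generalized into the common "lexeme" style, where every token parser skips the whitespace that follows it. This is a minimal sketch, assuming parsec 3 and importing from Text.Parsec instead of the older Text.ParserCombinators.Parsec; the helper names lexeme, symbol, identifier and sexpList are hypothetical (they only echo the naming convention of Text.Parsec.Token and are not taken from it).

```haskell
import Control.Applicative (some)
import Text.Parsec (char, letter, many, parse, spaces)
import Text.Parsec.String (Parser)

-- Hypothetical helper: run p, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Hypothetical helper: a single-character token such as '(' or ')'.
symbol :: Char -> Parser Char
symbol c = lexeme (char c)

-- Hypothetical helper: one or more letters, plus trailing whitespace.
identifier :: Parser String
identifier = lexeme (some letter)

-- Skip leading whitespace once, then parse "(" identifiers ")".
-- identifier fails on ')' without consuming input, so many stops cleanly.
sexpList :: Parser [String]
sexpList = spaces *> symbol '(' *> many identifier <* symbol ')'

main :: IO ()
main = mapM_ (print . parse sexpList "") ["()", "(hello world )", "( )", "  ( hello  world )"]
```

Each of the four inputs should yield a Right value: [] for the first and third, ["hello","world"] for the other two. Because whitespace is only ever consumed after a token, no parser ever has to give input back.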

The problem is that once the parser many (spaces *> some letter) sees a space, it commits to parsing another element: by default Parsec looks ahead only one character and does not backtrack once input has been consumed.

The sledgehammer solution would be to use try to enable backtracking, but problems like this are best avoided by simply parsing optional whitespace after each token, as shown in Daniel's answer.
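The no-backtracking rule can be seen in isolation with a small sketch, assuming parsec 3 (the parser names committed and backtracking are made up for illustration). Both parsers try spaces *> char 'x' first and fall back to char ' ', on the input " )", which mirrors what happens at the end of "(hello world )".

```haskell
import Text.Parsec (char, parse, spaces, try, (<|>))
import Text.Parsec.String (Parser)

-- The left branch consumes the space, then fails on ')'.
-- Because input was consumed, (<|>) never tries the right branch.
committed :: Parser Char
committed = (spaces *> char 'x') <|> char ' '

-- try turns the left branch's failure into a non-consuming one,
-- so (<|>) rewinds and tries char ' ' on the original input.
backtracking :: Parser Char
backtracking = try (spaces *> char 'x') <|> char ' '

main :: IO ()
main = do
  print (parse committed "" " )")
  print (parse backtracking "" " )")
```

The first parse should fail at the ), while the second should succeed with Right ' ', since try rewinds the input before the alternative is attempted.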


This is a bit subtle. By default, Parsec parsers are greedy and do not backtrack once they have consumed input. What does this mean in your case? When you parse (hello world ), you start by parsing the (, then you try to match some spaces followed by an identifier. There are no spaces, but there is an identifier, so that succeeds. We try again and get world. Now the remaining input is " )" (a space followed by the closing parenthesis), and many tries the parser (spaces *> some letter) once more. spaces is greedy, so it consumes the space, and then some letter expects a letter but gets ) instead. At this point the parser fails, but it has already consumed the space, so you are doomed: many cannot back off and stop there. You can force the element parser to backtrack using the try combinator: many (try (spaces *> some letter))
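Where try goes matters. The sketch below compares two placements in the question's parser, assuming parsec 3 (element, listTry and listTryOutside are hypothetical names). Wrapping each element lets many stop cleanly; wrapping the whole many does not help, because the ( has already been consumed and there is no alternative to fall back to.

```haskell
import Control.Applicative (some)
import Text.Parsec (char, letter, many, parse, spaces, try)
import Text.Parsec.String (Parser)

-- One element: optional leading whitespace, then one or more letters.
element :: Parser String
element = spaces *> some letter

-- try around each element: when an element fails after eating the trailing
-- space, try rewinds the input, many stops, and spaces <* char ')' finishes.
listTry :: Parser [String]
listTry = char '(' *> many (try element) <* spaces <* char ')'

-- try around the whole many: many still fails after consuming the space;
-- try only makes that failure non-consuming, but nothing offers an
-- alternative after '(' so the whole list still fails.
listTryOutside :: Parser [String]
listTryOutside = char '(' *> try (many element) <* spaces <* char ')'

main :: IO ()
main = mapM_ report ["()", "(hello world )", "( )"]
  where
    report s = do
      putStrLn ("many (try ...): " ++ show (parse listTry "" s))
      putStrLn ("try (many ...): " ++ show (parse listTryOutside "" s))
```

For "(hello world )" and "( )" only listTry should succeed; for "()" both succeed, because no whitespace is consumed before the failing element.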

