I am working on a parser that separates lexing from parsing. After some tests, I realized that the error messages are less useful when I use tokens other than Parsec's Char tokens.
Here are some examples of Parsec's error messages when using Char tokens:
    ghci> P.parseTest (string "asdf" >> spaces >> string "ok") "asdf wrong"
    parse error at (line 1, column 7):
    unexpected "w"
    expecting space or "ok"

    ghci> P.parseTest (choice [string "ok", string "nop"]) "wrong"
    parse error at (line 1, column 1):
    unexpected "w"
    expecting "ok" or "nop"
So the string parser shows which string was expected when it finds an unexpected one, and the choice parser shows the alternatives.
But when I use the same combinators with my tokens:
    ghci> Parser.parseTest ((tok $ Ide "asdf") >> (tok $ Ide "ok")) "asdf "
    parse error at "test" (line 1, column 1):
    unexpected end of input
In this case, it does not print what was expected.
    ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf "
    parse error at (line 1, column 1):
    unexpected (Ide "asdf","test" (line 1, column 1))
And when I use choice, it does not print the alternatives.
I expected this behavior to come from the combinator functions, not from the tokens, but it looks like I'm wrong. How can I fix this?
Here is the full lexer + parser code:
Lexer:
    module Lexer
        ( Token(..)
        , TokenPos(..)
        , tokenize
        ) where

    import Text.ParserCombinators.Parsec hiding (token, tokens)
    import Control.Applicative ((<*), (*>), (<$>), (<*>))

    data Token = Ide String
               | Number String
               | Bool String
               | LBrack
               | RBrack
               | LBrace
               | RBrace
               | Keyword String
        deriving (Show, Eq)

    type TokenPos = (Token, SourcePos)

    ide :: Parser TokenPos
    ide = do
        pos <- getPosition
        fc  <- oneOf firstChar
        r   <- optionMaybe (many $ oneOf rest)
        spaces
        return $ flip (,) pos $ case r of
                     Nothing -> Ide [fc]
                     Just s  -> Ide $ [fc] ++ s
      where firstChar = ['A'..'Z'] ++ ['a'..'z'] ++ "_"
            rest      = firstChar ++ ['0'..'9']

    parsePos p = (,) <$> p <*> getPosition

    lbrack = parsePos $ char '[' >> return LBrack
    rbrack = parsePos $ char ']' >> return RBrack
    lbrace = parsePos $ char '{' >> return LBrace
    rbrace = parsePos $ char '}' >> return RBrace

    token = choice
        [ ide
        , lbrack
        , rbrack
        , lbrace
        , rbrace
        ]

    tokens = spaces *> many (token <* spaces)

    tokenize :: SourceName -> String -> Either ParseError [TokenPos]
    tokenize = runParser tokens ()
Parser:
    module Parser where

    import Text.Parsec as P
    import Control.Monad.Identity
    import Lexer

    -- Tokenize the input first, then run the token parser on the result.
    parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
    parseTest p s =
        case tokenize "test" s of
            Left e    -> putStrLn $ show e
            Right ts' -> P.parseTest p ts'

    -- Accept exactly the given token.
    tok :: Token -> ParsecT [TokenPos] () Identity Token
    tok t = token show snd test
      where test (t', _) = case t == t' of
                               False -> Nothing
                               True  -> Just t
SOLUTION:
OK, after fp4me's answer and after reading Parsec's Char source more carefully, I ended up with this:
    {-# LANGUAGE FlexibleContexts #-}
    module Parser where

    import Text.Parsec as P
    import Control.Monad.Identity
    import Lexer

    parseTest :: Show a => Parsec [TokenPos] () a -> String -> IO ()
    parseTest p s =
        case tokenize "test" s of
            Left e    -> putStrLn $ show e
            Right ts' -> P.parseTest p ts'

    type Parser a = Parsec [TokenPos] () a

    -- The new position is that of the next token, if there is one.
    advance :: SourcePos -> t -> [TokenPos] -> SourcePos
    advance _ _ ((_, pos) : _) = pos
    advance pos _ [] = pos

    satisfy :: (TokenPos -> Bool) -> Parser Token
    satisfy f = tokenPrim show
                          advance
                          (\c -> if f c then Just (fst c) else Nothing)

    -- The <?> label is what makes the "expecting ..." part show up.
    tok :: Token -> ParsecT [TokenPos] () Identity Token
    tok t = (Parser.satisfy $ (== t) . fst) <?> show t
Now I get the same kind of error messages:

    ghci> Parser.parseTest (choice [tok $ Ide "ok", tok $ Ide "nop"]) "asdf"
    parse error at (line 1, column 1):
    unexpected (Ide "asdf","test" (line 1, column 3))
    expecting Ide "ok" or Ide "nop"
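For anyone who wants to reproduce the difference without the lexer, here is a minimal self-contained sketch. It is not the code from my project: the `Tok` type, the hand-built `stream` with made-up positions, and the names `tokBare`, `tokLabelled` and `mentionsExpected` are all hypothetical and only there for illustration. It assumes the `parsec` package is available. The only point is that the same `tokenPrim`-based parser produces an "expecting" part once it is labelled with `<?>`.

```haskell
import Data.List (isInfixOf)
import Text.Parsec
import Text.Parsec.Pos (newPos)

-- Hypothetical, simplified token type (illustration only, no lexer involved).
data Tok = Ide String | LBrack | RBrack
  deriving (Show, Eq)

type TokPos = (Tok, SourcePos)
type TokParser a = Parsec [TokPos] () a

-- Next position: that of the following token, or unchanged at end of input.
advance :: SourcePos -> TokPos -> [TokPos] -> SourcePos
advance _ _ ((_, pos) : _) = pos
advance pos _ [] = pos

-- Unlabelled parser: a failure only records what was unexpected.
tokBare :: Tok -> TokParser Tok
tokBare t =
  tokenPrim show advance (\(t', _) -> if t == t' then Just t' else Nothing)

-- Same parser with a <?> label: a failure also records what was expected.
tokLabelled :: Tok -> TokParser Tok
tokLabelled t = tokBare t <?> show t

-- Build a token stream by hand; the positions are made up.
stream :: [Tok] -> [TokPos]
stream ts = zip ts [newPos "demo" 1 c | c <- [1 ..]]

bareResult, labelledResult :: Either ParseError Tok
bareResult =
  parse (choice [tokBare (Ide "ok"), tokBare (Ide "nop")])
        "demo" (stream [Ide "asdf"])
labelledResult =
  parse (choice [tokLabelled (Ide "ok"), tokLabelled (Ide "nop")])
        "demo" (stream [Ide "asdf"])

-- True when the rendered error contains an "expecting" part.
mentionsExpected :: Either ParseError a -> Bool
mentionsExpected = either (isInfixOf "expecting" . show) (const False)

main :: IO ()
main = do
  print bareResult
  print labelledResult
  print (mentionsExpected bareResult, mentionsExpected labelledResult)
```

Both runs fail on the same token, but only the labelled version should list the alternatives, because `choice` merges the labels of the branches that failed without consuming input.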