FParsec identifiers vs. keywords

For languages with keywords, some special trickery is needed to prevent, for example, "if" from being interpreted as an identifier, and "ifSomeVariableName" from becoming the keyword "if" followed by the identifier "SomeVariableName" in the token stream.

For recursive descent and Lex/Yacc, I have simply taken the approach (following some helpful advice) of transforming the token stream between the lexer and the parser.

However, FParsec doesn't really seem to have a separate lexer step, so I'm wondering what the best way to deal with this is. Speaking of which, it seems that Haskell's Parsec supports a lexer layer, but FParsec does not?

+3
2 answers

I think this problem is very simple. The answer is that you should:

  • Parse an entire word ([a-z]+), lowercase only;
  • Check whether it is in the keyword dictionary; if so, return a keyword; otherwise the parser fails and backtracks;
  • Parse identifiers separately.

E.g. (just hypothetical code, not tested):

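    let keyWordSet =
        System.Collections.Generic.HashSet<_>(
            [|"while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"|]
        )

    // nonAlphaNumeric is assumed to be a non-consuming check for a word boundary.
    // attempt rewinds the input when the word is not a keyword, so that
    // pIdentifier can still be tried on the same text.
    let pKeyword =
        attempt (
            (many1Satisfy isLower .>> nonAlphaNumeric) // [a-z]+
            >>= (fun s -> if keyWordSet.Contains(s) then (preturn s) else fail "not a keyword")
        )

    let pContent =
        pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier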

The code above parses a word twice whenever it turns out not to be a keyword. To avoid that, you can instead:

  • Parse an entire word ([a-zA-Z][a-zA-Z0-9]*), i.e. everything alphanumeric;
  • Check whether it is a keyword or an identifier (lowercase and in the dictionary) and either
    • return a keyword, or
    • return an identifier.

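The single-pass variant described in the list above could be sketched roughly as follows. This is only an illustration, not tested against a real grammar: the `Token` type and the names `pWord`, `pKeywordOrIdentifier` and `classify` are made up for the example, and the word shape (a letter followed by letters or digits) is an assumption.

```fsharp
open FParsec

// Hypothetical token type, introduced only for this sketch.
type Token =
    | Keyword of string
    | Identifier of string

let keywords =
    System.Collections.Generic.HashSet<string>(
        [| "while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print" |])

// Reads one word: a letter followed by any number of letters or digits.
let pWord : Parser<string, unit> =
    many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "word"

// Classifies the word after it has been read, so the input is scanned only once.
let pKeywordOrIdentifier : Parser<Token, unit> =
    pWord |>> fun w -> if keywords.Contains(w) then Keyword w else Identifier w

// Small helper that runs the parser and returns the token, if any.
let classify (input: string) : Token option =
    match run pKeywordOrIdentifier input with
    | Success (token, _, _) -> Some token
    | Failure _ -> None
```

With these definitions, `classify "if"` should give a `Keyword`, while `classify "ifSomeVariableName"` should give a single `Identifier`, because the whole word is consumed before the dictionary lookup happens.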
P.S. Don't forget to put the "cheaper" parsers first, as long as that doesn't break the logic.

+4

You can define a parser for whitespace and check whether a keyword or identifier is followed by it. For example, a generic whitespace parser would look like

let pWhiteSpace = pLineComment <|> pMultilineComment <|> pSpaces

The following requires at least one whitespace element:

let ws1 = skipMany1 pWhiteSpace

Then the "if" keyword parser would look like

let pIf = pstring "if" .>> ws1
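Note that requiring whitespace after the keyword also rejects input such as `if(x)`. One possible variation, shown here only as a sketch, checks that the keyword is not followed by an identifier character. The helper names `isIdentifierChar`, `keyword`, `pIfKeyword` and `accepts` are made up for the example, and the set of identifier characters is an assumption.

```fsharp
open FParsec

// Characters that may continue an identifier (an assumption for this sketch).
let isIdentifierChar (c: char) = isLetter c || isDigit c || c = '_'

// Matches `name` only when no identifier character follows it.
// `attempt` restores the input position if the lookahead check fails,
// so another parser (e.g. an identifier parser) can be tried instead.
let keyword (name: string) : Parser<string, unit> =
    attempt (pstring name .>> notFollowedBy (satisfy isIdentifierChar)) .>> spaces

let pIfKeyword = keyword "if"

// Helper that reports whether the input starts with the "if" keyword.
let accepts (input: string) : bool =
    match run pIfKeyword input with
    | Success _ -> true
    | Failure _ -> false
```

Under these assumptions, `accepts "if x"` and `accepts "if(x)"` should both succeed, while `accepts "ifSomeVariableName"` should fail, leaving the word for an identifier parser.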

0