I think this problem is very simple. The answer is that you should:
- Parse the whole word ([a-z]+), lowercase only;
- Check whether it belongs to the dictionary; if it does, return a keyword; otherwise the parser fails and the next alternative is tried;
- Parse identifiers separately.
E.g. (just hypothetical code, not tested):
    let keyWordSet =
        System.Collections.Generic.HashSet<_>(
            [|"while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"|])

    // nonAlphaNumeric is assumed to be defined elsewhere.
    // attempt backtracks on failure so that pIdentifier can retry the same word.
    let pKeyword =
        attempt (
            (many1Satisfy isLower .>> nonAlphaNumeric) // [a-z]+
            >>= (fun s ->
                if keyWordSet.Contains(s) then preturn s // return the matched word
                else fail "not a keyword"))

    let pContent =
        pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
The code above parses a word that is not a keyword twice: once in pKeyword and again in pIdentifier. To avoid this, you can instead:
- Parse the whole word ([a-zA-Z][a-zA-Z0-9]*), e.g. everything alphanumeric;
- Check whether it is a keyword (lowercase and in the dictionary) or an identifier, and either:
  - return a keyword, or
  - return an identifier.
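The single-pass variant above could look roughly like the sketch below. It is untested and makes a few assumptions that are not in the answer: the Token type and the names pWord and pKeywordOrIdentifier are hypothetical, and a "word" is taken to be an ASCII letter followed by ASCII letters or digits.

```fsharp
open FParsec

// Hypothetical token type; the answer above does not define one.
type Token =
    | Keyword of string
    | Identifier of string

let keyWordSet =
    System.Collections.Generic.HashSet<string>(
        [| "while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print" |])

// Read one word: an ASCII letter followed by ASCII letters or digits.
let pWord : Parser<string, unit> =
    many1Satisfy2 isAsciiLetter (fun c -> isAsciiLetter c || isDigit c)

// Classify the word after reading it, so the input is scanned only once.
// All keywords in the set are lowercase, so set membership covers both checks.
let pKeywordOrIdentifier : Parser<Token, unit> =
    pWord |>> (fun w ->
        if keyWordSet.Contains w then Keyword w else Identifier w)
```

With this, run pKeywordOrIdentifier "while" should produce Keyword "while", and an input such as "whilst" should produce Identifier "whilst". In pContent, this one parser would take the place of both pKeyword and pIdentifier.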
P.S. Don't forget to put the "cheaper" parsers first, as long as that doesn't break the logic.