I am working on a parser for a language with:
identifiers (e.g. a letter followed by any number of alphanumeric characters or underscores),
integers (any number of digits and possibly carets, ^),
some operators,
filenames (any number of alphanumeric characters and possibly slashes and periods).
Clearly, filenames overlap with integers and identifiers, so in general I can't tell whether I have a filename or, say, an identifier unless the filename contains a slash or a period.
However, a filename can only follow one specific operator.
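To make the overlap concrete, here is a minimal Python sketch. The three regular expressions are my own approximations of the token descriptions above (not my real lexer tables), and `candidate_kinds` is a hypothetical helper that lists every token kind matching a whole lexeme.

```python
import re

# Assumed patterns that approximate the token descriptions in this question.
PATTERNS = {
    "identifier": re.compile(r"[A-Za-z][A-Za-z0-9_]*"),  # letter, then alphanumerics/underscores
    "integer": re.compile(r"[0-9^]+"),                   # digits and possibly carets
    "filename": re.compile(r"[A-Za-z0-9/.]+"),           # alphanumerics, slashes, periods
}

def candidate_kinds(lexeme):
    """Return every token kind whose pattern matches the whole lexeme."""
    return [kind for kind, pattern in PATTERNS.items() if pattern.fullmatch(lexeme)]

for lexeme in ["foo1", "42", "dir/foo.txt", "my_var", "2^10"]:
    print(lexeme, candidate_kinds(lexeme))
```

Under these assumed patterns, `foo1` matches both identifier and filename, and `42` matches both integer and filename, while `my_var` and `2^10` are not valid filenames at all. So the token classes overlap, but none of them contains the others.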
My question is: how is this situation usually handled during tokenization? I have a table-driven tokenizer (lexer), but I'm not sure how to tell a filename apart from an integer or an identifier. How is this done?
If filenames were a superset of integers and identifiers, I could probably write grammar productions to handle this, but the tokens only overlap...