You can specify tokens as regular expressions in Alex, the lexer generator for Haskell. Here, for example, is an Alex specification that matches floating point numbers:
    $space       = [\ \t\xa0]
    $digit       = 0-9
    $octit       = 0-7
    $hexit       = [$digit A-F a-f]

    @sign        = [\-\+]
    @decimal     = $digit+
    @octal       = $octit+
    @hexadecimal = $hexit+
    @exponent    = [eE] [\-\+]? @decimal

    @number = @decimal
            | @decimal \. @decimal @exponent?
            | @decimal @exponent
            | 0[oO] @octal
            | 0[xX] @hexadecimal

    lex :-

    @sign? @number { strtod }
When a floating point number is matched, the associated action strtod is applied to the captured string. We can then wrap the generated scanner and expose it to the user as a parsing function:
    readDouble :: ByteString -> Maybe (Double, ByteString)
    readDouble str = case alexScan (AlexInput '\n' str) 0 of
        AlexEOF     -> Nothing
        AlexError _ -> Nothing
        AlexToken (AlexInput _ rest) n _ ->
            case strtod (B.unsafeTake n str) of
                d -> d `seq` Just $! (d, rest)
A nice consequence of using Alex for this kind of matching is that performance is good: the regular expressions are compiled into a static table-driven scanner at build time. The result is also just an ordinary Haskell library that can be built with cabal. For a complete implementation, see the bytestring-lexing package.
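As a sketch of how the wrapped scanner is used (the module name Data.ByteString.Lex.Double comes from an older release of bytestring-lexing and may differ in current versions):

    import qualified Data.ByteString.Char8 as B
    import Data.ByteString.Lex.Double (readDouble)

    main :: IO ()
    main = do
        -- readDouble returns the parsed Double and the remaining input.
        print (readDouble (B.pack "1.5e3 apples"))   -- Just (1500.0," apples")
        print (readDouble (B.pack "apples"))         -- Nothing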
The general advice on when to use a lexer rather than regular expression matching: if you have a grammar for the tokens you are trying to match, as I did here for floating point numbers, use Alex. If you do not, and the structure is more ad hoc, use a regular expression library.
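For comparison, here is a minimal sketch of the ad hoc style using the regex-tdfa package (an assumption: any of the standard Haskell regex bindings works the same way through the =~ operator):

    import Text.Regex.TDFA ((=~))

    -- A quick, grammar-free check: does the string contain something
    -- that looks like a number?
    looksNumeric :: String -> Bool
    looksNumeric s = s =~ ("[0-9]+(\\.[0-9]+)?" :: String)

The trade-off is that the pattern is compiled and checked at runtime, whereas the Alex specification is turned into a scanner ahead of time.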