I am working on a Lua lexer in fslex in my free time, using the ocamllex manual as a reference.
I hit several snags trying to tokenize long strings correctly. "Long strings" are delimited by the tokens '[' ('=')* '[' and ']' ('=')* ']'; the number of = characters in the opening and closing tokens must be the same.
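To make the notion of a "level" concrete, here is a minimal sketch in plain F# (not fslex). The names longBracketLevel and closes are hypothetical and used only for illustration; they assume the token passed in is already a well-formed long bracket.

```fsharp
// Hypothetical helper: the level of a long bracket is the number of '='
// characters it contains, e.g. "[[" is level 0 and "[==[" is level 2.
let longBracketLevel (token: string) : int =
    token |> Seq.filter (fun c -> c = '=') |> Seq.length

// A closing bracket ends the string only if its level equals the opening level.
let closes (opening: string) (closing: string) : bool =
    longBracketLevel opening = longBracketLevel closing

printfn "%d" (longBracketLevel "[==[")   // prints 2
printfn "%b" (closes "[==[" "]==]")      // prints true
printfn "%b" (closes "[==[" "]]")        // prints false
```

So a ]] inside a string opened with [==[ is ordinary content, which is exactly what a single regular expression cannot express.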
In my first implementation, the lexer did not seem to recognize the pattern [[, producing two LBRACKET tokens despite the longest-match rule, whereas [=[ and its variations were recognized correctly. In addition, the regular expression could not ensure that the correct closing token was used: it stopped at the first match of ']' ('=')* ']', regardless of the actual "level" of the long string. Also, fslex does not seem to support the "as" construct in regular expressions.
    let lualongstring = '[' ('=')* '[' ( escapeseq | [^ '\\' '[' ] )* ']' ('=')* ']'
    (* ... *)
    | lualongstring { (* ... *) }
    | '[' { LBRACKET }
    | ']' { RBRACKET }
    (* ... *)
I then tried to solve the problem with a separate rule in the lexer:
    rule tokenize = parse
    (* ... *)
    | '[' ('=')* '[' { longstring (getLongStringLevel(lexeme lexbuf)) lexbuf }
    (* ... *)
    and longstring level = parse
    | ']' ('=')* ']' { (* check level, do something *) }
    | _ { (* aggregate other chars *) }
    (* or *)
    | _ { let c = lexbuf.LexerChar(0); (* ... *) }
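For reference, the behaviour I want from the longstring rule can be modelled outside the lexer. The following is only a sketch in plain F#, not fslex code: readLongString, its parameters, and the idea of working on the whole input as a string are assumptions made for illustration, and it ignores details such as Lua dropping a newline that immediately follows the opening bracket.

```fsharp
open System

// Hypothetical model of the `longstring level` rule as a plain function.
// `start` is the index just after the opening long bracket.
// Returns Some (contents, indexAfterClosingBracket), or None if the
// string is never closed with a bracket of the same level.
let readLongString (level: int) (text: string) (start: int) : (string * int) option =
    // Build the exact closing bracket for this level, e.g. "]==]" for level 2.
    let closing = "]" + String.replicate level "=" + "]"
    match text.IndexOf(closing, start, StringComparison.Ordinal) with
    | -1 -> None
    | stop -> Some (text.Substring(start, stop - start), stop + closing.Length)

// Usage: the "]]" inside a level-1 string is ordinary content.
let source = "[=[ a ]] b ]=] rest"
match readLongString 1 source 3 with
| Some (contents, next) -> printfn "contents=%A next=%d" contents next
| None -> printfn "unterminated long string"
```

The point of the sketch is that the closing token depends on a value (the level) known only after the opening token has been read, so the check has to live in an action or a parameterized rule rather than in the regular expression itself.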
But I got stuck for two reasons: first, I don't think I can "push", so to speak, a token to the next rule once I have finished reading the long string; second, I don't like the idea of reading character by character until the correct closing token is found, which would make the current design useless.
How can I tokenize long Lua strings in fslex? Thanks for reading.
lua f# ocamllex fslex
Raine