Lua long strings in fslex

I am working on a Lua lexer in fslex in my free time, using the ocamllex manual as a reference.

I hit several snags trying to tokenize long strings correctly. "Long strings" are delimited by the tokens '[' ('=')* '[' and ']' ('=')* ']', where the number of '=' characters (the "level") must be the same on both sides.
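For instance, in Lua itself (these literals are purely illustrative, not from my test suite):

 --[[ a long comment also uses long brackets ]]
 local s0 = [[a level-0 long string]]
 local s2 = [==[a level-2 long string; ]] and ]=] do not close it]==]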

In my first implementation, the lexer did not seem to recognize the pattern [[ , producing two LBRACKET tokens despite the longest-match rule, while [=[ and deeper variants were recognized correctly. Moreover, a regular expression alone cannot verify that the matching closing token is used: matching stops at the first occurrence of ']' ('=')* ']', regardless of the actual level of the long string. Finally, fslex does not seem to offer a regular-expression construct for this kind of back-reference.

 let lualongstring = '[' ('=')* '[' ( escapeseq | [^ '\\' '[' ] )* ']' ('=')* ']'

 (* ... *)
 | lualongstring { (* ... *) }
 | '[' { LBRACKET }
 | ']' { RBRACKET }
 (* ... *)

I am trying to solve the problem with a second lexer rule:

 rule tokenize = parse
     (* ... *)
     | '[' ('=')* '[' { longstring (getLongStringLevel(lexeme lexbuf)) lexbuf }
     (* ... *)

 and longstring level = parse
     | ']' ('=')* ']' { (* check level, do something *) }
     | _ { (* aggregate other chars *) }
     (* or *)
     | _ { let c = lexbuf.LexerChar(0); (* ... *) }

But I got stuck for two reasons: first, I don't think I can "push back", so to speak, a token to the next rule once I finish reading the long string; second, I don't like the idea of reading character by character until the matching closing token is found, which makes the current design useless.

How can I tokenize long Lua strings in fslex? Thanks for reading.

lua f# ocamllex fslex
1 answer

Sorry to answer my own question, but I would like to record the solution for future reference.

I keep state across lexer-function calls using the LexBuffer<_>.BufferLocalStore property, which is simply a writable IDictionary<string, obj> instance.
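The pattern in isolation looks like this (a minimal sketch; store and tryFetch are my own illustrative names, not part of the fslex API):

 (* Values in BufferLocalStore are obj, so they must be boxed on the way in
    and downcast on the way out. *)
 let store (lexbuf : LexBuffer<_>) (key : string) value =
     lexbuf.BufferLocalStore.[key] <- box value

 let tryFetch<'T> (lexbuf : LexBuffer<_>) (key : string) : 'T option =
     match lexbuf.BufferLocalStore.TryGetValue(key) with
     | true, v -> Some (v :?> 'T)
     | _ -> None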

Note: long brackets are used both by long strings and by multi-line comments; this is often overlooked in Lua grammars.
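That means the same machinery can back multi-line comments as well; a sketch (assuming a hypothetical LUACOMMENT token, and reusing the beginlongbracket definition and longstring rule shown below):

 rule tokenize = parse
     | "--" beginlongbracket
         { (* a long comment: lex its body exactly like a long string,
              then rewrap the resulting token *)
           match longstring (longBracketLevel(lexeme lexbuf)) lexbuf with
           | LUASTRING s -> LUACOMMENT s
           | t -> t }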

 let beginlongbracket = '[' ('=')* '['
 let endlongbracket = ']' ('=')* ']'

 rule tokenize = parse
     | beginlongbracket { longstring (longBracketLevel(lexeme lexbuf)) lexbuf }
     (* ... *)

 and longstring level = parse
     | endlongbracket { if longBracketLevel(lexeme lexbuf) = level
                        then LUASTRING(endLongString(lexbuf))
                        else (* a closer of the wrong level is part of the
                                string, so keep it instead of dropping it *)
                             (toLongString lexbuf (lexeme lexbuf);
                              longstring level lexbuf) }
     | _   { toLongString lexbuf (lexeme lexbuf); longstring level lexbuf }
     | eof { failwith "Unexpected end of file in string." }

Here are the functions that I use to simplify data storage in BufferLocalStore:

 (* Requires: open System.Linq (for Count) and open System.Text (for StringBuilder). *)
 let longBracketLevel (str : string) =
     str.Count(fun c -> c = '=')

 let createLongStringStorage (lexbuf : LexBuffer<_>) =
     let sb = new StringBuilder(1000)
     lexbuf.BufferLocalStore.["longstring"] <- box sb
     sb

 let toLongString (lexbuf : LexBuffer<_>) (c : string) =
     let hasString, sb = lexbuf.BufferLocalStore.TryGetValue("longstring")
     let storage = if hasString then (sb :?> StringBuilder)
                   else (createLongStringStorage lexbuf)
     (* append the whole lexeme, which may be longer than one character *)
     storage.Append(c) |> ignore

 let endLongString (lexbuf : LexBuffer<_>) : string =
     let hasString, sb = lexbuf.BufferLocalStore.TryGetValue("longstring")
     let ret = if not hasString then "" else (sb :?> StringBuilder).ToString()
     lexbuf.BufferLocalStore.Remove("longstring") |> ignore
     ret

It may not be very functional in style, but it seems to do the job:

  • use the tokenize rule until the beginning of a long bracket is found
  • switch to the longstring rule and loop until a closing long bracket of the same level is found
  • store every lexeme that does not match a closing long bracket of the same level in a StringBuilder, which in turn is stored in the LexBuffer's BufferLocalStore
  • once the long string is over, clear the BufferLocalStore
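A hand-worked trace (not program output) of lexing the input [=[a]]b]=], assuming a closer of the wrong level is appended to the buffer rather than dropped, which is what Lua's own lexer does:

 tokenize matches "[=["          -> enter longstring with level 1
 _ matches "a"                   -> buffer: a
 endlongbracket matches "]]"     -> level 0 <> 1, keep it -> buffer: a]]
 _ matches "b"                   -> buffer: a]]b
 endlongbracket matches "]=]"    -> level 1 = 1 -> LUASTRING "a]]b"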

Edit: The project can be found at http://ironlua.codeplex.com . Lexing and parsing should be fine. I plan to use the DLR. Comments and constructive criticism are welcome.

