Since input such as s:3:"a"b"; is valid, you cannot define a String token in your lexer unless the first and last double quotes always mark the start and end of the string, which is apparently not the case here.
So you start with a lexer rule like this:
SString : 's:' Int ':"' ( . )* '";' ;
In other words: match s:, then the integer value, followed by :", then zero or more characters (which can be anything), ending in ";. But you need to tell the lexer to stop consuming characters once the count given by the Int token has been reached. You can do this by embedding a bit of plain code in your grammar, wrapped inside { and }. First, convert the text of the Int token into an integer variable called chars:
SString : 's:' Int {chars = int($Int.text)} ':"' ( . )* '";' ;
Now insert code inside the ( . )* loop to stop consuming characters as soon as chars has counted down to zero:
SString : 's:' Int {chars = int($Int.text)} ':"' ( {if chars == 0: break} . {chars = chars-1} )* '";' ;
And that's it.
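If it helps to see the counting idea outside of ANTLR, here is a rough plain-Python sketch of the same logic (the function name and layout are my own, purely for illustration): read the declared length first, then consume exactly that many characters, so an embedded double quote causes no trouble.

def read_sstring(text, pos=0):
    # Expects the form  s:<length>:"<exactly length chars>";
    assert text.startswith('s:', pos)
    colon = text.index(':', pos + 2)
    length = int(text[pos + 2:colon])      # the Int value, i.e. `chars`
    start = colon + 2                      # skip the ':"' after the length
    value = text[start:start + length]     # take exactly `length` characters
    end = start + length
    assert text[end:end + 2] == '";'       # must be followed by '";'
    return value, end + 2

print(read_sstring('s:3:"a"b";'))          # -> ('a"b', 10)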
A little demo grammar:
grammar Test;

options {
  language=Python;
}

parse
  :  (SString {print 'parsed: [\%s]' \% $SString.text})+ EOF
  ;

SString
  :  's:' Int {chars = int($Int.text)} ':"' ( {if chars == 0: break} . {chars = chars-1} )* '";'
  ;

Int
  :  '0'..'9'+
  ;
(note that you need to escape the % inside your grammar!)
And the test script:
import antlr3
from TestLexer import TestLexer
from TestParser import TestParser

input = 's:6:"length";s:1:""";s:0:"";s:3:"end";'
char_stream = antlr3.ANTLRStringStream(input)
lexer = TestLexer(char_stream)
tokens = antlr3.CommonTokenStream(lexer)
parser = TestParser(tokens)
parser.parse()
which produces the following output:
parsed: [s:6:"length";]
parsed: [s:1:""";]
parsed: [s:0:"";]
parsed: [s:3:"end";]
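Note that this assumes TestLexer.py and TestParser.py have already been generated from Test.g with the ANTLR 3 tool (for the Python target, something along the lines of java -jar antlr-3.x.jar Test.g; the exact jar name depends on the ANTLR 3 version you have installed).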