Robin wrote:
I thought writing rules for inline markup text would be easy
I have to admit that I am not familiar with this markup language, but it seems to resemble BBCode or wiki markup, which are not easy to translate into an ANTLR grammar! These languages make it hard to tokenize the input, because what a piece of text means depends on where it appears. Sometimes whitespace has a special meaning (in definition lists, for example). So no, it's not at all easy, IMO. Therefore, if this is just an exercise to get familiar with ANTLR (or parser generators in general), I highly recommend choosing something else to parse.
Robin wrote:
Can someone point out my mistakes and maybe give me a hint on how to fit the regular text?
You must first understand that ANTLR creates a lexer (tokenizer) and a parser. Lexer rules begin with an uppercase letter, and parser rules begin with a lowercase letter. The parser can only work with tokens (the objects produced by lexer rules). To keep things clear, you'd better not use literal tokens inside parser rules (see rule q in the grammar below). Also, the ~ (negation) meta-character means something different depending on where it is used: in a parser rule or in a lexer rule.
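To make the lexer/parser split concrete, here is a toy Python model. It is not code that ANTLR generates. The rules NUMBER, PLUS and sum are hypothetical and made up for illustration. The lexer turns characters into (type, text) tokens, and the parser rule only ever compares token types.

```python
import re
from typing import List, Tuple

# Hypothetical lexer rules, in the order they would appear in a grammar.
# Upper-case names follow ANTLR's convention for lexer rules.
LEXER_RULES = [
    ("NUMBER", re.compile(r"[0-9]+")),  # NUMBER : '0'..'9'+ ;
    ("PLUS", re.compile(r"\+")),        # PLUS   : '+' ;
]

Token = Tuple[str, str]  # (token type, matched text)


def tokenize(source: str) -> List[Token]:
    """Lexer: turn raw characters into tokens (first matching rule wins)."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        for name, pattern in LEXER_RULES:
            m = pattern.match(source, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"no lexer rule matches {source[pos]!r}")
    return tokens


def sum_rule(tokens: List[Token]) -> bool:
    """Parser rule  sum : NUMBER PLUS NUMBER ;  (lower-case name = parser rule).
    It only inspects token types, never the characters inside them."""
    return [ttype for ttype, _ in tokens] == ["NUMBER", "PLUS", "NUMBER"]


print(tokenize("12+3"))            # [('NUMBER', '12'), ('PLUS', '+'), ('NUMBER', '3')]
print(sum_rule(tokenize("12+3")))  # True
```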
Take the following grammar:
    p : T;
    q : ~'z';
    T : ~'x';
    U : 'y';
ANTLR will first "move" the literal 'z' into a lexer rule of its own, something like this:
    p : T;
    q : ~RANDOM_NAME;
    T : ~'x';
    U : 'y';
    RANDOM_NAME : 'z';
(ANTLR won't actually use the name RANDOM_NAME, but that does not matter.) Now the parser rule q does NOT match "any character other than 'z'"! Inside a parser rule, negation negates a token (that is, a lexer rule). So ~RANDOM_NAME matches either a T token or a U token.
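As a rough illustration, here is a toy Python model of what ~ means inside a parser rule. It is not ANTLR's implementation. It treats ~X as a set complement over the grammar's token types (T, U and the placeholder RANDOM_NAME from the grammar above), never over characters.

```python
# Toy model of ~ inside a *parser* rule: a complement over the grammar's
# token types, never over characters. RANDOM_NAME is the placeholder name
# from the example above, not the name ANTLR really generates.
TOKEN_TYPES = frozenset({"T", "U", "RANDOM_NAME"})


def parser_not(*excluded: str) -> frozenset:
    """Token types matched by ~X (or ~(X|Y)) in a parser rule."""
    return TOKEN_TYPES - frozenset(excluded)


def q(token_type: str) -> bool:
    """q : ~'z' ;  which ANTLR effectively turns into  q : ~RANDOM_NAME ;"""
    return token_type in parser_not("RANDOM_NAME")


print(sorted(parser_not("RANDOM_NAME")))  # ['T', 'U']
```

Whether q accepts a given input character therefore depends entirely on which token type the lexer produced for it.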
Inside lexer rules, ~ negates a (single!) character. So the lexer rule T matches any character in the range \u0000..\uFFFF except 'x'. Note that ~'ab' is not valid inside a lexer rule: you can only negate single characters (or sets of single characters).
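By contrast, a toy Python model of ~ inside a lexer rule is a complement over single characters. The helper name lexer_not is hypothetical. Rejecting multi-character literals mimics why ~'ab' is invalid.

```python
from typing import Callable


# Toy model of ~ inside a *lexer* rule: the complement of a set of single
# characters within \u0000..\uFFFF (the range mentioned above).
def lexer_not(*excluded: str) -> Callable[[str], bool]:
    for ch in excluded:
        if len(ch) != 1:
            # Mirrors why ~'ab' is invalid: only single characters can be negated.
            raise ValueError(f"cannot negate multi-character literal {ch!r}")
    excluded_set = set(excluded)

    def matches(c: str) -> bool:
        # Consumes exactly one character, and it must not be an excluded one.
        return len(c) == 1 and c <= "\uffff" and c not in excluded_set

    return matches


T = lexer_not("x")     # T : ~'x' ;
print(T("a"), T("x"))  # True False
```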
So all those ~'???' constructs inside your parser rules are wrong (wrong as in: they don't behave the way you expect them to).
Robin wrote:
Is there a way to set priorities in grammar rules? Perhaps this could be a lead.
Yes, the order is from top to bottom in both lexer and parser rules (where top has the highest priority). Let's say parse is the entry point of your grammar:
parse : p | q ;
then p is tried first, and only if that fails is q tried.
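Here is a simplified Python model of that ordering. ANTLR actually predicts the alternative using lookahead rather than literally trying p and then backtracking. However, when both alternatives could match, it picks the one listed first. The versions of p and q below are hypothetical and chosen so that both accept the input [T].

```python
from typing import Callable, List, Optional

# A "rule" here takes a list of token types and returns the name of the
# alternative that matched, or None.
Rule = Callable[[List[str]], Optional[str]]


def ordered_choice(*alternatives: Rule) -> Rule:
    """Try the alternatives top to bottom; the first one that matches wins."""
    def rule(tokens: List[str]) -> Optional[str]:
        for alt in alternatives:
            result = alt(tokens)
            if result is not None:
                return result
        return None
    return rule


# Hypothetical versions of p and q that both accept the input [T]:
def p(tokens: List[str]) -> Optional[str]:
    return "p" if tokens == ["T"] else None


def q(tokens: List[str]) -> Optional[str]:
    return "q" if len(tokens) == 1 and tokens[0] != "RANDOM_NAME" else None


parse = ordered_choice(p, q)       # parse : p | q ;
print(parse(["T"]), parse(["U"]))  # p q
```

Swapping the order (parse : q | p ;) would make q win for [T] as well.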
As for lexer rules: keyword rules, for example, should be defined before a rule that could also match those keywords:
    // first the keywords:
    WHILE : 'while';
    IF    : 'if';
    ELSE  : 'else';

    // and only then the identifier rule:
    ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
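To see why the order matters, here is a toy Python lexer built from the four rules above. It assumes a "longest match first, ties go to the earlier rule" policy. That is the policy ANTLR 4 documents, and as far as I know ANTLR 3's lexer behaves the same way for rules like these. Skipping whitespace is an extra assumption, because the grammar above has no whitespace rule.

```python
import re
from typing import List, Tuple

# The rules from the grammar above, in the same order: keywords before ID.
RULES = [
    ("WHILE", re.compile(r"while")),
    ("IF", re.compile(r"if")),
    ("ELSE", re.compile(r"else")),
    ("ID", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")),
]
WS = re.compile(r"\s+")  # assumption: whitespace is simply skipped here


def tokenize(source: str, rules=RULES) -> List[Tuple[str, str]]:
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(source):
        ws = WS.match(source, pos)
        if ws:
            pos = ws.end()
            continue
        best = None
        for name, pattern in rules:
            m = pattern.match(source, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps a tie.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


print(tokenize("while whiles if x"))
# [('WHILE', 'while'), ('ID', 'whiles'), ('IF', 'if'), ('ID', 'x')]

# Put ID first and the keywords can never win a tie:
print(tokenize("while", RULES[3:] + RULES[:3]))  # [('ID', 'while')]
```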