Why is my ANTLR lexer Java class "code too large"?

Question

This is the ANTLR lexer grammar (sorry for the long file):

lexer grammar SqlServerDialectLexer;
/* T-SQL words */
AND: 'AND';
BIGINT: 'BIGINT';
BIT: 'BIT';
CASE: 'CASE';
CHAR: 'CHAR';
COUNT: 'COUNT';
CREATE: 'CREATE';
CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
DATETIME: 'DATETIME';
DECLARE: 'DECLARE';
ELSE: 'ELSE';
END: 'END';
FLOAT: 'FLOAT';
FROM: 'FROM';
GO: 'GO';
IMAGE: 'IMAGE';
INNER: 'INNER';
INSERT: 'INSERT';
INT: 'INT';
INTO: 'INTO';
IS: 'IS';
JOIN: 'JOIN';
NOT: 'NOT';
NULL: 'NULL';
NUMERIC: 'NUMERIC';
NVARCHAR: 'NVARCHAR';
ON: 'ON';
OR: 'OR';
SELECT: 'SELECT';
SET: 'SET';
SMALLINT: 'SMALLINT';
TABLE: 'TABLE';
THEN: 'THEN';
TINYINT: 'TINYINT';
UPDATE: 'UPDATE';
USE: 'USE';
VALUES: 'VALUES';
VARCHAR: 'VARCHAR';
WHEN: 'WHEN';
WHERE: 'WHERE';

QUOTE: '\'' { textMode = !textMode; };
QUOTED: {textMode}?=> ~('\'')*;

EQUALS: '=';
NOT_EQUALS: '!=';
SEMICOLON: ';';
COMMA: ',';
OPEN: '(';
CLOSE: ')';
VARIABLE: '@' NAME;
NAME:
    ( LETTER | '#' | '_' ) ( LETTER | NUMBER | '#' | '_' | '.' )*
    ;
NUMBER: DIGIT+;

fragment LETTER: 'a'..'z' | 'A'..'Z';
fragment DIGIT: '0'..'9';
SPACE
    :
    ( ' ' | '\t' | '\n' | '\r' )+
    { skip(); }
    ;

JDK 1.6 reports "code too large" and cannot compile the generated lexer. Why does this happen, and how can I solve it?

+5

java antlr antlr3

yegor256 Jun 08 '11 at 19:16

3 answers

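Split your grammar into several composite grammars. Be careful about which rules you place where. For example, you do not want to put the NAME rule in the top-level grammar and the keywords in an imported grammar: NAME would "overwrite" the keywords, so they would never be matched.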

This works:

A.g

lexer grammar A;

SELECT: 'SELECT';
SET: 'SET';
SMALLINT: 'SMALLINT';
TABLE: 'TABLE';
THEN: 'THEN';
TINYINT: 'TINYINT';
UPDATE: 'UPDATE';
USE: 'USE';
VALUES: 'VALUES';
VARCHAR: 'VARCHAR';
WHEN: 'WHEN';
WHERE: 'WHERE';

QUOTED: '\'' ('\'\'' | ~'\'')* '\'';

EQUALS: '=';
NOT_EQUALS: '!=';
SEMICOLON: ';';
COMMA: ',';
OPEN: '(';
CLOSE: ')';
VARIABLE: '@' NAME;
NAME:
    ( LETTER | '#' | '_' ) ( LETTER | NUMBER | '#' | '_' | '.' )*
    ;
NUMBER: DIGIT+;

fragment LETTER: 'a'..'z' | 'A'..'Z';
fragment DIGIT: '0'..'9';
SPACE
    :
    ( ' ' | '\t' | '\n' | '\r' )+
    { skip(); }
    ;

SqlServerDialectLexer.g

lexer grammar SqlServerDialectLexer;

import A;

AND: 'AND';
BIGINT: 'BIGINT';
BIT: 'BIT';
CASE: 'CASE';
CHAR: 'CHAR';
COUNT: 'COUNT';
CREATE: 'CREATE';
CURRENT_TIMESTAMP: 'CURRENT_TIMESTAMP';
DATETIME: 'DATETIME';
DECLARE: 'DECLARE';
ELSE: 'ELSE';
END: 'END';
FLOAT: 'FLOAT';
FROM: 'FROM';
GO: 'GO';
IMAGE: 'IMAGE';
INNER: 'INNER';
INSERT: 'INSERT';
INT: 'INT';
INTO: 'INTO';
IS: 'IS';
JOIN: 'JOIN';
NOT: 'NOT';
NULL: 'NULL';
NUMERIC: 'NUMERIC';
NVARCHAR: 'NVARCHAR';
ON: 'ON';
OR: 'OR';

It compiles fine:

java -cp antlr-3.3.jar org.antlr.Tool SqlServerDialectLexer.g 
javac -cp antlr-3.3.jar *.java

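As you can see, invoking org.antlr.Tool on the "top-level" grammar is enough: ANTLR automatically generates the classes for the imported grammars. To import more grammars, list them like this: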

import A, B, C;

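EDIT: Gunther is correct: changing the QUOTED rule is enough. I will leave my answer up, though, because once you add more keywords or a fair number of parser rules (inevitable with SQL grammars), you will most likely run into the "code too large" error again. In that case, you can use the approach described in this answer.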

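One way to picture why rule placement matters when a lexer grammar is split: the toy model below tries rules in a fixed order and lets the first match win. This is only a sketch under that assumption. It is not how ANTLR's generated lexer works internally, and the class RulePrecedenceSketch, its methods, and the reduced two-rule set are hypothetical names made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RulePrecedenceSketch {
    // Returns the name of the first rule (in insertion order) whose
    // pattern matches the whole word. Assumption: earlier rule wins.
    static String classify(Map<String, Pattern> rules, String word) {
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(word).matches()) {
                return rule.getKey();
            }
        }
        return "NO_MATCH";
    }

    // Builds a two-rule set with either the keyword or NAME taking precedence.
    static Map<String, Pattern> rules(boolean keywordsFirst) {
        Map<String, Pattern> rules = new LinkedHashMap<String, Pattern>();
        Pattern select = Pattern.compile("SELECT");
        // Regex approximation of the NAME rule from the grammar.
        Pattern name = Pattern.compile("[a-zA-Z#_][a-zA-Z0-9#_.]*");
        if (keywordsFirst) {
            rules.put("SELECT", select);
            rules.put("NAME", name);
        } else {
            rules.put("NAME", name);
            rules.put("SELECT", select);
        }
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(classify(rules(true), "SELECT"));  // keyword rule is tried first
        System.out.println(classify(rules(false), "SELECT")); // NAME shadows the keyword
        System.out.println(classify(rules(true), "orders"));  // ordinary identifier
    }
}
```

With the keyword rule first, SELECT is classified as a keyword; with NAME first, the same input is swallowed by NAME, which is the situation to avoid when deciding which rules go into the imported grammar.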

+5

Bart Kiers Jun 08 '11 at 20:12

. , ?

-, - , , .

0

Darien Jun 08 '11 at 19:27

Gunther · Accepted Answer · 2011-06-09T11:27:39+0000

Actually, I would not call this a big grammar, so there must be a reason why it does not produce reasonably sized code.

I think the problem is directly related to this rule:

QUOTED: {textMode}?=> ~('\'')*;

Is there a specific reason why you want the QUOTED part as a separate token rather than keeping it together with the surrounding quotes, as Bart did in his grammar? That would also make the textMode variable obsolete.

Try removing the QUOTE rule and replacing the QUOTED rule with:

QUOTED: '\'' (~'\'')* '\'';
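To get a feel for what a single-rule QUOTED token matches, here is a rough Java sketch that mirrors the rule with java.util.regex. The regular expressions are my approximation of the lexer rules, not code that ANTLR generates, and the class name QuotedRuleSketch and the sample SQL are made up for illustration. SIMPLE corresponds to the rule above; ESCAPED corresponds to the variant in Bart's A.g, which also accepts a doubled quote inside the literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedRuleSketch {
    // Approximates  QUOTED: '\'' (~'\'')* '\'';
    static final Pattern SIMPLE = Pattern.compile("'[^']*'");
    // Approximates  QUOTED: '\'' ('\'\'' | ~'\'')* '\'';
    static final Pattern ESCAPED = Pattern.compile("'(?:''|[^'])*'");

    // Collects every substring of sql that the given rule matches, left to right.
    static List<String> literals(Pattern rule, String sql) {
        List<String> found = new ArrayList<String>();
        Matcher m = rule.matcher(sql);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO t VALUES ('it''s', 'ok')";
        System.out.println(literals(SIMPLE, sql));   // splits at the doubled quote
        System.out.println(literals(ESCAPED, sql));  // keeps 'it''s' as one literal
    }
}
```

Either form folds the opening quote, the body, and the closing quote into one token, so no textMode flag or semantic predicate is needed. The two differ only in how a doubled quote inside a literal is treated.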

