I am trying to convert an ANTLR3 grammar to ANTLR4 to use it with antlr4-python2-runtime. The grammar is a fuzzy C/C++ parser.
After converting it (basically removing tree operators and semantic / syntactic predicates), I generated Python2 files using:
java -jar antlr4.5-complete.jar -Dlanguage=Python2 CPPGrammar.g4
The code was generated without any errors, so I imported it into my Python project (I use PyCharm) to run some tests:
    import sys, time
    from antlr4 import *
    from parser.CPPGrammarLexer import CPPGrammarLexer
    from parser.CPPGrammarParser import CPPGrammarParser

    currenttimemillis = lambda: int(round(time.time() * 1000))

    def is_string(object):
        return isinstance(object,str)

    def parsecommandstringline(argv):
        if(2!=len(argv)):
            raise IndexError("Invalid args size.")
        if(is_string(argv[1])):
            return True
        else:
            raise TypeError("Argument must be str type.")

    def doparsing(argv):
        if parsecommandstringline(argv):
            print("Arguments: OK - {0}".format(argv[1]))
            input = FileStream(argv[1])
            lexer = CPPGrammarLexer(input)
            stream = CommonTokenStream(lexer)
            parser = CPPGrammarParser(stream)
            print("*** Parser: START ***")
            start = currenttimemillis()
            tree = parser.code()
            print("*** Parser: END *** - {0} ms.".format(currenttimemillis()-start))
        pass

    def main(argv):
        tree = doparsing(argv)
        pass

    if __name__ == '__main__':
        main(sys.argv)
The problem is that parsing is very slow. A file of about 200 lines takes more than 5 minutes, while parsing the same file in ANTLRWorks takes only 1-2 seconds. Looking at the ANTLRWorks parse tree, I noticed that the expr rule and all its descendants are invoked very often, and I think I need to simplify or change these rules to make the parser faster.
Is my assumption correct, or did I make a mistake while converting the grammar? What can be done to make parsing as fast as it is in ANTLRWorks?
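One way to check whether the time really goes into the expr rules, rather than guessing from the tree, would be to profile the parse call. The following is only a sketch using the standard cProfile and pstats modules; `profile_call` and `fake_parse` are hypothetical names, and `fake_parse` is a stand-in for `parser.code()`, not real ANTLR code.

```python
import cProfile
import pstats


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, pstats.Stats)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    return result, stats


def fake_parse(depth):
    # Hypothetical stand-in for parser.code(): a recursive call chain that
    # loosely mimics nested rule invocations.
    if depth == 0:
        return 0
    return 1 + fake_parse(depth - 1)


result, stats = profile_call(fake_parse, 50)
# Print the five entries with the highest cumulative time.
stats.sort_stats("cumulative").print_stats(5)
```

In the real script the call would presumably look like `profile_call(parser.code)`, and the report would then list the runtime and generated-parser functions that dominate the run.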
UPDATE: I generated a Java parser from the same grammar, and it took only 795 ms to parse the same file. The problem seems to be related to the Python implementation rather than to the grammar itself. Is there anything that can be done to speed up parsing in Python?
I read here that Python can be 20-30 times slower than Java, but in my case Python is ~400 times slower!
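A quick sanity check of that ratio, assuming the Python run took exactly 5 minutes (it actually took longer, so this is a lower bound) and using the 795 ms Java figure from the update:

```python
# Figures from the post: "more than 5 minutes" in Python, 795 ms in Java.
python_ms = 5 * 60 * 1000  # lower bound, in milliseconds
java_ms = 795

# float() keeps the division exact under Python 2 as well as Python 3.
slowdown = python_ms / float(java_ms)
print("Python is at least {0:.0f}x slower than Java here".format(slowdown))
```

So the slowdown is at least about 377x, which is where the rough "~400 times" figure comes from.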
java python parsing antlr antlr4
Vektor88