How do I write an ANTLR parser for JSP / ASP / PHP-like languages?

I am new to parser generators, and I wonder what an ANTLR grammar might look like for an embedded language such as JSP / ASP / PHP. Unfortunately, the ANTLR site does not provide grammar files for these.

More precisely, I don’t know how to define an AnyText token that matches everything (including keywords, which have no meaning outside the code blocks) while still recognizing those keywords correctly inside the blocks.

For example, the following snippet should be tokenized as something like: AnyText, BlockBegin, Keyword, BlockEnd, AnyText.

lorem ipsum KEYWORD dolor sit <% KEYWORD %> amet 

Perhaps there is another parser generator that is better suited to my needs. I have only tried ANTLR so far, because of its huge popularity here on Stack Overflow :)

Thank you very much in advance!

2 answers

I can’t speak for ANTLR, since I use a different lexer/parser generator (the DMS Software Reengineering Toolkit), for which I have developed exactly such JSP and PHP lexers/parsers. (ASP is no different, as you noted in your question.)

But the main idea is that the lexer needs lexical modes to distinguish when it is collecting "anytext" from when it is processing the "real" programming-language text. So you need an initial lexical mode, say HTML, whose job is to absorb HTML text and to switch modes when it encounters the transition to PHP. You also need a PHP mode that collects all the PHP tokens and switches back to HTML mode when it encounters the closing transition characters. Here's a sketch:

    %%HTML -- mode
    #token HTMLText "~[]* \< \% " << (GotoPHPMode) >>
    %%PHP -- mode
    #token KEYWORD "KEYWORD"
    ...
    #token '%>' "\%\>" << (GotoHTMLMode) >>

Your lexer generator will most likely have some kind of mode-switching facility, which you should use in place of this notation. You will probably also find that lexing the HTML part is more complex than it sounds (you need to worry about <SCRIPT tags and plenty of other HTML oddities), but I believe you can handle those details.
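
To make the mode-switching idea concrete without depending on any particular generator, here is a minimal hand-written sketch in Python. It is neither ANTLR nor DMS syntax. The token names come from the question, while the function name `tokenize`, the `Identifier` token, and the regular expressions are assumptions made up for illustration.

```python
import re

# Delimiters from the question's example; everything else here is illustrative.
BLOCK_BEGIN = "<%"
BLOCK_END = "%>"

# Tokens that exist only in CODE mode. The unnamed whitespace alternative is skipped.
# "Identifier" is a hypothetical extra token to show that KEYWORD needs a word boundary.
CODE_TOKENS = re.compile(r"\s+|(?P<Keyword>KEYWORD\b)|(?P<Identifier>[A-Za-z_]\w*)")


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs using two lexical modes."""
    tokens = []
    pos = 0
    mode = "TEXT"  # initial mode: absorb everything up to the next "<%"
    while pos < len(text):
        if mode == "TEXT":
            end = text.find(BLOCK_BEGIN, pos)
            if end == -1:
                end = len(text)
            if end > pos:
                # Keywords inside this span are deliberately not recognized.
                tokens.append(("AnyText", text[pos:end]))
            if end < len(text):
                tokens.append(("BlockBegin", BLOCK_BEGIN))
                mode = "CODE"  # switch modes at the transition
                end += len(BLOCK_BEGIN)
            pos = end
        else:
            if text.startswith(BLOCK_END, pos):
                tokens.append(("BlockEnd", BLOCK_END))
                mode = "TEXT"  # switch back to the text mode
                pos += len(BLOCK_END)
                continue
            m = CODE_TOKENS.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
            if m.lastgroup:  # whitespace matches have no group name
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
    return tokens
```

Calling `tokenize("lorem ipsum KEYWORD dolor sit <% KEYWORD %> amet ")` yields the token types AnyText, BlockBegin, Keyword, BlockEnd, AnyText: the first KEYWORD stays inside the AnyText lexeme because the keyword rule is only active in the code mode. A real lexer would also have to deal with things like `%>` appearing inside string literals, which this sketch ignores.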


I came across this project, http://code.google.com/p/phpparser/, which also contains an ANTLR grammar file for parsing PHP: http://code.google.com/p/phpparser/source/browse/grammar/Php.g

Hope this helps.
