Best way to parse a language which is ALMOST Python?

Question


I am working on a domain-specific language implemented on top of Python. The grammar is so close to Python's that so far we have only applied a few trivial string transformations and then fed the result to ast. For example, indentation is replaced by #endfor / #endwhile / #endif statements, so we normalize the indentation while the source is still a string.
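To make the current approach concrete, here is a minimal sketch of that kind of string preprocessing. The marker spellings, the rule that a line ending in ":" opens a block, and the name normalize_indentation are all assumptions for illustration, not our actual code; it ignores comments, multi-line expressions, and other keywords such as try/except.

```python
import ast

# Hypothetical closing markers; the exact spelling in the real DSL is an assumption.
END_MARKERS = {"#endfor", "#endwhile", "#endif"}


def normalize_indentation(source, indent="    "):
    """Rebuild Python indentation from explicit end markers (naive sketch)."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
            continue
        if line in END_MARKERS:
            # An end marker closes the innermost open block and is dropped.
            depth -= 1
            if depth < 0:
                raise SyntaxError("unmatched " + line)
            continue
        if line.startswith(("elif ", "else:")):
            # These sit one level out, next to the matching 'if',
            # and the block they open stays at the current depth.
            out.append(indent * (depth - 1) + line)
            continue
        out.append(indent * depth + line)
        if line.endswith(":"):
            # Assumption: any line ending in ':' opens a new block.
            depth += 1
    if depth != 0:
        raise SyntaxError("missing end marker")
    return "\n".join(out) + "\n"


dsl = """
total = 0
for i in range(5):
if i % 2 == 0:
total += i
else:
total -= 1
#endif
#endfor
"""

# The normalized text is ordinary Python, so ast and compile accept it.
tree = ast.parse(normalize_indentation(dsl))
namespace = {}
exec(compile(tree, "<dsl>", "exec"), namespace)
print(namespace["total"])
```

Because the output is plain Python source, the same string can go to ast.parse or directly to compile, which is the compatibility mentioned further down.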

I am wondering if there is a better way. As far as I can tell, ast is hard-coded to parse the Python grammar, and I can't find any documentation other than http://docs.python.org/library/ast.html#module-ast (and the source itself, I guess).

Does anyone have personal experience with PyParsing, ANTLR or PLY?

There are vague plans to rewrite the interpreter into something that turns our language into valid Python and passes it to the Python interpreter, so I would like something compatible with compile, but this is not a deal breaker.

Update: It occurred to me that

 from __future__ import print_function, with_statement

changes the way Python parses the source that follows. However, PEP 236 suggests that this is just syntax for a feature built into the compiler. Can someone confirm that trying to override / extend __future__ is not the correct solution to my problem?


python parsing

Wang Aug 17 '10 at 16:22


1 answer

S. Lott · Answer 1 · 2010-08-17T17:39:03+0000

PLY works. It is odd because it mimics lex / yacc in a way that is not terribly pythonic.

Both lex and yacc have an implicit interface that lets you run the lex output as a standalone program. This "feature" is carefully preserved. The same goes for the yacc-like parts of PLY: the "feature" of creating a strange, implicit standalone main program is carefully preserved.

However, as a lex / yacc compatible toolkit, PLY is pretty nice. All your lex / yacc skills carry over.

[Editorial comment. "Fixing" Python's grammar is likely to be a waste of time. Almost everyone can indent correctly without any help. Look at C, Java, C++, and even Pascal code, and you'll see that almost everyone indents very well. Indeed, people go to great lengths to indent Java where it is not needed. If indentation doesn't matter in Java, why do people do such a good job of it?]
