Parsing nested function calls using pyparsing

I am trying to use pyparsing to parse function calls in the form:

 f(x, y) 

That part is easy. But since pyparsing is a recursive-descent parser, it should be just as easy to parse:

 f(g(x), y) 

That is what I can't get working. Here is an example:

    from pyparsing import Forward, Word, alphas, alphanums, nums, ZeroOrMore, Literal

    lparen = Literal("(")
    rparen = Literal(")")
    identifier = Word(alphas, alphanums + "_")
    integer = Word(nums)
    functor = identifier

    # allow expression to be used recursively
    expression = Forward()
    arg = identifier | integer | expression
    args = arg + ZeroOrMore("," + arg)
    expression << functor + lparen + args + rparen

    print expression.parseString("f(x, y)")
    print expression.parseString("f(g(x), y)")

And here is the output:

    ['f', '(', 'x', ',', 'y', ')']
    Traceback (most recent call last):
      File "tmp.py", line 14, in <module>
        print expression.parseString("f(g(x), y)")
      File "/usr/local/lib/python2.6/dist-packages/pyparsing-1.5.6-py2.6.egg/pyparsing.py", line 1032, in parseString
        raise exc
    pyparsing.ParseException: Expected ")" (at char 3), (line:1, col:4)

Why does my parser interpret the functor of the inner expression as a standalone identifier?

3 answers

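The alternatives in the definition of arg must be ordered so that an alternative which begins with another one comes first (on the left). The | operator tries the alternatives from left to right and takes the first one that matches, so here identifier matches the "g" before expression ever gets a chance: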

 arg = expression | identifier | integer 
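
To see the effect of the ordering, here is a minimal sketch that builds the question's grammar both ways. It uses Python 3 print syntax, and the make_parser helper and its expression_first flag are names made up for this illustration, not anything pyparsing provides. It also uses <<= to fill in the Forward, which for this grammar should behave the same as the << in the question.

```python
from pyparsing import (Forward, Literal, ParseException, Word, ZeroOrMore,
                       alphanums, alphas, nums)

lparen = Literal("(")
rparen = Literal(")")
identifier = Word(alphas, alphanums + "_")
integer = Word(nums)
functor = identifier


def make_parser(expression_first):
    """Hypothetical helper: build the grammar with either ordering of arg."""
    expression = Forward()
    if expression_first:
        # ordering from the answer: try the longer alternative first
        arg = expression | identifier | integer
    else:
        # ordering from the question: identifier wins before expression is tried
        arg = identifier | integer | expression
    args = arg + ZeroOrMore("," + arg)
    expression <<= functor + lparen + args + rparen
    return expression


broken = make_parser(expression_first=False)
fixed = make_parser(expression_first=True)

print(fixed.parseString("f(g(x), y)").asList())
# -> ['f', '(', 'g', '(', 'x', ')', ',', 'y', ')']

try:
    broken.parseString("f(g(x), y)")
except ParseException as exc:
    print("question's ordering fails:", exc)
```

With the question's ordering, "g" is consumed as a plain identifier, so the parser then expects "," or ")" and stops at the inner "(".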

Good catch figuring out that identifier was masking expression in your definition of arg. Here are some other tips for your parser:

x + ZeroOrMore(',' + x) is a very common pattern in parsers, so pyparsing includes a helper, delimitedList, which lets you replace that expression with delimitedList(x). delimitedList also does one more thing: it suppresses the delimiting commas (or another delimiter, if you pass the optional delim argument), on the notion that delimiters are useful at parse time but are just clutter when you later sift through the parsed data. So you can rewrite args as args = delimitedList(arg), and you will get only the arguments in the list, with no commas to step over.
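
As a small sketch of that difference (Python 3 syntax; the variable names are just for illustration), here are the hand-written pattern and delimitedList side by side. Newer pyparsing releases also offer differently spelled equivalents, but the name used in this answer is kept here.

```python
from pyparsing import Word, ZeroOrMore, alphanums, alphas, delimitedList

identifier = Word(alphas, alphanums + "_")

# hand-written pattern: the commas end up in the results
manual = identifier + ZeroOrMore("," + identifier)
# helper: same matching, but the delimiters are suppressed
helper = delimitedList(identifier)
# a different delimiter via the optional delim argument
semicolons = delimitedList(identifier, delim=";")

manual_tokens = manual.parseString("x, y, z").asList()
helper_tokens = helper.parseString("x, y, z").asList()
semicolon_tokens = semicolons.parseString("x; y; z").asList()

print(manual_tokens)     # -> ['x', ',', 'y', ',', 'z']
print(helper_tokens)     # -> ['x', 'y', 'z']
print(semicolon_tokens)  # -> ['x', 'y', 'z']
```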

You can use the Group class to create actual structure in your parsed tokens. This builds the nesting hierarchy for you, so you don't have to walk the flat list looking for "(" and ")" to tell where you are in the function nesting:

    arg = Group(expression) | identifier | integer
    expression << functor + Group(lparen + args + rparen)

Since your args are now being Grouped for you, you can also suppress the parentheses: like the delimiting commas, they do their job during parsing, but once the tokens are grouped they are no longer needed:

    lparen = Literal("(").suppress()
    rparen = Literal(")").suppress()

I assume that "h()" is a valid function call, just one with no args. You can make args optional using Optional:

 expression << functor + Group(lparen + Optional(args) + rparen) 

Now you can parse "f(g(x), y, h())".
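
Putting these tips together, the whole parser might look like the sketch below. It is written for Python 3 and uses <<= to fill in the Forward, which for this grammar should behave the same as the << used above; everything else is taken from the snippets in this answer.

```python
from pyparsing import (Forward, Group, Literal, Optional, Word, alphanums,
                       alphas, delimitedList, nums)

# parentheses are needed for parsing but suppressed from the results
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
identifier = Word(alphas, alphanums + "_")
integer = Word(nums)
functor = identifier

# allow expression to be used recursively
expression = Forward()
# expression comes first so that identifier does not mask it
arg = Group(expression) | identifier | integer
# delimitedList drops the commas
args = delimitedList(arg)
# Optional(args) permits empty argument lists such as h()
expression <<= functor + Group(lparen + Optional(args) + rparen)

result = expression.parseString("f(g(x), y, h())").asList()
print(result)
# -> ['f', [['g', ['x']], 'y', ['h', []]]]
```

Each call shows up as a name followed by a nested list of its arguments, so the nesting of the result mirrors the nesting of the calls.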

Welcome to pyparsing!


Paul's post really helped. For the reference of others, the same approach can be used to define nested control blocks such as if statements, as follows (a simplified pseudo-parser, just to show the structure):

    sep = Literal(';')
    if_ = Keyword('if')
    then_ = Keyword('then')
    elif_ = Keyword('elif')
    end_ = Keyword('end')

    if_block = Forward()
    do_block = Forward()

    # other and guard are not defined in this sketch
    stmt = other | if_block
    stmts = OneOrMore(stmt + sep)

    case = Group(guard + then_ + stmts)
    cases = case + OneOrMore(elif_ + case)

    if_block << if_ + cases + end_
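
To actually run something along these lines, the undefined pieces have to be filled in. The sketch below does that with two assumptions of my own, neither of which comes from the post above: guard is a bare word, and other is a simple assignment of the form name = number. The unused do_block is left out. Note that, as written, cases requires at least one elif branch.

```python
from pyparsing import (Forward, Group, Keyword, Literal, OneOrMore, Word,
                       alphas, nums)

sep = Literal(';')
if_ = Keyword('if')
then_ = Keyword('then')
elif_ = Keyword('elif')
end_ = Keyword('end')

# Assumptions for illustration only (not defined in the original post):
guard = Word(alphas)                             # a bare word as the condition
other = Group(Word(alphas) + '=' + Word(nums))   # an assignment like x = 1

if_block = Forward()

stmt = other | if_block
stmts = OneOrMore(stmt + sep)

case = Group(guard + then_ + stmts)
cases = case + OneOrMore(elif_ + case)   # at least one elif is required

if_block <<= if_ + cases + end_

flat = if_block.parseString(
    "if a then x = 1; elif b then y = 2; end", parseAll=True).asList()
nested = if_block.parseString(
    "if a then if b then x = 1; elif c then x = 2; end; elif d then x = 3; end",
    parseAll=True).asList()

print(flat)
print(nested)
```

Because other needs an "=" after its first word, it fails on the keyword if, and the | then falls through to if_block; that is what lets an if block appear as a statement inside another one.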