Nice catch on figuring out that identifier was masking expression in your definition of arg. Here are some other tips for your parser:
x + ZeroOrMore(',' + x) is a very common pattern in parsers, so pyparsing includes the helper method delimitedList, which lets you replace that expression with delimitedList(x). In fact, delimitedList does one more thing: it suppresses the delimiting commas (or another delimiter, if one is given using the optional delim argument), on the notion that delimiters are useful at parsing time but are just clutter tokens when you later sift through the parsed data. So you can rewrite args as args = delimitedList(arg), and you will get just the args in a list, with no commas to "step over".
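To make the difference concrete, here is a minimal sketch comparing the two forms. The element x = Word(alphas) and the input string are just placeholders for illustration, not part of your grammar. (Recent pyparsing 3.x releases prefer the newer spellings DelimitedList and parse_string; the names below are the ones used in this answer.)

```python
from pyparsing import Word, ZeroOrMore, alphas, delimitedList

# Placeholder element for illustration only.
x = Word(alphas)

# Hand-written form: the comma tokens stay in the results.
manual = x + ZeroOrMore(',' + x)
manual_tokens = manual.parseString("a, b, c").asList()

# delimitedList form: the commas are matched but suppressed.
short = delimitedList(x)
short_tokens = short.parseString("a, b, c").asList()

print(manual_tokens)  # ['a', ',', 'b', ',', 'c']
print(short_tokens)   # ['a', 'b', 'c']
```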
You can use the Group class to create actual structure in your parsed tokens. This builds the nesting hierarchy for you, so you don't have to walk a flat list looking for "(" and ")" to tell when you have gone down a level in the function nesting:
arg = Group(expression) | identifier | integer
expression << functor + Group(lparen + args + rparen)
Since your args are now being Grouped for you, you can also suppress the parens: like the delimiting commas, they do their job during parsing, but once your tokens are grouped they are no longer needed:
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()
I assume that "h()" is a valid function call, just with no args. You can make args optional using Optional:
expression << functor + Group(lparen + Optional(args) + rparen)
Now you can parse "f(g(x),y,h())".
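Putting the pieces together, a sketch of the whole parser might look like the following. The definitions of identifier, integer and functor are my assumptions, since they come from your code rather than from anything above; substitute your own.

```python
from pyparsing import (Forward, Group, Literal, Optional, Word,
                       alphas, alphanums, nums, delimitedList)

# Suppressed parens: matched while parsing, dropped from the results.
lparen = Literal("(").suppress()
rparen = Literal(")").suppress()

# Assumed definitions; replace these with the ones from your grammar.
identifier = Word(alphas, alphanums + "_")
integer = Word(nums)
functor = identifier

# Forward lets expression refer to itself through arg.
expression = Forward()
arg = Group(expression) | identifier | integer
args = delimitedList(arg)
expression << functor + Group(lparen + Optional(args) + rparen)

result = expression.parseString("f(g(x),y,h())").asList()
print(result)  # ['f', [['g', ['x']], 'y', ['h', []]]]
```

Each function call comes back as its name followed by a nested list of its arguments, and the no-arg call h() gets an empty list.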
Welcome to pyparsing!
Paulmcg