Pyparsing greedy expressions

Question

I am trying to split a string like aaa:bbb(123) into tokens using Pyparsing.

I can do this with regex, but I need to do this through Pyparsing.

The regex solution would look like this:

>>> import re
>>> string = 'aaa:bbb(123)'
>>> regex = r'(\S+):(\S+)\((\d+)\)'
>>> re.match(regex, string).groups()
('aaa', 'bbb', '123')

It is clear and simple enough. The key point here is \S+, which means "everything except whitespace."
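To make "greedy" concrete, here is a small sketch with the standard re module alone. The sample string 'a:aa:b(bb(123)' is the tricky input from this question; the non-greedy `\S+?` variant is added only for comparison.

```python
import re

text = 'a:aa:b(bb(123)'

# Greedy: \S+ first swallows the whole string, then gives characters back
# one at a time until a ':' can follow, so the group ends at the LAST ':'.
greedy = re.match(r'(\S+):', text).group(1)

# Non-greedy: \S+? grows one character at a time, so it ends at the FIRST ':'.
lazy = re.match(r'(\S+?):', text).group(1)

print(greedy)  # a:aa
print(lazy)    # a
```

It is this give-back step (backtracking) that lets the regex put ':' and '(' inside the fields.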

Now I will try to do this with Pyparsing:

>>> from pyparsing import Word, Suppress, nums, printables
>>> expr = (
...     Word(printables, excludeChars=':')
...     + Suppress(':')
...     + Word(printables, excludeChars='(')
...     + Suppress('(')
...     + Word(nums)
...     + Suppress(')')
... )
>>> expr.parseString(string).asList()
['aaa', 'bbb', '123']

OK, we got the same result, but it doesn't look very good. We set excludeChars so that the Pyparsing expressions stop where we need them, but that doesn't seem reliable. If the original string contains the "excluded" characters, the same regular expression still works fine:

>>> string = 'a:aa:b(bb(123)'
>>> re.match(regex, string).groups()
('a:aa', 'b(bb', '123')

while the pyparsing expression obviously breaks:

>>> expr.parseString(string).asList()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/long/path/to/pyparsing.py", line 1111, in parseString
    raise exc
ParseException: Expected W:(0123...) (at char 7), (line:1, col:8)
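The same failure can be reproduced with plain re, which points at the token definitions rather than at pyparsing itself. A rough sketch, assuming that `Word(printables, excludeChars=':')` behaves like the character class `[^\s:]+` and `Word(printables, excludeChars='(')` like `[^\s(]+` (an approximation: printables is ASCII-only). Even with re's backtracking, the restricted pattern cannot match:

```python
import re

text = 'a:aa:b(bb(123)'

# Approximate regex equivalents of the excludeChars-based Word expressions:
# [^\s:]+ can never contain ':' and [^\s(]+ can never contain '('.
restricted = r'([^\s:]+):([^\s(]+)\((\d+)\)'

# The first field is forced to stop at the first ':', so no amount of
# backtracking can produce 'a:aa', and the whole match fails.
print(re.match(restricted, text))  # None

# Allowing any non-space character, as in the question's regex, succeeds.
print(re.match(r'(\S+):(\S+)\((\d+)\)', text).groups())  # ('a:aa', 'b(bb', '123')
```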

So how do I get the same behavior with Pyparsing?

python pyparsing

oblalex 22 . '14 20:26

blaze · Answer 1 · 2014-11-23T00:02:55+0000

Try this:

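from pyparsing import Word, Suppress, Regex, nums, printables

expr = (
    Word(printables, excludeChars=':')
    + Suppress(':')
    # Lookahead (?=\() stops right before a '(' without consuming it;
    # greedy \S+ makes that the last '(' in the run of non-space characters
    + Regex(r'\S+[^\(](?=\()')
    + Suppress('(')
    + Word(nums)
    + Suppress(')')
)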
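Because pyparsing's Regex is built on re, the lookahead pattern above can be checked with re directly. A small sketch with illustrative inputs. Two details worth noticing: the first Word still stops at the first ':', so for 'a:aa:b(bb(123)' this part of the input starts at 'aa:b(bb(123)' and the second field comes out as 'aa:b(bb' rather than 'b(bb'; and `\S+[^\(]` needs at least two characters, so a one-character field such as the 'b' in 'aaa:b(123)' is not matched.

```python
import re

# The pattern from the answer above, used directly with re.
field2 = r'\S+[^\(](?=\()'

# Greedy \S+ backs off only as far as needed, so the match ends
# right before the LAST '('.
print(re.match(field2, 'b(bb(123)').group())     # b(bb

# In the parser, the first Word stops at the first ':', so for
# 'a:aa:b(bb(123)' the Regex sees 'aa:b(bb(123)'.
print(re.match(field2, 'aa:b(bb(123)').group())  # aa:b(bb

# \S+ plus [^\(] needs at least two characters before the '('.
print(re.match(field2, 'b(123)'))                # None
```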

PaulMcG · Answer 2 · 2014-11-23T02:20:49+0000

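Unlike regular expressions, pyparsing parses strictly left to right: once an expression such as Word has matched, it never gives characters back so that a later expression can match.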

If you need backtracking, use a Regex, which hands the pattern to re:

from pyparsing import Regex
string = 'aaa:b(bb(123)'
expr = Regex(r"(\S+):(\S+)\((\d+)\)")
print(expr.parseString(string).dump())

['aaa:b(bb(123)']

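That matches, but the whole match comes back as a single token. To get the individual fields, use named groups: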

expr = Regex(r"(?P<field1>\S+):(?P<field2>\S+)\((?P<field3>\d+)\)")
print(expr.parseString(string).dump())

['aaa:b(bb(123)']
- field1: aaa
- field2: b(bb
- field3: 123
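Since Regex passes the pattern to re, the named-group version can be cross-checked with re alone. A minimal sketch, assuming the same sample string as in the dump output above: `group(0)` is the whole match, which appears as the single token in the list, and `groupdict()` holds the same fields that pyparsing exposes as results names.

```python
import re

text = 'aaa:b(bb(123)'
pattern = r"(?P<field1>\S+):(?P<field2>\S+)\((?P<field3>\d+)\)"

match = re.match(pattern, text)

# The whole match: the single token in the pyparsing list output.
print(match.group(0))     # aaa:b(bb(123)

# The named groups: the same names and values as in the dump() output.
print(match.groupdict())  # {'field1': 'aaa', 'field2': 'b(bb', 'field3': '123'}
```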

Regex also takes a flags arg, which it passes through to re.
