Pyparsing regex as word

Question


I am creating a parser to perform simple actions on objects identified by dotted notation, something like this:

DISABLE ALL; ENABLE A.1 B.1.1 C

But in DISABLE ALL, the keyword ALL is instead matched as three areas ('A', 'L', 'L') by the Regex(r'[a-zA-Z]') I use to match the arguments.

How can I create a Word using a regular expression? AFAIK I cannot match A.1.1 using Word.

See the example below:

    import pyparsing as pp

    def toggle_item_action(s, loc, tokens):
        'enable / disable a sequence of items'
        action = True if tokens[0].lower() == "enable" else False
        for token in tokens[1:]:
            print("it[%s].active = %s" % (token, action))

    def toggle_all_items_action(s, loc, tokens):
        'enable / disable ALL items'
        action = True if tokens[0].lower() == "enable" else False
        print("it.enable_all(%s)" % action)

    expr_separator = pp.Suppress(';')

    #match A
    area = pp.Regex(r'[a-zA-Z]')
    #match A.1
    category = pp.Regex(r'[a-zA-Z]\.\d{1,2}')
    #match A.1.1
    criteria = pp.Regex(r'[a-zA-Z]\.\d{1,2}\.\d{1,2}')
    #match any of the above
    item = area ^ category ^ criteria
    #keyword to perform action on ALL items
    all_ = pp.CaselessLiteral("all")

    #actions
    enable = pp.CaselessKeyword('enable')
    disable = pp.CaselessKeyword('disable')
    toggle = enable | disable

    #toggle item expression
    toggle_item = (toggle + item + pp.ZeroOrMore(item)
        ).setParseAction(toggle_item_action)

    #toggle ALL items expression
    toggle_all_items = (toggle + all_).setParseAction(toggle_all_items_action)

    #swapping order to `toggle_all_items ^ toggle_item` works
    #but seems too weak to me and error prone for future maintenance
    expr = toggle_item ^ toggle_all_items
    #expr = toggle_all_items ^ toggle_item

    more = expr + pp.ZeroOrMore(expr_separator + expr)

    more.parseString("""
        ENABLE A.1 B.1.1;
        DISABLE ALL
        """, parseAll=True)


python pyparsing

neurino Jun 09 '11 at 10:21


1 answer

Paulmcg · Accepted Answer · 2011-06-09T12:34:45+0000

Is this the problem?

    #match any of the above
    item = area ^ category ^ criteria
    #keyword to perform action on ALL items
    all_ = pp.CaselessLiteral("all")

It should be:

    #keyword to perform action on ALL items
    all_ = pp.CaselessLiteral("all")
    #match any of the above
    item = area ^ category ^ criteria ^ all_

EDIT - if you are interested ...

Your regular expressions are so similar that I thought I would see what it would look like to combine them into one. Here is a snippet that parses your three dotted notations using a single Regex, and then uses a parse action to figure out which type you got:

    import pyparsing as pp

    dotted_notation = pp.Regex(r'[a-zA-Z](\.\d{1,2}(\.\d{1,2})?)?')

    def name_notation_type(tokens):
        # the number of dots tells us which kind of item was matched
        name = {
            0 : "area",
            1 : "category",
            2 : "criteria"}[tokens[0].count('.')]
        # assign results name to results
        tokens[name] = tokens[0]
    dotted_notation.setParseAction(name_notation_type)

    # test each individually
    tests = "A A.1 A.2.2".split()
    for t in tests:
        print(t)
        val = dotted_notation.parseString(t)
        print(val.dump())
        print("%s is a %s" % (val[0], val.getName()))
        print("")

    # test all at once
    tests = "A A.1 A.2.2"
    val = pp.OneOrMore(dotted_notation).parseString(tests)
    print(val.dump())

Prints:

    A
    ['A']
    - area: A
    A is a area

    A.1
    ['A.1']
    - category: A.1
    A.1 is a category

    A.2.2
    ['A.2.2']
    - criteria: A.2.2
    A.2.2 is a criteria

    ['A', 'A.1', 'A.2.2']
    - area: A
    - category: A.1
    - criteria: A.2.2

EDIT2 - I see the original problem ...

What you are running into is pyparsing's implicit whitespace skipping. Pyparsing will skip over whitespace between defined tokens, but the converse is not true: pyparsing does not require whitespace between separate parser expressions. So in your all_-less version, "ALL" looks like 3 areas: "A", "L" and "L". This is true not just of Regex, but of just about any pyparsing class. See if the pyparsing WordEnd class might be useful in enforcing this.
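To make the whitespace point concrete, here is a minimal sketch of how WordEnd could be attached to the single-letter area expression. The choice of word characters (letters, digits and the dot) is my assumption about what counts as part of an item; the default for WordEnd is all printable characters, which would also reject an item followed directly by ';'.

```python
import pyparsing as pp

# Single-letter area, as in the question.
area = pp.Regex(r"[a-zA-Z]")

# Without a boundary check, "ALL" is consumed as three adjacent areas.
loose_tokens = pp.OneOrMore(area).parseString("ALL").asList()

# Assumption: an item "word" consists of letters, digits and dots only.
item_end = pp.WordEnd(pp.alphanums + ".")
strict_area = area + item_end

# With the boundary check, the "A" in "ALL" is no longer a valid area.
try:
    strict_area.parseString("ALL")
    all_matched_as_area = True
except pp.ParseException:
    all_matched_as_area = False

# Whitespace-separated single letters still parse as before.
strict_tokens = pp.OneOrMore(strict_area).parseString("A B C").asList()

print(loose_tokens)
print(all_matched_as_area)
print(strict_tokens)
```

Note that strict_area also refuses the "A" in "A.1", since the dot counts as a word character here, so category and criteria would still need their own expressions (or the combined Regex from the first edit).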

EDIT3 - Then maybe something like this ...

    toggle_item = (toggle + pp.OneOrMore(item)).setParseAction(toggle_item_action)
    toggle_all = (toggle + all_).setParseAction(toggle_all_action)

    toggle_directive = toggle_all | toggle_item

As your commands are formatted, you have to make the parser first check whether ALL is being toggled before looking for individual areas, etc. If you need to support something that might read "ENABLE A.1 ALL", then use a negative lookahead for item: item = ~all_ + (area ^ etc...). (Note also that I replaced item + pp.ZeroOrMore(item) with just pp.OneOrMore(item).)
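Putting the ordering and the negative lookahead together, a self-contained sketch might look like the following. It is an illustration under a few assumptions of mine rather than a drop-in replacement: it uses the combined dotted Regex from the first edit, declares all_ as a CaselessKeyword instead of the question's CaselessLiteral, leaves out the parse actions, and wraps each directive in pp.Group only so the result is easy to inspect.

```python
import pyparsing as pp

# Assumption: ALL is declared as a keyword rather than a literal.
all_ = pp.CaselessKeyword("all")

# Combined dotted notation: A, A.1 or A.1.1
dotted = pp.Regex(r"[a-zA-Z](\.\d{1,2}(\.\d{1,2})?)?")

# Negative lookahead: an item may not start where ALL starts.
item = ~all_ + dotted

toggle = pp.CaselessKeyword("enable") | pp.CaselessKeyword("disable")

# Group is added here only to keep each directive's tokens together.
toggle_all = pp.Group(toggle + all_)
toggle_item = pp.Group(toggle + pp.OneOrMore(item))

# Try "toggle ALL" first, then fall back to individual items.
toggle_directive = toggle_all | toggle_item
program = toggle_directive + pp.ZeroOrMore(pp.Suppress(";") + toggle_directive)

# Effect of the lookahead on a mixed argument list.
without_lookahead = pp.OneOrMore(dotted).parseString("A.1 ALL").asList()
with_lookahead = pp.OneOrMore(item).parseString("A.1 ALL").asList()

parsed = program.parseString("ENABLE A.1 B.1.1; DISABLE ALL", parseAll=True).asList()

print(without_lookahead)
print(with_lookahead)
print(parsed)
```

With the lookahead in place, the item list stops in front of ALL instead of splitting it into single letters, which leaves ALL available for whatever expression you put after the items.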
