Pyparsing - where the order of tokens is unpredictable

Question

I want to extract each letter and its count from a piece of text where the letters can appear in any order. There is other parsing I'm working on as well, but this one has me stumped!

    input   -> result
    "abc"   -> [['a',1], ['b',1],['c',1]]
    "bbbc"  -> [['b',3],['c',1]]
    "cccaa" -> [['a',2],['c',3]]

I could use a search or scan and repeat for every possible letter, but is there a clean way to do this?

This is as far as I've got:

    from pyparsing import *

    # Parse action: replace a run of letters with [first letter, run length]
    def handleStuff(string, location, tokens):
        return [tokens[0][0], len(tokens[0])]

    stype = Word("abc").setParseAction(handleStuff)
    section = ZeroOrMore(stype("stype"))

    print(section.parseString("abc").dump())
    print(section.parseString("aabcc").dump())
    print(section.parseString("bbaaa").dump())

+7

python pyparsing

PhoebeB Jan 25 '10 at 18:01

5 answers

One solution:

    text = 'sufja srfjhvlasfjkhv lasjfvhslfjkv hlskjfvh slfkjvhslk'
    print([(x,text.count(x)) for x in set(text)])

No pyparsing involved, but pyparsing seems like overkill for this.
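If the result has to match the format in the question (sorted, as [letter, count] lists), the same idea could be sketched roughly like this. The function name letter_counts is only an illustrative assumption, not something from the question:

```python
def letter_counts(text):
    """Return [[letter, count], ...] sorted by letter (hypothetical helper)."""
    # set(text) yields each distinct character once; sorted() fixes the order,
    # because the iteration order of a set is arbitrary.
    return [[ch, text.count(ch)] for ch in sorted(set(text))]


print(letter_counts("abc"))    # [['a', 1], ['b', 1], ['c', 1]]
print(letter_counts("bbbc"))   # [['b', 3], ['c', 1]]
print(letter_counts("cccaa"))  # [['a', 2], ['c', 3]]
```

Note that text.count() rescans the whole string once per distinct character, which is fine for short inputs like these.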

+6

Lennart Regebro Jan 25 '10 at 18:38

I like Lennart's one-line solution.

Alex mentions another great option if you are using Python 3.1.

Another option is collections.defaultdict:

    >>> from collections import defaultdict
    >>> mydict = defaultdict(int)
    >>> for c in 'bbbc':
    ...     mydict[c] += 1
    ...
    >>> mydict
    defaultdict(<type 'int'>, {'c': 1, 'b': 3})

+3

bernie Jan 25 '10 at 19:00

If you want a pure-pyparsing approach, this feels about right:

    from pyparsing import *

    # helper to define one expression per letter; listAllMatches=True keeps
    # every match under the results name instead of only the last one
    def makeExpr(ch):
        expr = Literal(ch).setResultsName(ch, listAllMatches=True)
        return expr

    expr = OneOrMore(MatchFirst(makeExpr(c) for c in "abc"))
    # each results name maps to the list of its matches, so len(b) is the count
    expr.setParseAction(lambda tokens: [[a,len(b)] for a,b in tokens.items()])

    tests = """\
    abc
    bbbc
    cccaa
    """.split()

    for t in tests:
        print(t, expr.parseString(t).asList())

Prints:

    abc [['a', 1], ['c', 1], ['b', 1]]
    bbbc [['c', 1], ['b', 3]]
    cccaa [['a', 2], ['c', 3]]

But this starts to get into obscure-code territory, since it relies on some of the more arcane features of pyparsing. In general, I like frequency counters that use defaultdict (I haven't tried Counter yet), since it is pretty clear what you are doing.

+2

Paulmcg Jan 26 '10 at 3:08

Pyparsing aside: in Python 3.1, collections.Counter makes such counting tasks really easy. A good version of Counter for Python 2 can be found here.
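As a minimal sketch of what that might look like for the inputs in the question (the helper name count_letters is a hypothetical, chosen here only for illustration):

```python
from collections import Counter


def count_letters(text):
    """Return [[letter, count], ...] sorted by letter (hypothetical helper)."""
    # Counter maps each character to the number of times it occurs,
    # regardless of where in the string the occurrences are.
    counts = Counter(text)
    return [[ch, n] for ch, n in sorted(counts.items())]


print(count_letters("bbbc"))   # [['b', 3], ['c', 1]]
print(count_letters("cccaa"))  # [['a', 2], ['c', 3]]
print(count_letters("ababc"))  # [['a', 2], ['b', 2], ['c', 1]]
```

Because Counter tallies by character rather than by position, mixed input such as "ababc" needs no special handling.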

+1

Alex Martelli Jan 25 '10 at 18:51

Paulmcg · Accepted Answer · 2010-01-26T03:15:28+0000

From your description, it was unclear to me whether the input characters could be mixed, as in "ababc", since in all your test cases the letters were always grouped together. If the letters are always grouped together, you could use this pyparsing code:

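    from pyparsing import Each, Optional, Word

    # one expression per letter: match a run of that letter, report [letter, run length]
    def makeExpr(ch):
        expr = Word(ch).setParseAction(lambda tokens: [ch,len(tokens[0])])
        return expr

    expr = Each([Optional(makeExpr(ch)) for ch in "abc"])

    # tests: the list of sample strings from the earlier pyparsing snippet
    for t in tests:
        print(t, expr.parseString(t).asList())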

The Each construct takes care of matching the expressions in any order, and Word(ch) handles the 1-to-n repetition. The parse action takes care of converting the parsed tokens into (character, count) pairs.
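To make the grouped-versus-mixed distinction concrete, here is a standard-library sketch (deliberately not pyparsing) that counts runs of identical adjacent letters with itertools.groupby. The name run_lengths is a hypothetical used only for this illustration:

```python
from itertools import groupby


def run_lengths(text):
    """Return [[letter, run_length], ...] in input order (hypothetical helper)."""
    # groupby() starts a new group every time the character changes,
    # so each group is one run of identical adjacent letters.
    return [[ch, len(list(run))] for ch, run in groupby(text)]


print(run_lengths("cccaa"))  # [['c', 3], ['a', 2]]
print(run_lengths("ababc"))  # [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['c', 1]]
```

With grouped input, each letter produces exactly one entry, which is the situation the grouped-letters approach assumes. With mixed input such as "ababc", a letter appears once per run, so a per-letter tally (for example defaultdict or Counter) would be needed instead.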
