Pyparsing - where the order of tokens is unpredictable

I want to pull out the type and count of letters from a piece of text where the letters can appear in any order. There is other parsing going on that I already have working, but this part has me stumped!

    input   -> result
    "abc"   -> [['a',1], ['b',1], ['c',1]]
    "bbbc"  -> [['b',3], ['c',1]]
    "cccaa" -> [['a',2], ['c',3]]

I could use a search or scan and repeat for every possible letter, but is there a clean way to do this?

This is as far as I got:

    from pyparsing import *

    def handleStuff(string, location, tokens):
        return [tokens[0][0], len(tokens[0])]

    stype = Word("abc").setParseAction(handleStuff)
    section = ZeroOrMore(stype("stype"))

    print(section.parseString("abc").dump())
    print(section.parseString("aabcc").dump())
    print(section.parseString("bbaaa").dump())

+7
python pyparsing
5 answers

From your description, it wasn't clear to me whether the input characters can be interleaved, as in "ababc", since in all your test cases the letters were always grouped together. If the letters are always grouped together, you can use this pyparsing code:

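    from pyparsing import Word, Each, Optional

    tests = ["abc", "bbbc", "cccaa"]  # sample inputs from the question

    def makeExpr(ch):
        # the parse action replaces a run of ch with the character and the run's length
        expr = Word(ch).setParseAction(lambda tokens: [ch, len(tokens[0])])
        return expr

    expr = Each([Optional(makeExpr(ch)) for ch in "abc"])

    for t in tests:
        print(t, expr.parseString(t).asList())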

The Each construct takes care of matching out of order, and Word(ch) handles the 1-to-n repetition. The parse action converts the parsed tokens into (character, count) pairs.
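Whether the letters are grouped matters because a run-based approach reports each run of a letter separately. As a rough illustration using only the standard library (this is itertools.groupby, not pyparsing, and `runs` is a hypothetical helper name), grouped input gives one pair per letter while interleaved input does not:

```python
from itertools import groupby


def runs(text):
    """Return [letter, run_length] for each maximal run of identical letters."""
    # groupby yields one (key, iterator) pair per consecutive run of equal items
    return [[ch, len(list(group))] for ch, group in groupby(text)]


print(runs("cccaa"))  # [['c', 3], ['a', 2]]
print(runs("ababc"))  # [['a', 1], ['b', 1], ['a', 1], ['b', 1], ['c', 1]]
```

If interleaved input is possible, a plain frequency count is probably the simpler tool.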

+6

One solution:

    text = 'sufja srfjhvlasfjkhv lasjfvhslfjkv hlskjfvh slfkjvhslk'
    print([(x, text.count(x)) for x in set(text)])

No pyparsing involved, but pyparsing seems like overkill for this.
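To get exactly the sorted list-of-pairs shape shown in the question, the same idea can be wrapped and sorted by letter. A minimal sketch, where `letter_counts` is a hypothetical helper name and not part of the answer above:

```python
def letter_counts(text):
    """Return [letter, count] pairs for each distinct character, sorted by letter."""
    # set(text) yields each distinct character once; str.count tallies it
    return sorted([ch, text.count(ch)] for ch in set(text))


print(letter_counts("abc"))    # [['a', 1], ['b', 1], ['c', 1]]
print(letter_counts("bbbc"))   # [['b', 3], ['c', 1]]
print(letter_counts("cccaa"))  # [['a', 2], ['c', 3]]
```

Note that this rescans the text once per distinct character, which is fine for short inputs like these.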

+6

I like Lennart's one-line solution.

Alex mentions another great option if you are using Python 3.1.

Yet another option is collections.defaultdict:

    >>> from collections import defaultdict
    >>> mydict = defaultdict(int)
    >>> for c in 'bbbc':
    ...     mydict[c] += 1
    ...
    >>> mydict
    defaultdict(<type 'int'>, {'c': 1, 'b': 3})

+3

If you want a pure-pyparsing approach, this feels about right:

    from pyparsing import *

    # helper to define expressions
    def makeExpr(ch):
        expr = Literal(ch).setResultsName(ch, listAllMatches=True)
        return expr

    expr = OneOrMore(MatchFirst(makeExpr(c) for c in "abc"))
    expr.setParseAction(lambda tokens: [[a, len(b)] for a, b in tokens.items()])

    tests = """\
    abc
    bbbc
    cccaa
    """.splitlines()

    for t in tests:
        print(t, expr.parseString(t).asList())

Prints:

    abc [['a', 1], ['c', 1], ['b', 1]]
    bbbc [['c', 1], ['b', 3]]
    cccaa [['a', 2], ['c', 3]]

But this starts to get into obscure-code territory, since it relies on some of the more arcane features of pyparsing. In general, I like frequency counters that use defaultdict (I haven't tried Counter yet), since it is pretty clear what you are doing.

+2

Pyparsing aside: in Python 3.1, collections.Counter makes counting tasks like this really easy. A good version of Counter for Python 2 can be found here.
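For reference, a minimal sketch of the Counter approach, shaped to match the output asked for in the question. `count_letters` is a hypothetical helper name; Counter itself is in the standard library from Python 2.7 and 3.1 onward. It does not care whether the letters are grouped or interleaved:

```python
from collections import Counter


def count_letters(text):
    """Return [letter, count] pairs sorted by letter, using collections.Counter."""
    counts = Counter(text)  # maps each character to its number of occurrences
    return [[ch, n] for ch, n in sorted(counts.items())]


for sample in ("abc", "bbbc", "cccaa", "ababc"):
    print(sample, count_letters(sample))
```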

+1