Get the first letter with the maximum appearance of the line

I would like to get the first letter with the maximum appearance of the string.

For instance:

"google" -> g "azerty" -> a "bbbaaa" -> b 

I already have working code using OrdererDict () to avoid automatic key permutation:

 from collections import OrderedDict sentence = "google" d = OrderedDict() for letter in sentence: if letter not in d.keys(): d[letter] = sentence.count(letter) print(max(d, key=d.get)) # g 

but I'm looking for a possible one-line or more elegant solution (if possible).

Note: I already tried to use Counter () , but it does not work, since the dict in python does not remember the order in which the keys were inserted.

eg

 from collections import Counter sentence = "bbbaaa" c = Counter(sentence) print(c.most_common()[0][0]) # have 50% chances of printing 'a' rather than 'b'. 

Bonus question: can anyone explain why OrderedDict () is not the default behavior in python?

+6
source share
5 answers

The documentation for collections.OrderedDict has an OrderedCounter recipe :

 In [5]: from collections import Counter, OrderedDict In [6]: class OrderedCounter(Counter, OrderedDict): ...: pass ...: In [7]: OrderedCounter("google").most_common()[0][0] Out[7]: 'g' 
+6
source

Probably not very fast, but single-line!

 >>> s = "aaabbbbc" >>> sorted(s, key=lambda c: (-s.count(c), s.index(c)))[0] 'b' 

Edit

Even shorter thanks to a comment by @Ohad Eytan:

 >>> min(s, key=lambda c: (-s.count(c), s.index(c))) 'b' 

Benchmark

The feeling is boring today, so I tested (using timeit ) the @Joohwan most_common_char() solution ( timeit ) test, @Blender most_common_char() solution and my own one liner solution (onelin using the min option). The quick fix was always the fastest: up to ~ 10x faster than onelin for long strings containing several different characters, and up to ~ 4x faster than odict for very short strings. For short lines or lines with small repeated characters, onlin hits the same (otherwise, vice versa). Here are the details (length = string length, # chars = number of different unicode characters for random selection for each char, mostcc = time to execute 10,000 times longer than odcc = how much longer odict compared to mostcc, onelin = how many longer oneline was compared to mostcc).

 Length #chars mostcc odict onelin 10 10: 0.08s 3.76x 1.61x 10 100: 0.10s 3.57x 1.27x 10 1000: 0.12s 3.12x 1.34x 100 10: 0.43s 1.96x 3.29x 100 100: 0.59s 2.16x 2.18x 100 1000: 0.80s 1.92x 1.72x 1000 10: 3.48s 1.56x 9.79x 1000 100: 3.44s 1.72x 6.43x 1000 1000: 6.55s 1.68x 3.30x 
+5
source

I know that you need a single liner, but what if you had to repeat this task many times or process really long sentences? I do not know the specific use case here, but it may be worth your time, given the complexity of the space and time of the algorithm.

In your solution, for example, you repeat the sentence many times more than necessary using sentence.count() , which takes O(n * number of unique characters) . After that, you again go through an ordered disk to find max (another operation O(number of unique characters) ).

In the decision we make, we ultimately need to define a new class (which violates your liner btw requirement) and create new objects with a lot of templates and functionality that you probably will not need every time you want to complete your task.

If you don't mind having a few more lines of code (again, I know this is not what the question is asking), we can build a reusable function that should only iterate through the line once and use constant and minimal space:

 from collections import defaultdict def most_common_char(sentence): if not sentence: return '' max_count = 1 max_char = sentence[-1] char_counts = defaultdict(int) char_counts[max_char] = 1 for i in xrange(len(sentence) - 2, -1, -1): char = sentence[i] char_counts[char] += 1 if char_counts[char] >= max_count: max_count = char_counts[char] max_char = char return max_char 

We track the character with the maximum number as , we go through the line and spit it out at the end of the iteration. Please note that we are iterating backwards, since you need the first letter (i.e. the last updated victory).

+3
source

You can use Counter() along with next() to find the first letter that matches the condition:

 >>> s = "google" >>> c = Counter(s) >>> next(x for x in s if c[x] == c.most_common(1)[0][1]) 'g' 
+2
source

You can also fix the problem that you describe at the end of your question about using a counter by getting a list sorted by various attributes: firstly, secondly, the lexicographic order:

 from collections import Counter sentence = "google" c = Counter(sentence) print(sorted(c.most_common(), key = lambda x: (-x[1], sentence.index(x[0])))) 

Output:

 => [('g', 2), ('o', 2), ('l', 1), ('e', 1)] 

Just for fun:

Golf Version:

 # If your sentence is s: print(sorted(collections.Counter(s).most_common(),key=lambda x:(-x[1],s.index(x[0])))) 
+1
source

All Articles