Python puzzle code overview (spoiler)

Question

I have been working through the problems in the Python Challenge. One of them requires sifting through a mess of characters and picking out the rarest ones.

My approach was to read the characters from a text file and store each character and its number of occurrences as a key/value pair in a dictionary. I then sort the dictionary by value and invert it, so that the occurrence count is the key and the list of characters with that count is the value. Assuming that the rarest character occurs only once, I return the values stored under the key 1 in this inverted dictionary.

The input (funkymess.txt) is as follows:

%% $@ $ ^ _ #) ^) &! _ +] * @ & ^} @@ %% + $ &! [(_ @% +% $ * ^ @ $ ^ +] & #) *} {}}}] $ [%} @ [{@ # _ ^ {* ......

!

The code is as follows:

    from operator import itemgetter

    characterDict = dict()

    #put the characters in a dictionary
    def putEncounteredCharactersInDictionary(lineStr):
        for character in lineStr:
            if character in characterDict:
                characterDict[character] = characterDict[character]+1
            else:
                characterDict[character] = 1

    #Sort the character dictionary
    def sortCharacterDictionary(characterDict):
        sortCharDict = dict()
        sortsortedDictionaryItems = sorted(characterDict.iteritems(),key = itemgetter(1))
        for key, value in sortsortedDictionaryItems:
            sortCharDict[key] = value
        return sortCharDict

    #invert the sorted character dictionary
    def inverseSortedCharacterDictionary(sortedCharDict):
        inv_map = dict()
        for k, v in sortedCharDict.iteritems():
            inv_map[v] = inv_map.get(v, [])
            inv_map[v].append(k)
        return inv_map

    f = open('/Users/Developer/funkymess.txt','r')
    for line in f:
        #print line
        processline = line.rstrip('\n')
        putEncounteredCharactersInDictionary(processline)
    f.close()

    sortedCharachterDictionary = sortCharacterDictionary(characterDict)
    #print sortedCharachterDictionary
    inversedSortedCharacterDictionary = inverseSortedCharacterDictionary(sortedCharachterDictionary)
    print inversedSortedCharacterDictionary[1]

Can someone take a look and tell me whether I am on the right track and, if possible, give some feedback on possible optimizations, best practices and refactorings, both from the language and from the algorithmic point of view?

Thanks

+4

python algorithm

sc_ray Nov 11 '10 at 16:30


5 answers

You don't even need that much code, because Python already has a class that counts the elements of an iterable for you! The following does everything you asked for.

    from collections import Counter

    counter = Counter(open(<...>).read())
    print min(counter, key=counter.get)

Explanation:

collections is a standard module in Python that contains some commonly used data structures. In particular, it contains Counter, which is a subclass of dict designed to count how often things occur. It takes an iterable and counts all the elements in it.

Now, as you may know, strings in Python are iterable, and their elements are the single characters. Thus we can open the file, read all its contents at once and pass that one large string to Counter. This creates a dict-like object that maps characters to their frequencies.

Finally, we want to find the least frequent character, given this dictionary of frequencies. In other words, we want the minimum element of the Counter, ordered by its value in the dictionary. Python has a built-in function for taking the minimum of things, naturally called min. If you want to order the data by something other than the elements themselves, you can pass it an optional key argument, and it will compare the elements by the key of each one. In this case we ask min to find the minimum element as ordered by counter.get; in other words, we order by frequency!
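As a rough illustration of the idea, here is a minimal sketch written for Python 3 (an assumption; the snippet in this answer uses the Python 2 print statement). The sample string is made up for the example and stands in for the contents of the puzzle file.

```python
from collections import Counter

# Hypothetical sample text standing in for the file contents.
sample = "@@%%$$x%%@@"

# Counter iterates over the string and tallies each character.
counter = Counter(sample)

# min() walks the keys and compares them by their count (counter.get).
rarest = min(counter, key=counter.get)
print(rarest)  # x
```

If several characters share the lowest count, min returns only one of them, so this form fits best when the rarest character is known to be unique.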

+4

katrielalex Nov 11 '10 at 16:38


That is far too much code.

    import operator

    [k for k, v in characterdict.iteritems() if v == min(characterdict.items(), key=operator.itemgetter(1))[1]]

Optimize as desired (for example, by storing the minimum in a variable first).
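One way to read that suggestion is sketched here, assuming Python 3 (so items() instead of iteritems()). The dictionary contents are hypothetical and only stand in for characterdict from the question.

```python
# Hypothetical counts standing in for characterdict from the question.
characterdict = {'a': 1, 'b': 3, 'c': 1, 'd': 2}

# Compute the lowest count once, instead of once per dictionary entry.
lowest = min(characterdict.values())

# Collect every character that has that lowest count.
rarest = sorted(k for k, v in characterdict.items() if v == lowest)
print(rarest)  # ['a', 'c']
```

Unlike picking a single minimum, this keeps all characters that tie for the lowest count; sorted() is used only to make the output order predictable.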

+2

Ignacio Vazquez-Abrams Nov 11 '10 at 16:33


Here is the code I used to solve this puzzle:

    comment = open('comment.txt').read()
    for c in sorted(set(comment)):
        print ' %-3s %6d' % (repr(c)[1:-1], comment.count(c))

It sorts the characters alphabetically, not by frequency, but the rarest characters are very easy to select from the output.

If I needed frequency sorting, I would use collections.Counter, as katrielalex suggested (if I remembered that it exists), or:

    from collections import defaultdict

    comment = open('comment.txt').read()
    counts = defaultdict(int)
    for c in comment:
        counts[c] += 1
    for c in sorted(counts, key=counts.get):
        print ' %-3s %6d' % (repr(c)[1:-1], counts[c])

+1

Marius Gedminas Nov 11 '10 at 16:56


Another way (not very compact) to accomplish your task:

    text = """%$@$^_#)^)&!_+]!*@&^}@@%%+$&[(_@%+%$*^@$^!+]!&#)*}{}}!}"""
    chars = set(text)
    L = [(c, text.count(c)) for c in chars]
    L.sort(key=lambda pair: pair[1])

    >>> L
    [('(', 1), ('[', 1), ('{', 1), ('#', 2), (']', 2), (')', 3), ('*', 3), ('_', 3), ('&', 4), ('+', 4), ('!', 5), ('%', 5), ('$', 5), ('}', 5), ('^', 5), ('@', 6)]
    >>>

0

Nick Dandoulakis Nov 11 '10 at 17:05


John Kugelman · Accepted Answer · 2010-11-11T17:10:47+0000

Refactoring: Walkthrough

I want to walk you through the process of refactoring. Learning to program is not only about knowing the end result, which is what you usually get when you ask a question on Stack Overflow. It is also about how to get to that answer. When people post short, dense answers to a question like this one, it is not always obvious how they arrived at their solutions.

So let's do some refactoring and see what we can do to simplify your code. We will rewrite, delete, rename and reorder code until no further improvements can be made.

Simplify Algorithms

Python does not need to be this verbose. It is usually a code smell when you have explicit loops operating on lists and dicts in Python, instead of using list methods and functions that operate on containers as a whole.

Use defaultdict to store the number of characters

A defaultdict(int) creates entries when they are accessed if they do not already exist. This eliminates the if/else branch when counting characters.

    from collections import defaultdict

    characterDict = defaultdict(int)

    def putEncounteredCharactersInDictionary(lineStr):
        for character in lineStr:
            characterDict[character] += 1
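To make the "creates entries on access" behaviour concrete, here is a small sketch, assuming Python 3 for the print calls. The variable names and the sample word are made up for the illustration.

```python
from collections import defaultdict

counts = defaultdict(int)
for ch in "hello":
    counts[ch] += 1  # a missing key starts at int(), which is 0

print(counts["l"])  # 2

# Caveat: merely reading a missing key with [] also creates it.
unseen = counts["z"]
print(unseen, "z" in counts)  # 0 True

# .get() looks a key up without inserting it.
print(counts.get("q"), "q" in counts)  # None False
```

The caveat matters mostly when the dictionary is inspected afterwards: a stray lookup with square brackets would add a character with a count of zero.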

Sorting dicts

Dictionaries do not guarantee any order for their keys. You cannot assume that the items are stored in the same order in which you inserted them. So sorting the dict's items and then putting them right back into another dict just scrambles them again.

This means that your function is effectively a no-op. After sorting the items, you need to keep them as a list of tuples in order to preserve the sort order. Removing the code that rebuilds the dict then reduces this function to a single line.

    def sortCharacterDictionary(characterDict):
        return sorted(characterDict.iteritems(), key=itemgetter(1))
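The following sketch shows what such a sorted list of tuples looks like. It assumes Python 3, where items() replaces iteritems(), and uses made-up counts. Since Python 3.7 dicts do remember insertion order, but a list of tuples still makes the ordering explicit and works on any version.

```python
from operator import itemgetter

# Hypothetical counts for illustration.
counts = {'a': 3, 'b': 1, 'c': 2}

# sorted() returns a new list of (key, value) tuples ordered by value.
by_count = sorted(counts.items(), key=itemgetter(1))
print(by_count)     # [('b', 1), ('c', 2), ('a', 3)]

# The rarest entry is simply the first element of that list.
print(by_count[0])  # ('b', 1)
```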

Inverting dicts

Given the previous point, you will no longer have a dict after sorting. But assuming you did, this function is one of those cases where an explicit loop is discouraged. In Python, always think about how you can operate on a whole collection at once rather than on one item at a time.

    def inverseSortedCharacterDictionary(sortedCharDict):
        return dict((v, k) for k, v in sortedCharDict.iteritems())

All in one line, we (1) iterate over the key/value pairs in the dict; (2) swap them to create inverted value/key tuples; (3) create a dict from these inverted tuples.
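One thing worth keeping in mind: a plain value/key inversion keeps only one key per value, whereas the function in the question collected all keys that share a value into a list. The sketch below contrasts the two, assuming Python 3 and made-up counts; the name grouped is hypothetical.

```python
from collections import defaultdict

# Hypothetical counts: 'a' and 'c' share the same value.
counts = {'a': 1, 'b': 3, 'c': 1}

# One-line inversion: keys that share a value overwrite each other.
inverted = dict((v, k) for k, v in counts.items())
print(len(inverted))  # 2

# Grouping inversion: keep every key under its value.
grouped = defaultdict(list)
for k, v in counts.items():
    grouped[v].append(k)
print(sorted(grouped[1]))  # ['a', 'c']
```

Which form is appropriate depends on whether the values are known to be unique; for character counts they usually are not.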

Comment and name wisely

The names of your methods are long and descriptive. There is no need to repeat the same information in the comments. Use comments only when your code is not self-describing, for example, when you have a complex algorithm or an unusual construction that is not immediately obvious.

On the subject of naming, your names are needlessly long. I would use much less descriptive names and also make them more general. Instead of inverseSortedCharacterDictionary, try just invertedDict. That is all the function does: it inverts a dict. It really does not matter whether it is passed a sorted character dict or any other kind of dict.

As a rule, try to use the most generic names possible so that your functions and variables can be as general as possible. More generic means more reusable.

    characters = defaultdict(int)

    def countCharacters(string):
        for ch in string:
            characters[ch] += 1

    def sortedCharacters(characters):
        return sorted(characters.iteritems(), key=itemgetter(1))

    def invertedDict(d):
        return dict((v, k) for k, v in d.iteritems())

Reduce volume

Using temporary variables and helper functions is good programming practice, and I applaud you for doing so in your program. However, now that we have simplified them to the point where each is only one or two lines, we probably do not even need them.

Here is the body of your program after changing the functions as described above:

    f = open('funkymess.txt', 'r')

    for line in f:
        countCharacters(line.rstrip('\n'))

    f.close()

    print sortedCharacters(characters)[0]

And then let's just go ahead and inline these helper functions, since they are so simple. Here is the final program after refactoring:

Final program

    #!/usr/bin/env python

    from operator import itemgetter
    from collections import defaultdict

    characters = defaultdict(int)

    f = open('funkymess.txt','r')

    for line in f:
        for ch in line.rstrip('\n'):
            characters[ch] += 1

    f.close()

    print sorted(characters.iteritems(), key=itemgetter(1))[0]
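For readers on Python 3, a possible adaptation of the final program is sketched below. It is not the answer's own code: the function name rarest_character is hypothetical, a with block replaces the explicit close(), and min is swapped in for sorted(...)[0], which yields the same lowest-count pair without sorting everything. The demo file contents are made up.

```python
import os
import tempfile
from collections import defaultdict
from operator import itemgetter


def rarest_character(path):
    """Return the (character, count) pair with the lowest count in the file."""
    characters = defaultdict(int)
    # The with statement closes the file even if an error occurs.
    with open(path) as f:
        for line in f:
            for ch in line.rstrip('\n'):
                characters[ch] += 1
    return min(characters.items(), key=itemgetter(1))


# Demo on a throwaway file; the contents are made up for this example.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('@@%%\n$$x%%\n')
demo_path = tmp.name

result = rarest_character(demo_path)
os.remove(demo_path)
print(result)  # ('x', 1)
```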
