Python: for loop in index assignment

Question

While working through the excellent book Programming Collective Intelligence by Toby Segaran, I came across some techniques in index assignments that I am not completely familiar with.

Take this for example:

createkey='_'.join(sorted([str(wi) for wi in wordids]))

or

 normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])

All the nested tuples in the indexes confuse me a bit. What is actually being assigned to these variables? I assume the .join one produces a string, but what about the second? If someone could explain the mechanics of these loops, I would really appreciate it. I assume these are fairly common techniques, but as a Python newbie I have to ask. Thanks!

+4

python dictionary variable-assignment indexing

DeaconDesperado Oct 14 '11 at 14:08

6 answers

Take the first one:

str(wi) for wi in wordids takes each element in wordids and converts it to a string.
sorted(...) sorts them (lexicographically).
'_'.join(...) joins the sorted word IDs into a single string, with underscores between the entries.
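
The three steps above can be traced one at a time. This is a minimal sketch for Python 3, assuming a small made-up wordids list (the values are illustrative and not taken from the book):

```python
# Hypothetical sample input; the real wordids come from the book's code.
wordids = [1, 2, 4, 3, 10, 7]

# Step 1: convert every id to a string.
as_strings = [str(wi) for wi in wordids]

# Step 2: sort the strings. This is a lexicographic (text) sort,
# so '10' comes before '2'.
in_order = sorted(as_strings)

# Step 3: glue the sorted strings together with underscores.
createkey = '_'.join(in_order)

print(as_strings)  # ['1', '2', '4', '3', '10', '7']
print(in_order)    # ['1', '10', '2', '3', '4', '7']
print(createkey)   # 1_10_2_3_4_7
```

Because the sort happens after the conversion to strings, the ids are ordered as text, not as numbers.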

Now the second:

 normalizedscores = dict([(u,float(1)/maxscore) for (u,l) in linkscores.items()])

linkscores is a dictionary (or a dictionary-like object).
for (u,l) in linkscores.items() iterates over all the entries in the dictionary, assigning each entry's key and value to u and l.
(u,float(1)/maxscore) is a tuple whose first element is u and whose second element is 1/maxscore (this looks like a typo to me: float(l)/maxscore would make more sense; note the lowercase letter el instead of the digit one).
dict(...) builds a dictionary from the list of tuples, where the first element of each tuple is taken as a key, and the second is taken as a value.

In short, it makes a copy of the dictionary, keeping the keys and dividing each value by maxscore.
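
To make the effect concrete, here is a small Python 3 sketch with made-up data. Both the linkscores values and the choice of maxscore as the largest value are assumptions for illustration; the excerpt in the question does not show how maxscore is computed.

```python
# Hypothetical scores, not taken from the book.
linkscores = {'a': 2, 'b': 4, 'c': 8}

# Assumption: maxscore is the largest value in the dictionary.
maxscore = max(linkscores.values())

# Build (key, normalized value) pairs, then turn the pairs into a dict.
normalizedscores = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])

# The original dictionary is untouched; the new one holds the scaled values.
assert linkscores == {'a': 2, 'b': 4, 'c': 8}
assert normalizedscores == {'a': 0.25, 'b': 0.5, 'c': 1.0}
```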

+3

NPE Oct 14 '11 at 14:12

The latter is equivalent to:

 normalizedscores = {}
 for u, l in linkscores.items():
     normalizedscores[u] = float(l) / maxscore

+1

eumiro Oct 14 '11 at 14:11

 [(u,float(l)/maxscore) for (u,l) in linkscores.items()]

This creates a list by iterating over the (key, value) tuples in linkscores.items() and calculating (u, float(l)/maxscore) for each tuple.

 dict([this list])

creates a dict with entries from the list comprehension result - (u, float(l)/maxscore) for each element in linkscores .

As another example of creating a dict from a list of tuples:

 >>> l = [(1,2), (3,4), (5,6)]
 >>> d = dict(l)
 >>> d
 {1: 2, 3: 4, 5: 6}

+1

matt b Oct 14 '11 at 14:12

Here is an example of the first ... example

 >>> wordids = [1,2,4,3,10,7]
 >>> createkey='_'.join(sorted([str(wi) for wi in wordids]))
 >>> print createkey
 1_10_2_3_4_7

It iterates through the list with a for loop, converting each element to a string, then sorts the resulting list and joins all the sorted values into one string, separated by "_".

+1

Snaxib Oct 14 '11 at 14:16

The weird-looking business going on inside the brackets [] is called a list comprehension, and it is basically a very concise way to build a list. myList = [str(wi) for wi in wordids] is equivalent to:

 myList = []
 for wi in wordids:
     myList.append(str(wi))

sorted() then sorts this list, and join() gives a string with these list items separated by underscores, for example: item1_item2_item3_...

The second assignment is more complicated / condensed, but here's what happens:

linkscores looks like a dictionary, and the items() method returns a list of (key, value) tuples from the dictionary. So for (u,l) in linkscores.items() loops over this list.
For each of these tuples, we create a new tuple containing (u, float(l)/maxscore) and add it to the list. So this step basically turns your list of (item, value) tuples into a list of (item, normalized value) tuples.
The dict() function turns this list back into a dictionary.

The overall result is to take all the values in a dict and normalize them. There may be a simpler / more verbose way to do this, but this method has the advantage of looking cool. I prefer not to do crazy things with list comprehensions, because it hurts readability, so don't feel bad if you don't feel like writing things like that yourself!

+1

andronikus Oct 14 '11 at 14:22

Tim pietzcker · Accepted Answer · 2011-10-14T14:14:54+0000

 [str(wi) for wi in wordids]

is a list comprehension.

 a = [str(wi) for wi in wordids]

is the same as

 a = []
 for wi in wordids:
     a.append(str(wi))

So

 createkey='_'.join(sorted([str(wi) for wi in wordids]))

creates a list of strings from the elements of wordids, then sorts this list and joins it into one big string, using _ as the delimiter.

As correctly noted, you can also use a generator expression, which looks exactly like a list comprehension but with parentheses instead of brackets. This avoids building a list when you don't need it afterwards (other than to iterate over it once). And if parentheses are already present, as here with sorted(...), you can simply drop the brackets.

However, in this particular case you will not get a performance gain (in fact, it is about 10% slower; I timed it), because sorted() has to build a list anyway. It does look a little nicer, though:

 createkey='_'.join(sorted(str(wi) for wi in wordids))
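
A quick way to convince yourself that the two spellings are interchangeable is to run both on the same input. This sketch uses a hypothetical wordids list (not data from the book):

```python
wordids = [1, 2, 4, 3, 10, 7]  # hypothetical sample data

# List comprehension: builds a full list, then hands it to sorted().
key_from_list = '_'.join(sorted([str(wi) for wi in wordids]))

# Generator expression: yields the strings one at a time; sorted()
# still collects them into a list internally before sorting.
key_from_gen = '_'.join(sorted(str(wi) for wi in wordids))

# Both forms produce the same key.
assert key_from_list == key_from_gen == '1_10_2_3_4_7'
```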

 normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])

This iterates over the items of the linkscores dictionary, where each item is a key/value pair. It creates a list of (key, l/maxscore) tuples, and then turns that list back into a dictionary.

However, since Python 2.7, you can also use dict comprehensions:

 normalizedscores = {u:float(l)/maxscore for (u,l) in linkscores.items()}
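
As a sanity check, the dict(...) form and the dict comprehension can be compared directly. The sample data and the way maxscore is derived are assumptions made for this sketch only:

```python
linkscores = {'a': 2, 'b': 4, 'c': 8}  # hypothetical sample data
maxscore = max(linkscores.values())    # assumption: the largest value

# Older style: list of (key, value) tuples passed to dict().
via_dict_call = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])

# Dict comprehension (Python 2.7 and later): no intermediate list of tuples.
via_comprehension = {u: float(l) / maxscore for (u, l) in linkscores.items()}

assert via_dict_call == via_comprehension
```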

Here is some timing data:

Python 3.2.2

 >>> import timeit
 >>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
 61.37724242267409
 >>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
 66.01814811313774

Python 2.7.2

 >>> import timeit
 >>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
 58.01728623923137
 >>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
 60.58927580777687
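
Measurements like these can be reproduced with a short script. In this sketch the number of runs is cut down to 10,000 (an assumption made so it finishes quickly; timeit's default is 1,000,000), so the absolute times will be much smaller and will vary by machine and Python version:

```python
import timeit

# Same setup as in the measurements: 100 random integers between 0 and 1000.
SETUP = "import random; n = [random.randint(0, 1000) for i in range(100)]"
RUNS = 10000  # reduced from timeit's default of 1,000,000 for a quick check

list_time = timeit.timeit(
    stmt="a = '_'.join(sorted([str(x) for x in n]))", setup=SETUP, number=RUNS)
gen_time = timeit.timeit(
    stmt="a = '_'.join(sorted(str(x) for x in n))", setup=SETUP, number=RUNS)

print("list comprehension:   %.3f s" % list_time)
print("generator expression: %.3f s" % gen_time)
```

Since each call generates its own random list in the setup step, small differences between runs are expected; repeat the measurement a few times before drawing conclusions.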
