Python: for loop in index assignment

While working through the awesome book Programming Collective Intelligence by Toby Segaran, I came across some techniques in index assignments that I am not completely familiar with.

Take this for example:

createkey='_'.join(sorted([str(wi) for wi in wordids])) 

or

 normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()]) 

All the nested tuples and for loops inside these assignments confuse me a bit. What is actually assigned to these variables? I assume .join produces a string, but what about the second one? If someone could explain the mechanics of these loops, I would really appreciate it. I assume these are fairly common techniques, but as a Python newbie I'm a bit lost. Thanks!

6 answers
 [str(wi) for wi in wordids] 

is a list comprehension.

 a = [str(wi) for wi in wordids] 

is equivalent to

    a = []
    for wi in wordids:
        a.append(str(wi))

So

 createkey='_'.join(sorted([str(wi) for wi in wordids])) 

creates a list of strings from the elements of wordids, then sorts this list and joins it into one big string, using _ as the delimiter.
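To make the three steps concrete, here is a minimal sketch that unrolls the one-liner into an explicit loop. The wordids values are hypothetical sample data, not taken from the book.

```python
# Hypothetical sample word IDs (integers), for illustration only.
wordids = [12, 3, 7]

# The one-liner from the question.
createkey = '_'.join(sorted([str(wi) for wi in wordids]))

# The same thing, one step at a time.
strings = []
for wi in wordids:
    strings.append(str(wi))    # builds ['12', '3', '7']
strings.sort()                 # sorts the strings in place (lexicographically)
unrolled = '_'.join(strings)   # glues them together with '_'

print(createkey)              # 12_3_7
print(createkey == unrolled)  # True
```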

As others have correctly noted, you can also use a generator expression, which looks exactly like a list comprehension but uses parentheses instead of brackets. This avoids building a list if you don't need one later (other than to iterate over it once). And if the generator expression is already the sole argument of a function call, as here with sorted(...), you can simply drop the brackets.

However, in this particular case you won't get a performance gain (in fact it is slightly slower, roughly 4 to 8% in the timings below; I timed it), because sorted() has to build a list anyway. It does look a little nicer, though:

 createkey='_'.join(sorted(str(wi) for wi in wordids)) 
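A quick check, using hypothetical sample IDs, that both forms build the same key, plus an illustration of the trade-off mentioned above: a generator expression yields its values only once, so if you need the values again later you want a real list.

```python
wordids = [4, 10, 2]  # hypothetical sample IDs

with_list = '_'.join(sorted([str(wi) for wi in wordids]))
with_gen = '_'.join(sorted(str(wi) for wi in wordids))
print(with_list == with_gen)  # True

# A generator expression is consumed as you iterate over it.
gen = (str(wi) for wi in wordids)
first_pass = list(gen)   # ['4', '10', '2']
second_pass = list(gen)  # [] because the generator is already exhausted
print(first_pass, second_pass)
```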

 normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()]) 

This iterates over the items of the linkscores dictionary, where each item is a (key, value) pair. It builds a list of (key, l/maxscore) tuples and then turns that list back into a dictionary.
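Here is a minimal sketch that prints the intermediate list before it is turned into a dictionary. The linkscores contents and the choice of maxscore as the largest value are assumptions made for illustration; the question does not show how either is computed. The printed order assumes Python 3.7+, where dicts keep insertion order.

```python
# Hypothetical link scores: page id -> raw score.
linkscores = {'a': 2, 'b': 5, 'c': 10}
maxscore = max(linkscores.values())  # assumed normalization factor: 10

# Step 1: the list comprehension builds a list of (key, normalized value) tuples.
pairs = [(u, float(l) / maxscore) for (u, l) in linkscores.items()]
print(pairs)  # [('a', 0.2), ('b', 0.5), ('c', 1.0)]

# Step 2: dict() turns that list of 2-tuples back into a dictionary.
normalizedscores = dict(pairs)
print(normalizedscores)  # {'a': 0.2, 'b': 0.5, 'c': 1.0}
```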

However, since Python 2.7 you can also use dict comprehensions:

 normalizedscores = {u:float(l)/maxscore for (u,l) in linkscores.items()} 
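With hypothetical sample data, the dict comprehension gives exactly the same dictionary as the dict([...]) form, without building the intermediate list:

```python
linkscores = {'a': 2, 'b': 5, 'c': 10}  # hypothetical sample data
maxscore = 10.0                          # hypothetical normalization factor

old_style = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])
new_style = {u: float(l) / maxscore for (u, l) in linkscores.items()}

print(old_style == new_style)  # True
```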

Here are some timing data:

Python 3.2.2

    >>> import timeit
    >>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
    61.37724242267409
    >>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
    66.01814811313774

Python 2.7.2

    >>> import timeit
    >>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
    58.01728623923137
    >>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
    60.58927580777687

Take the first one:

  • str(wi) for wi in wordids takes each element in wordids and converts it to a string.
  • sorted(...) sorts them (lexicographically).
  • '_'.join(...) merges the sorted word IDs into a single string with underscores between entries.
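Because the IDs are converted to strings before sorting, the order is lexicographic rather than numeric. The sketch below, using hypothetical IDs, shows the difference. Sorting the integers first and converting afterwards is shown only for comparison; it is not what the book's code does. In both cases the key no longer depends on the order of the input list.

```python
wordids = [1, 2, 4, 3, 10, 7]  # hypothetical IDs

# Sorting the strings compares them character by character, so '10' < '2'.
lexicographic = '_'.join(sorted(str(wi) for wi in wordids))
print(lexicographic)  # 1_10_2_3_4_7

# Sorting the integers first and converting afterwards gives numeric order.
numeric = '_'.join(str(wi) for wi in sorted(wordids))
print(numeric)  # 1_2_3_4_7_10

# The same IDs in a different order produce the same key.
print(lexicographic == '_'.join(sorted(str(wi) for wi in reversed(wordids))))  # True
```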

Now the second:

 normalizedscores = dict([(u,float(1)/maxscore) for (u,l) in linkscores.items()]) 
  • linkscores is a dictionary (or a dictionary-like object).
  • for (u,l) in linkscores.items() iterates over all the entries in the dictionary, assigning each entry's key and value to u and l.
  • (u,float(1)/maxscore) is a tuple whose first element is u and whose second element is 1/maxscore (I suspect this is a typo: float(l)/maxscore would make more sense; note the lowercase letter el instead of the digit one).
  • dict(...) builds a dictionary from the list of tuples, where the first element of each tuple is taken as a key, and the second is taken as a value.

In short, it makes a copy of the dictionary, keeping the keys and dividing each value by maxscore.
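To see why the lowercase l matters, here is a small comparison with hypothetical data: with float(1) every entry gets the same constant 1/maxscore, while float(l) divides each stored value by maxscore.

```python
linkscores = {'x': 3, 'y': 6}  # hypothetical sample data
maxscore = 6                   # hypothetical normalization factor

# Digit one: every entry gets the same value, 1/maxscore.
with_one = dict([(u, float(1) / maxscore) for (u, l) in linkscores.items()])

# Letter el: each value is divided by maxscore.
with_el = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])

print(with_one)  # {'x': 0.16666666666666666, 'y': 0.16666666666666666}
print(with_el)   # {'x': 0.5, 'y': 1.0}
```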


The latter is equivalent to:

    normalizedscores = {}
    for u, l in linkscores.items():
        normalizedscores[u] = float(l) / maxscore
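A minimal check, with hypothetical data, that the explicit loop builds the same dictionary as the one-liner, and that linkscores itself is left untouched because the result is a new dictionary:

```python
linkscores = {'a': 4, 'b': 8}  # hypothetical sample data
maxscore = 8                   # hypothetical normalization factor

one_liner = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])

# Explicit loop version: fill a new, empty dict one key at a time.
normalizedscores = {}
for u, l in linkscores.items():
    normalizedscores[u] = float(l) / maxscore

print(normalizedscores == one_liner)  # True
print(linkscores)  # {'a': 4, 'b': 8}  (original values unchanged)
```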
 [(u,float(l)/maxscore) for (u,l) in linkscores.items()] 

This creates a list by iterating over the tuples in linkscores.items() and calculating (u, float(l)/maxscore) for each tuple.

 dict([this list]) 

creates a dict whose entries come from the list comprehension result: (u, float(l)/maxscore) for each item in linkscores.

As another example of creating a dict from a list of tuples:

    >>> l = [(1,2), (3,4), (5,6)]
    >>> d = dict(l)
    >>> d
    {1: 2, 3: 4, 5: 6}

Here is an example of the first ... example

    >>> wordids = [1,2,4,3,10,7]
    >>> createkey='_'.join(sorted([str(wi) for wi in wordids]))
    >>> print(createkey)
    1_10_2_3_4_7

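It iterates through the list with a for loop, converting each value to a string, sorts the resulting list, and then concatenates all the sorted values into one string, separating them with "_". Because the values are sorted as strings, "10" comes before "2".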

The weird-looking business going on inside the brackets [] is called a list comprehension, and it is basically a very concise way to build a list. myList = [str(wi) for wi in wordids] is equivalent to:

    myList = []
    for wi in wordids:
        myList.append(str(wi))

sorted() then sorts this list, and join() gives a string with these list items separated by underscores, for example: item1_item2_item3_...
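The str(wi) conversion is needed because str.join() only accepts strings; passing it integers raises a TypeError. A short sketch with hypothetical IDs:

```python
wordids = [3, 1, 2]  # hypothetical integer IDs

join_failed = False
try:
    '_'.join(sorted(wordids))  # integers: join() rejects them
except TypeError as exc:
    join_failed = True
    print('TypeError:', exc)

# Converting each ID to a string first makes join() work.
key = '_'.join(sorted([str(wi) for wi in wordids]))
print(key)  # 1_2_3
```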

The second assignment is more complicated and terse, but here's what happens:

  • linkscores looks like a dictionary, and the items() method returns its (key, value) pairs as tuples (a list in Python 2, a view in Python 3). So for (u,l) in linkscores.items() iterates over these tuples, unpacking each one into u and l.
  • For each of these tuples, we create a new tuple (u, float(l)/maxscore) and add it to the list. So this step basically turns your list of (item, value) tuples into a list of (item, normalized value) tuples.
  • The dict() function then turns this list back into a dictionary.

The overall result is to take all the values in a dict and normalize them. There may be a simpler, more verbose way to do this, but this one has the advantage of looking cool. I prefer not to do crazy things with list comprehensions, because it hurts readability, so don't feel bad if you don't feel like writing things like that yourself!
