Should Python use a generator for this case?

Question

Should Python use a generator for this case?

I have a list of almost 2k dictionaries inside it. And I use this list several times. For instance:

c = myClass() c.create(source) # where source is a text of approximately 50k chars # this method creates the list that has approximately 2k dictionaries item = c.get(15012) # now, this one loops thru the list to find an item # whenever the condition is matched, the for loop is broken and the value is returned item2 = c.prevItem(item) # this one also loops thru the list by reversing it and bringing the next item

Now imagine this scenario when I use the same list again and again. Since the list is large, I would like to use a generator, but as I understand it, generators should be recreated when they drop StopIteration. So basically, in this scenario, is it convenient to use a generator or is there a more efficient way in terms of speed?

+4

performance python generator

Shaokan Aug 14 '11 at 2:54

source share

6 answers

Owen · Answer 1 · 2011-08-14T03:19:07+0000

It seems to me that you should decide what you should do:

1) Save the values, so you do not need to recount them, but use more space for this.

2) Recount them every time, but keep them in space because you do not need to store them.

If you think about it, no matter which generator / list / what you use, one of these two things should happen. And I don’t think there is a simple solid rule to say which is better. (Personally, I would say, choose one and not look back. You have your whole life ahead.)

agf · Answer 2 · 2011-08-14T03:21:41+0000

If you often get an element with a known offset from the previously received element, you need to change .get to return not only the element, but also the position in the list. Then you can implement prevItem as:

 def previtem(self, pos): return self.itemlist[pos - 1] item, pos = c.get(itemnum) item2 = c.prevItem(pos)

If instead you perform some operation on item to get a new itemnum , you should save them in a dict instead of list . So get is just a dictionary lookup (much faster than a list lookup):

 def get(self, itemnum): return self.premade_dict[itemnum]

So, one way or another, you can replace some searches with cheaper operations.

Jürgen strobel · Answer 3 · 2011-08-14T03:20:03+0000

Depends on how you want to use the generator. Generators are only good at executing code when really needed. It seems your for loop with a break already does this.

You can change your class interface.

 def getItems(cond): # find item, remember index yield item # find previous item, possibly much more efficient with the index yield previtem

Now, by calling the getItems () method, you can use the returned generator for 1 or 2 elements, and only as much code will be executed as necessary.

Profane · Answer 4 · 2011-08-14T04:04:13+0000

A list of two thousand dictionaries is quite normal. I suppose a typical website administrator has a lot of such lists. If you rarely have to deal with such problems, you might be fine with an ad hoc solution - maybe you should consider a dictionary of dictionaries so you don't have to iterate over each key every time. But a more common way to address this data structure from what I am collecting is to use a database. Each of your dictionaries may have some key (ideally, this is a condition that you check in your loop). The database can be instructed to index the data for this key, and if you look at the work that it does to get the dictionary you need, you may be surprised if you find almost no answer - it pretty much cuts the deck to which you asked, so to speak (although to perform the index you need to do some work, which is something like a sort operation).

Python offers many great ways to map code to databases of all kinds. Check out the powerful but sophisticated sqlalchemy built-in module of the sqlite3 std library or join the experiment with mongoengine and nosql databases. (Of course, there are many others, but you can easily find here another post with a general overview). Good luck.

Roshan mathews · Answer 5 · 2011-08-14T04:18:23+0000

You can try this subclass of OrderedDict . My previous post was wrong (mentioned below):

 from collections import OrderedDict class MyOrderedDict(OrderedDict): def index(self, key): if key not in self.keys(): raise KeyError return list(d.keys()).index(key) def prev(self, key): idx = self.index(key) - 1 if idx < 0: raise IndexError return list(d.keys())[idx] def next(self, key): _list = list(d.keys()) idx = self.index(key) if idx > len(_list): raise IndexError return _list[idx+1] # >>> d = MyOrderedDict(((3, 'Three'), (2, 'Two'), (4, 'Four'), (1, 'One'))) # >>> d.index(3) # 0 # >>> d.index(2) # 1 # >>> d.prev(2) # 3 # >>> d.prev(3) # Traceback (most recent call last): # File "<stdin>", line 1, in <module> # File "<stdin>", line 9, in prev # IndexError # >>> d.next(4) # 1 # >>> d.next(1) # Traceback (most recent call last): # File "<stdin>", line 1, in <module> # File "<stdin>", line 16, in next # IndexError: list index out of range

Edit - as stated in @agf below, this is not true.

~~You are looking for a quick way to get an element from myClass , so you should use a dictionary.~~ ~~But at the same time, you want the data to be in some order so that you can make prevItem on it.~~ ~~Why don't you save your data in collections.OrderedDict , added in Python 2.7, 3.1.~~ ~~ref~~

Jochen ritzel · Answer 6 · 2011-08-14T11:05:54+0000

You have to use a list because you can do one trivial optimization with it: Sort it by the attribute you are looking for (in .get ) and do a binary search.

In a list of 2000 items, the average number of comparisons decreases from 1000 to 10! Getting the previous (and next) element becomes trivial too.

See bisect module for bisection algorithm.

Should Python use a generator for this case?

More articles: