Loop in Python: do something before the first iteration

I want to optimize the following loop.

A simple solution

    connection = get_db_connection()
    for item in my_iterator:
        push_item_to_db(item, connection)

Downside:

get_db_connection() is slow. If my_iterator is empty, I want to avoid calling it at all.

A solution with an if check

    connection = None
    for item in my_iterator:
        if connection is None:
            connection = get_db_connection()
        push_item_to_db(item, connection)

Downside:

If my_iterator has 100 thousand elements, the check if connection is None runs 100k times, although it is needed only once. I want to avoid this.

The perfect solution would:

  • not call get_db_connection() if the iterator is empty
  • not evaluate if connection is None: on every iteration when it is needed only once.

Any ideas?

5 answers

You can do something like:

    connection = None
    for item in my_iterator:
        if connection is None:
            connection = get_db_connection()
        push_item_to_db(item, connection)

It's a simple solution; don't bother optimizing it further. Even with 100k iterations, x is None is just a reference comparison, a single Python opcode. You really don't need to optimize this compared with the full TCP round trip plus disk write that happens on every insert.
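One way to check this claim on your own machine is to time the loop with and without the per-item check. This is a rough sketch: object() stands in for the question's get_db_connection(), and no database work is done, so it measures only the loop overhead. Exact numbers depend on the machine and Python version, but the gap should be tiny next to 100k database round trips.

```python
import timeit

ITEMS = list(range(100_000))

def loop_with_check(items):
    # Lazy pattern from the question: test "is None" on every iteration.
    connection = None
    for item in items:
        if connection is None:
            connection = object()  # stand-in for get_db_connection()
    return connection

def loop_without_check(items):
    # Eager pattern: "connect" once up front, no per-item test.
    connection = object()  # stand-in for get_db_connection()
    for item in items:
        pass
    return connection

# Average seconds per full pass over 100k items (10 runs each).
with_check = timeit.timeit(lambda: loop_with_check(ITEMS), number=10) / 10
without_check = timeit.timeit(lambda: loop_without_check(ITEMS), number=10) / 10
print(f"with check:    {with_check * 1000:.2f} ms per 100k items")
print(f"without check: {without_check * 1000:.2f} ms per 100k items")
```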


I'm not an expert in Python, but I would do something like this:

    def put_items_to_database(iterator):
        iterator = iter(iterator)  # also accept lists and other iterables
        try:
            item = next(iterator)
            # We connect to the database only after we
            # know there is at least one element in the collection
            connection = get_db_connection()
            while True:
                push_item_to_db(item, connection)
                item = next(iterator)
        except StopIteration:
            pass

It's true that performance here is dominated by the database. However, the question asks for a way to avoid unnecessary work, and the approach above gives precise control over what happens during the iteration.

Other solutions are "simpler" in some ways, but I think this one is more explicit and follows the principle of least surprise.
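A minimal runnable sketch of this idea, using hypothetical stub versions of get_db_connection() and push_item_to_db() that only record their calls. It is a small variation on the answer: the try covers only the first next() call, because in the while True version a StopIteration raised inside push_item_to_db() would be silently swallowed by the except clause.

```python
# Hypothetical stand-ins for the question's helpers: a real version would
# open a database connection and write each item.
connections_opened = 0
pushed = []

def get_db_connection():
    global connections_opened
    connections_opened += 1
    return "conn"

def push_item_to_db(item, connection):
    pushed.append((item, connection))

def put_items_to_database(iterable):
    iterator = iter(iterable)
    try:
        item = next(iterator)
    except StopIteration:
        return  # empty input: the connection is never opened
    connection = get_db_connection()
    push_item_to_db(item, connection)
    # A StopIteration raised inside push_item_to_db here propagates normally.
    for item in iterator:
        push_item_to_db(item, connection)

put_items_to_database([])                        # no connection opened
put_items_to_database(x * 2 for x in range(3))   # one connection, three pushes
print(connections_opened, pushed)
```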

    for item in my_iterator:
        # First item (if any)
        connection = get_db_connection()
        push_item_to_db(item, connection)
        break  # stop after the first item
    for item in my_iterator:
        # Next items (my_iterator must be an iterator, not a list)
        push_item_to_db(item, connection)
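The two-loop pattern relies on both loops sharing one iterator: given a list, the second loop would start again from the first element. A sketch, assuming hypothetical stubs for the question's helpers, that wraps the input in iter() so lists work too:

```python
calls = []
received = []

def get_db_connection():  # hypothetical stub: records that a connection was opened
    calls.append("connect")
    return "conn"

def push_item_to_db(item, connection):  # hypothetical stub: records the item
    received.append(item)

def push_all(items):
    items = iter(items)  # without this, a list would restart in the second loop
    for item in items:
        # First item (if any): connect once
        connection = get_db_connection()
        push_item_to_db(item, connection)
        break
    for item in items:
        # Remaining items reuse the same connection
        push_item_to_db(item, connection)

push_all(["a", "b", "c"])
push_all([])  # empty input: neither loop body runs, no connection
print(calls, received)
```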

Solution 1

This works without a while True loop.

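    try:
        item = next(my_iterator)  # raises StopIteration if the iterator is empty
        connection = get_db_connection()
        push_item_to_db(item, connection)
    except StopIteration:
        pass
    for item in my_iterator:  # remaining items (no-op if already exhausted)
        push_item_to_db(item, connection)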

Solution 2

If you know that the iterator never yields None (or some other specific value), you can pass it as the default value to next():

    item = next(my_iterator, None)
    if item is not None:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in my_iterator:
            push_item_to_db(item, connection)

Solution 3

If you cannot guarantee a value that the iterator never returns, you can use a sentinel object.

    sentinel = object()  # a fresh object the iterator cannot possibly yield
    item = next(my_iterator, sentinel)
    if item is not sentinel:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in my_iterator:
            push_item_to_db(item, connection)
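To see why a private sentinel is safer than a None default, here is a small sketch with made-up data whose first real value is None. The None-default check mistakes it for an empty iterator; the sentinel check does not.

```python
data = [None, 1, 2]  # made-up data: the first real value happens to be None

# None as the "exhausted" marker: a leading None looks like "empty".
it = iter(data)
looks_nonempty_with_none = next(it, None) is not None

# A fresh object() as the marker: it can never collide with real data.
sentinel = object()
it = iter(data)
looks_nonempty_with_sentinel = next(it, sentinel) is not sentinel

print(looks_nonempty_with_none)      # False, although the iterator was not empty
print(looks_nonempty_with_sentinel)  # True
```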

Solution 4

Using itertools.chain() :

    from itertools import chain
    for first_item in my_iterator:
        # For an iterator, this body runs at most once:
        # the inner loop exhausts my_iterator
        connection = get_db_connection()
        for item in chain([first_item], my_iterator):
            push_item_to_db(item, connection)
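The chain() trick can also be combined with the sentinel idea into a small reusable helper. The name peek() is hypothetical (it is not a standard-library function); it reports whether the input has any items and returns an iterator that still yields every item, including the first one.

```python
from itertools import chain

def peek(iterable):
    """Hypothetical helper: return (has_items, iterator over all items)."""
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        return False, it  # empty: hand back the exhausted iterator
    return True, chain([first], it)  # put the consumed first item back in front

has_items, items = peek(x for x in range(3))
print(has_items, list(items))  # True [0, 1, 2]

has_items_empty, items_empty = peek(iter([]))
print(has_items_empty, list(items_empty))  # False []
```

With it, the question's loop becomes: if has_items is true, call get_db_connection() once, then push every element of items.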

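You can check the number of items before the whole block of code. Note that this only works for sequences such as lists: len() raises TypeError on a plain iterator or generator.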

    if len(my_iterator) > 0:
        connection = get_db_connection()
        for item in my_iterator:
            push_item_to_db(item, connection)
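A quick sketch of that limitation, using a hypothetical generator as the data source: len() fails on the generator itself, and converting to a list first works only by loading every item into memory before the first insert.

```python
def generate_rows():
    # Hypothetical data source: a generator, like many real iterators
    yield "row1"
    yield "row2"

rows = generate_rows()
try:
    len(rows)
    len_works_on_generator = True
except TypeError:
    # Generators (and iterators in general) do not define __len__
    len_works_on_generator = False

# Materializing into a list makes len() work, at the cost of memory.
rows_list = list(generate_rows())
print(len_works_on_generator, len(rows_list))  # False 2
```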
