Handling exceptions from a generator in its consumer

This is a continuation of Handle an exception thrown in a generator, and discusses a more general problem.

I have a function that reads data in different formats. All the formats are line- or record-oriented, and for each format there is a dedicated parsing function, implemented as a generator. So the main read function gets an input stream and a generator, which reads its specific format from the stream and hands the records back to the main function:

def read(stream, parsefunc):
    for record in parsefunc(stream):
        do_stuff(record)

where parsefunc is something like:

def parsefunc(stream):
    while not eof(stream):
        rec = read_record(stream)
        # do some stuff
        yield rec

The problem I am facing is that while parsefunc can throw an exception (for example, while reading from the stream), it has no idea how to handle it. The function responsible for handling exceptions is the main read function. Note that the exceptions are per-record, so even if one record fails, the generator must carry on and keep returning records until the whole stream is exhausted.
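To make the constraint concrete, here is a stripped-down sketch (a toy generator, not the real parser) of the underlying issue: once an exception escapes a generator, the generator is finished, so the consumer cannot simply catch it and carry on iterating.

def gen():
    for i in range(5):
        if i == 2:
            raise ValueError("bad record")   # simulates a failing record
        yield i

g = gen()
try:
    for rec in g:
        print(rec)                # prints 0 and 1
except ValueError as e:
    print("caught: %s" % e)
print(list(g))                    # [] -- the generator cannot be resumed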

In the previous question I tried putting next(parsefunc) in a try block, but as it turned out, that does not work. So I would have to add try-except to parsefunc itself and then somehow deliver the exceptions to the consumer:

def parsefunc(stream):
    while not eof(stream):
        try:
            rec = read_record()
            yield rec
        except Exception as e:
            ?????

I'm pretty reluctant to do this because

  • It makes no sense to use try in a function that is not intended to handle any exceptions.
  • I don’t see how to pass the exceptions on to the consuming function
  • there will be many formats and many parsefuncs; I do not want to clutter them with too much auxiliary code.

Any suggestions for a better architecture?

Note for googlers: in addition to the top answer, look at senderle and Jon's posts - very smart and insightful stuff.

+27
python generator exception-handling
Jul 06 '12
8 answers

You can make parsefunc yield a tuple of record and exception, and let the consumer function decide what to do with the exception:

import random

def get_record(line):
    num = random.randint(0, 3)
    if num == 3:
        raise Exception("3 means danger")
    return line

def parsefunc(stream):
    for line in stream:
        try:
            rec = get_record(line)
        except Exception as e:
            yield (None, e)
        else:
            yield (rec, None)

if __name__ == '__main__':
    with open('temp.txt') as f:
        for rec, e in parsefunc(f):
            if e:
                print "Got an exception %s" % e
            else:
                print "Got a record %s" % rec
+15
Jul 07 '12 at 11:20

Thinking more deeply about what would happen in a more complex case rather vindicates Python's choice of not letting exceptions bubble out of a generator.

If I got an I/O error from the stream object, the odds of simply recovering and continuing to read, without the structures local to the generator being somehow reset, would be low. I would somehow have to reconcile with the reading process in order to continue: skip garbage, discard partial data, reset some wayward internal tracking structure, and so on.

Only the generator has enough context to do that properly. Even if you could keep the generator context alive from an outer except block, letting the exceptions bubble up would thoroughly violate the Law of Demeter: all the important information the surrounding block would need in order to reset things and move on lives in the local variables of the generator function! And getting at or passing around that information, while possible, is ugly.

The resulting exception would almost always be re-raised after cleanup anyway, in which case the reader generator will already have an internal exception block. Trying very hard to preserve that brain-dead simplicity in the trivial case, only to have it fall apart in almost any realistic context, would be silly. So just put the try in the generator; you will need the body of the except block there in any non-trivial case anyway.

It would be nice for the exceptional conditions to look like exceptions, though, and not like return values. So I would add an intermediate adapter to allow for that: the generator yields either data or exceptions, and the adapter re-raises the exception where applicable. The adapter should be called first thing inside the for loop, so that we have the option of catching the exception within the loop and cleaning up before continuing, or of breaking out of the loop, catching it there, and giving up on the process. And we should put some sort of obvious wrapper around the setup to signal that tricks are afoot, and to make sure the adapter gets called whenever the function is adapted.

That way, each layer is presented with the errors it has the context to handle, at the cost of the adapter being slightly intrusive (and possibly also easy to forget).

So, we would have:

def read(stream, parsefunc):
    try:
        for source in frozen(parsefunc(stream)):
            try:
                record = source.thaw()
                do_stuff(record)
            except Exception, e:
                log_error(e)
                if not is_recoverable(e):
                    raise
                recover()
    except Exception, e:
        properly_give_up()
    wrap_up()

(Both try blocks are optional.)

The adapter looks like this:

class Frozen(object):
    def __init__(self, item):
        self.value = item
    def thaw(self):
        if isinstance(self.value, Exception):
            raise self.value
        return self.value

def frozen(generator):
    for item in generator:
        yield Frozen(item)

And parsefunc looks like this:

def parsefunc(stream):
    while not eof(stream):
        try:
            rec = read_record(stream)
            do_some_stuff()
            yield rec
        except Exception, e:
            properly_skip_record_or_prepare_retry()
            yield e

To make the adapter harder to forget, we could also turn frozen from a function into a decorator applied to parsefunc.

def frozen_results(func):
    def freezer(__func = func, *args, **kw):
        for item in __func(*args, **kw):
            yield Frozen(item)
    return freezer

In this case, we will declare:

@frozen_results
def parsefunc(stream):
    ...

And obviously we would not bother declaring frozen, or wrapping it around the call to parsefunc.
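For clarity, here is a rough sketch of how read would then look with the decorator in place (same hypothetical helpers as above - do_stuff, log_error, is_recoverable, recover - none of which are defined here):

def read(stream, parsefunc):
    # parsefunc is already decorated, so its items arrive pre-Frozen
    for source in parsefunc(stream):
        try:
            record = source.thaw()   # re-raises if the item was an exception
            do_stuff(record)
        except Exception as e:
            log_error(e)
            if not is_recoverable(e):
                raise
            recover()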

+13
Sep 29 '12 at 19:57

Without knowing more about the system, it's hard for me to say which approach will work best. However, one option no one has suggested yet would be to use a callback. Given that only read knows how to handle exceptions, maybe something like this would work?

def read(stream, parsefunc):
    some_closure_data = {}

    def error_callback_1(e):
        manipulate(some_closure_data, e)

    def error_callback_2(e):
        transform(some_closure_data, e)

    for record in parsefunc(stream, error_callback_1):
        do_stuff(record)

Then in parsefunc :

def parsefunc(stream, error_callback):
    while not eof(stream):
        try:
            rec = read_record()
            yield rec
        except Exception as e:
            error_callback(e)

I used a closure over a mutable local here; you could also define a class. Note too that you can access traceback information inside the callback via sys.exc_info().
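If a class reads better than a closure here, a minimal sketch of the same idea (ErrorCollector and log_error are made-up names for illustration; do_stuff is the OP's placeholder):

class ErrorCollector(object):
    """Gathers per-record errors raised inside parsefunc."""
    def __init__(self):
        self.errors = []
    def __call__(self, e):
        # parsefunc calls this instead of letting the exception escape
        self.errors.append(e)

def read(stream, parsefunc):
    on_error = ErrorCollector()
    for record in parsefunc(stream, on_error):
        do_stuff(record)
    for e in on_error.errors:
        log_error(e)   # hypothetical: report the failures afterwards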

Another interesting approach might be to use send. It works a little differently: basically, instead of defining a callback, read can check the result of the yield, perform arbitrarily complex logic, and send a replacement value, which the generator then yields again (or does something else with). This is a bit more exotic, but I thought I'd mention it in case it's useful:

>>> def parsefunc(it):
...     default = None
...     for x in it:
...         try:
...             rec = float(x)
...         except ValueError as e:
...             default = yield e
...             yield default
...         else:
...             yield rec
...
>>> parsed_values = parsefunc(['4', '6', '5', '5h', '22', '7'])
>>> for x in parsed_values:
...     if isinstance(x, ValueError):
...         x = parsed_values.send(0.0)
...     print x
...
4.0
6.0
5.0
0.0
22.0
7.0

As it stands this is a little pointless ("Why not just print the default value directly from read?", you may ask), but you can do more complicated things with default inside the generator, resetting values, going back a step, and so on. You could even wait for a callback to be sent at that point, depending on the error you received. But note that sys.exc_info() is cleared as soon as the generator yields, so you will have to pass along everything from sys.exc_info() if you need access to the traceback.
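For instance, a sketch of what passing the traceback along could look like, yielding the whole sys.exc_info() triple instead of just the exception (a toy parser, not the OP's code):

import sys

def parsefunc(it):
    for x in it:
        try:
            yield float(x)
        except ValueError:
            # capture (type, value, traceback) before yielding,
            # since exc_info is cleared once the generator yields
            yield sys.exc_info()

for item in parsefunc(['4', '5h', '7']):
    if isinstance(item, tuple):
        etype, evalue, tb = item
        print('failed with %s' % evalue)   # or re-raise with the saved traceback
    else:
        print(item)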

Here is an example of how you might combine the two approaches:

import string

digits = set(string.digits)

def digits_only(v):
    return ''.join(c for c in v if c in digits)

def parsefunc(it):
    default = None
    for x in it:
        try:
            rec = float(x)
        except ValueError as e:
            callback = yield e
            yield float(callback(x))
        else:
            yield rec

parsed_values = parsefunc(['4', '6', '5', '5h', '22', '7'])
for x in parsed_values:
    if isinstance(x, ValueError):
        x = parsed_values.send(digits_only)
    print x
+7
Jul 10 '12

An example of a possible construction:

from StringIO import StringIO
import csv

blah = StringIO('this,is,1\nthis,is\n')

def parse_csv(stream):
    for row in csv.reader(stream):
        try:
            yield int(row[2])
        except (IndexError, ValueError) as e:
            pass  # don't yield, but might need something
        # All others have to go up a level - so it wasn't parsable.
        # If it's an IOError you know why, but this needs to catch
        # exceptions potentially; just let the major ones propagate.

for record in parse_csv(blah):
    print record
+3
Jul 06 '12

I like the answer above with the Frozen stuff. Based on that idea I came up with the following, addressing two aspects I did not like yet. The first was the boilerplate needed to write it down. The second was the loss of the stack trace when an exception is raised. I solved the first as well as I could by using decorators. I preserved the stack trace by using sys.exc_info() instead of the bare exception.

My generator would normally (i.e. without my stuff applied) look like this:

def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        yield f(i)

If I can convert it to use an inner function to determine the value to yield, I can apply my approach to it:

def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        def generate():
            return f(i)
        yield generate()

This does not change anything yet, and calling it like this would raise an error with the proper stack trace:

for e in generator():
    print e

Now, using my decorators, the code will look like this:

@excepterGenerator
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        @excepterBlock
        def generate():
            return f(i)
        yield generate()

Not much has changed visually. And you can still use it the same way you used the previous version:

for e in generator():
    print e

And you still get the proper stack trace when it raises. (There is just one more frame in it now.)

But now you can also use it as follows:

it = generator()
while it:
    try:
        for e in it:
            print e
    except Exception as problem:
        print 'exc', problem

This way you can handle any exception that occurs in the generator from the consumer, without excessive syntactic fuss and without losing the stack trace.

Decorators are written as follows:

import sys

def excepterBlock(code):
    def wrapper(*args, **kwargs):
        try:
            return (code(*args, **kwargs), None)
        except Exception:
            return (None, sys.exc_info())
    return wrapper

class Excepter(object):
    def __init__(self, generator):
        self.generator = generator
        self.running = True
    def next(self):
        try:
            v, e = self.generator.next()
        except StopIteration:
            self.running = False
            raise
        if e:
            raise e[0], e[1], e[2]
        else:
            return v
    def __iter__(self):
        return self
    def __nonzero__(self):
        return self.running

def excepterGenerator(generator):
    return lambda *args, **kwargs: Excepter(generator(*args, **kwargs))
+2
Feb 28 '13 at 16:02

Regarding your point about propagating the exception from the generator to the consuming function, you can try using an error code (or a set of error codes) to indicate the error. Although not elegant, it is one approach you might consider.

For example, in the code below, yielding a value such as -1 where you expected a set of positive integers would signal to the calling function that there was an error.

In [1]: def f():
   ...:     yield 1
   ...:     try:
   ...:         2/0
   ...:     except ZeroDivisionError, e:
   ...:         yield -1
   ...:     yield 3
   ...:

In [2]: g = f()

In [3]: next(g)
Out[3]: 1

In [4]: next(g)
Out[4]: -1

In [5]: next(g)
Out[5]: 3
+1
Jul 06 '12 at 19:20

Actually, generators are quite limited in several aspects. You have found one of them: the raising of exceptions is not part of their API.

You could take a look at the Stackless Python stuff, such as greenlets or coroutines, which offer much more flexibility; but diving into those is a bit beyond the scope here.
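For what it's worth, a rough sketch of the greenlet flavour of this idea (assuming the third-party greenlet package; parser and read are just illustrative names): the parser hands each record or error to the consumer and is resumed exactly where it stopped.

from greenlet import greenlet, getcurrent

def parser(lines):
    for line in lines:
        try:
            rec = float(line)
        except ValueError as e:
            getcurrent().parent.switch(('error', e))    # hand the error to read()
        else:
            getcurrent().parent.switch(('record', rec))
    getcurrent().parent.switch(('done', None))

def read(lines):
    g = greenlet(parser)
    kind, value = g.switch(lines)          # start the parser
    while kind != 'done':
        if kind == 'error':
            print('recovering from: %s' % value)
        else:
            print('record: %s' % value)
        kind, value = g.switch()           # resume the parser where it paused

read(['4', '6', '5h', '22'])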

+1
Sep 05

(I answered a related question from the OP, but my answer applies to this situation as well.)

I needed to solve this problem a couple of times and came across this question after searching for what other people did.

One option, which will probably require a bit of refactoring, is to create an error-handling generator and throw the exception into that generator rather than raise it.

Here's what the error-handling generator function might look like:

def err_handler():
    # a generator for processing errors
    while True:
        try:
            # errors are thrown to this point in the function
            yield
        except Exception1:
            handle_exc1()
        except Exception2:
            handle_exc2()
        except Exception3:
            handle_exc3()
        except Exception:
            raise

An additional handler argument is provided to parsefunc so that it has somewhere to throw its errors:

def parsefunc(stream, handler):
    # the handler argument deals with errors/problems separately
    while not eof(stream):
        try:
            rec = read_record(stream)
            # do some stuff
            yield rec
        except Exception as e:
            handler.throw(e)
    handler.close()

Now just use the almost-original read function, only now with an error handler:

def read(stream, parsefunc):
    handler = err_handler()
    for record in parsefunc(stream, handler):
        do_stuff(record)

This may not always be the best solution, but it is certainly an option, and relatively easy to understand.
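One caveat, as far as I can tell: throwing into a generator that has not been started yet simply raises the exception at the top of its body, before the try is ever entered, so the handler probably needs to be primed with a next() call first. A self-contained toy version under that assumption:

def err_handler():
    while True:
        try:
            yield
        except ValueError:
            print('recovered from a bad record')

handler = err_handler()
next(handler)                  # prime it so execution pauses at the yield

def parsefunc(lines, handler):
    for line in lines:
        try:
            yield float(line)
        except ValueError as e:
            handler.throw(e)   # handled inside err_handler, control returns here
    handler.close()

for rec in parsefunc(['1', 'oops', '3'], handler):
    print(rec)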

+1
Aug 04 '17 at 2:46


