"Internet" fix monkey function

My program is currently paused at pdb.set_trace() .

Is there a way to monkey-patch the function that is currently running with a corrected version, and then perform a "resume"?

Is this possible with frame manipulation?


Some context:

Often I will have a complex function that processes large amounts of data without having a priori knowledge of what data I will find:

    def process_a_lot(data_stream):
        # process a lot of stuff
        # ...
        data_unit = data_stream.next()
        if not can_process(data_unit):
            import pdb; pdb.set_trace()
        # continue processing

This convenient design launches an interactive debugger when it encounters unknown data, so I can inspect it at will and change the process_a_lot code to handle it properly.

The problem is that when data_stream is big, you do not want to iterate over all the data again (let's say that next is slow, so you cannot simply save what you already have and skip it on the next run).

Of course, once in the debugger you can replace other functions. You can also replace this function itself, but that will not change the current execution context.
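For illustration, replacing a helper from the (Pdb) prompt is easy but only affects later calls; a minimal sketch, assuming the helpers live in a hypothetical module mymodule:

    # From the (Pdb) prompt:
    import mymodule                                   # hypothetical module holding can_process
    mymodule.can_process = lambda data_unit: True     # crude hot replacement

    # Later lookups of mymodule.can_process use the new code, but the
    # process_a_lot frame that is already paused keeps executing the
    # bytecode it was compiled with.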

Edit: Since some people are getting side-tracked: I know there are many ways to structure the code so that the processing function is separate from process_a_lot . I am not asking about how to structure the code so much as about how to recover (at runtime) from a situation where the code is not ready to handle the replacement.

4 answers

No.

You cannot monkey-patch the currently running Python function and carry on as if nothing had happened. At least not in any general or practical sense.

In theory, this is possible - but only in limited circumstances, with great effort and skill. This cannot be done with any generality.

To even try, you would need to:

  1. Find the source of the relevant function and edit it (easy)
  2. Compile the changed function source to bytecode (easy)
  3. Splice the new bytecode in place of the old (doable)
  4. Alter the function's housekeeping data so that it points at the "logically same point" in the program where it dropped into pdb (possible in some cases)
  5. Continue from the debugger, falling back into the debugged code (iffy)
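A minimal sketch of steps 1-3, i.e. compiling edited source and splicing the new code object into the existing function object; the helper name is mine, it assumes the new source defines a function with the same name, and it deliberately does not attempt steps 4-5 (repointing the paused frame):

    def splice_new_code(func, new_source):
        # Steps 1-3: compile `new_source` and install its code object on `func`.
        # Only future calls of `func` see the change; a frame that is already
        # paused in pdb keeps its old code object.
        module_code = compile(new_source, func.__code__.co_filename, "exec")
        namespace = {}
        exec(module_code, func.__globals__, namespace)  # expects a def with the same name
        func.__code__ = namespace[func.__name__].__code__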

There are some circumstances in which you might achieve steps 4 and 5, if you knew a great deal about the function's housekeeping and the debugger's analogous housekeeping variables. But note:

  • The bytecode offset at which your pdb breakpoint sits ( f_lasti in the frame object) might change. You would probably have to narrow your goal to "only change code further down in the function's source than where the breakpoint occurred" to keep things simple enough - otherwise you would need to figure out where the breakpoint lands in the newly compiled bytecode. That may be feasible, but again only under restrictions (for example, "it will only ever call pdb.set_trace() once", or similar "leave breadcrumbs for post-breakpoint analysis" stipulations).

  • You need to be sharp when patching function, frame, and code objects. Pay particular attention to func_code in the function ( __code__ if you also support Python 3); f_lasti , f_lineno and f_code in the frame; and co_code , co_lnotab and co_stacksize in the code object. (A small read-only inspection sketch follows this list.)

  • For the love of God, I hope you are not planning to change the function's parameters, name, or other macro-level defining characteristics. That would at least triple the amount of housekeeping required.

  • More worryingly, adding new local variables (a fairly common thing to want when changing program behavior) is very, very dicey. It would affect f_locals , co_nlocals and co_stacksize - and, quite possibly, completely rearrange the order and the way values are accessed in the bytecode. You could minimize this by proactively adding assignments like x = None for spare locals in your original code. But depending on how the bytecodes change, you might even need to hot-patch the Python stack, which cannot be done from Python as such, so C / Cython extensions could be required.

    Here's a very simple example showing that the ordering and arguments of the bytecode can change significantly even with small changes to very simple functions:

    def a(x):            LOAD_FAST     0 (x)
        y = x + 1        LOAD_CONST    1 (1)
        return y         BINARY_ADD
                         STORE_FAST    1 (y)
                         LOAD_FAST     1 (y)
                         RETURN_VALUE

    ------------------   ------------------

    def a2(x):           LOAD_CONST    1 (2)
        inc = 2          STORE_FAST    1 (inc)
        y = x + inc      LOAD_FAST     0 (x)
        return y         LOAD_FAST     1 (inc)
                         BINARY_ADD
                         STORE_FAST    2 (y)
                         LOAD_FAST     2 (y)
                         RETURN_VALUE
  • You would have to be equally careful when patching the pdb values that keep track of where it is in the debugged code, because when you type "continue", those are what dictate where control flow goes.

  • You would have to limit your patching to functions whose state is fairly static. In particular, they must not have objects that might be garbage-collected before the breakpoint resumes but are accessed after it (for example, in your new code). For example:

    some = SomeObject()
    # blah blah, including the last touch of `some`
    # ...
    pdb.set_trace()
    # Look, Ma! I'm monkey-patching!
    if some.some_property:
        # oops, `some` was GC'd - DIE DIE DIE

    While "providing the runtime for the fixed function is the same as ever" is potentially problematic for many values, it guaranteed to crash and burn if any of them went out of their normal dynamic area and garbage collected before than to correct changes in their dynamic coverage / lifetime.

  • And that is assuming you only ever want to run this on CPython, since PyPy, Jython, and other Python implementations do not even use standard CPython bytecodes, and they do their function, code, and frame housekeeping differently.
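As mentioned above, a minimal read-only inspection sketch for the housekeeping attributes involved (the helper name is mine); reading them is safe, while rewriting most of them from Python is restricted or impossible:

    import dis
    import sys

    def show_frame_housekeeping(frame):
        # Read-only peek at the attributes discussed above.
        code = frame.f_code
        print("function:     %s" % code.co_name)
        print("f_lineno:     %d (current source line)" % frame.f_lineno)
        print("f_lasti:      %d (current bytecode offset)" % frame.f_lasti)
        print("co_nlocals:   %d" % code.co_nlocals)
        print("co_stacksize: %d" % code.co_stacksize)
        dis.dis(code)   # disassemble; f_lasti is an offset into co_code

    # For example, from the (Pdb) prompt of the paused program:
    #   show_frame_housekeeping(sys._getframe())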

I wish I could say that this kind of super-dynamic patching were possible. And I am sure you could construct simple cases, with a lot of housekeeping, where it actually works. But real code has objects that go out of scope. Real patches may require new variables to be allocated. And so on. Real-world conditions vastly multiply the effort required to make a patch work - and in some cases make such patching strictly impossible.

And at the end of the day, what have you achieved? A very brittle, fragile, unsafe way to extend your processing of a data stream. There is a reason most monkey-patching is done at function boundaries, and even then reserved for a few very high-value use cases. Production data streaming is better served by a strategy that sets unrecognized values aside for out-of-band examination and accommodation.


First, a prototype; then some important caveats.

    # process.py
    import sys
    import pdb
    import handlers

    def process_unit(data_unit):
        global handlers
        while True:
            try:
                data_type = type(data_unit)
                handler = handlers.handler[data_type]
                handler(data_unit)
                return
            except KeyError:
                print "UNUSUAL DATA: {0!r}".format(data_unit)
                print "\n--- INVOKING DEBUGGER ---\n"
                pdb.set_trace()
                print
                print "--- RETURNING FROM DEBUGGER ---\n"
                del sys.modules['handlers']
                import handlers
                print "retrying"

    process_unit("this")
    process_unit(100)
    process_unit(1.04)
    process_unit(200)
    process_unit(1.05)
    process_unit(300)
    process_unit(4+3j)

    sys.exit(0)

and

    # handlers.py
    def handle_default(x):
        print "handle_default: {0!r}".format(x)

    handler = {
        int: handle_default,
        str: handle_default
    }

In Python 2.7, this gives you a dictionary mapping expected/known types to functions that handle each type. If no handler is available for a type, the user is dropped into the debugger, giving them the opportunity to amend the handlers.py file with appropriate handlers. In the above example, there is no handler for float or complex values. When they appear, the user will need to add appropriate handlers. For example, one could add:

    def handle_float(x):
        print "FIXED FLOAT {0!r}".format(x)

    handler[float] = handle_float

And then:

    def handle_complex(x):
        print "FIXED COMPLEX {0!r}".format(x)

    handler[complex] = handle_complex

Here is what it will look like:

    $ python process.py
    handle_default: 'this'
    handle_default: 100
    UNUSUAL DATA: 1.04

    --- INVOKING DEBUGGER ---

    > /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
    -> print
    (Pdb) continue

    --- RETURNING FROM DEBUGGER ---

    retrying
    FIXED FLOAT 1.04
    handle_default: 200
    FIXED FLOAT 1.05
    handle_default: 300
    UNUSUAL DATA: (4+3j)

    --- INVOKING DEBUGGER ---

    > /Users/jeunice/pytest/testing/sfix/process.py(18)process_unit()
    -> print
    (Pdb) continue

    --- RETURNING FROM DEBUGGER ---

    retrying
    FIXED COMPLEX (4+3j)

Good, so it basically works. You can improve and polish it into a more production-ready form, make it compatible with both Python 2 and 3, and so on.
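For instance, a minimal Python 3 rendering of the same handlers.py (an assumption on my part that only the print syntax needs to change; the dictionary-of-handlers structure stays as above):

    # handlers.py (Python 3 flavour)
    def handle_default(x):
        print("handle_default: {0!r}".format(x))

    handler = {
        int: handle_default,
        str: handle_default
    }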

Please think long and hard before doing it this way.

This "modify the code in real time" approach is an incredibly fragile and error-prone pattern. It encourages you to make hot fixes, live, at the very last moment. Those fixes will probably not have good or sufficient testing. Almost by definition, you have just discovered that you are dealing with a new type T. You do not yet know much about T, why it occurred, what its edge cases and failure modes might be, and so on. And if your "fix" code or hot patches do not work, what then? Sure, you can add more exception handling, catch more classes of exceptions, and possibly keep going.

Web frameworks like Flask have debug modes that work largely this way. But they are debug modes, and generally not suitable for production. Moreover, what if you type the wrong command in the debugger? Accidentally typing "quit" instead of "continue" would end the whole program, and with it your hope of keeping the processing alive. If this is meant for debugging use (perhaps to explore the kinds of data a new stream contains), fine - go for it.

If this is meant for production, consider instead a strategy that sets unhandled types aside for asynchronous, out-of-band examination and correction, rather than putting a developer / operator in the middle of a live processing stream.


If I understand correctly:

  • you do not want to repeat all the work that has already been done

  • you need a way to replace #continue processing as usual with new code as soon as you figure out how to process new data

@user2357112 was on the right track: expected_types should be a dictionary of

 data_type:(detect_function, handler_function) 

and detect_type needs to loop through it to find a match. If no match is found, pdb pops up; you can figure out what is going on, write a new detect_function and handler_function , add them to expected_types , and continue from pdb.
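A minimal sketch of that structure, assuming hypothetical detect/handler names and a simple retry loop (not code from the question):

    import pdb

    def detect_int(x):
        return isinstance(x, int)

    def handle_int(x):
        print("int: {0!r}".format(x))

    # data_type: (detect_function, handler_function)
    expected_types = {
        'int': (detect_int, handle_int),
    }

    def detect_type(data_unit):
        while True:
            for name, (detect, handle) in expected_types.items():
                if detect(data_unit):
                    return handle(data_unit)
            # No match: drop into pdb, add a new (detect, handle) pair to
            # expected_types from the prompt, then "continue" to retry.
            pdb.set_trace()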


I would like to know if there is a way to monkey-patch the corrected function (process_a_lot) while it is running and perform a "resume".

So, do you want to somehow write a new process_a_lot function from inside pdb, and then transfer control to it at the pdb call site?

Or do you want to rewrite the function outside of pdb, and then somehow reload this function from the .py file and transfer control in the middle of the function at the location of the pdb call?

The only possibility I can think of is: from within pdb, import the newly written function, and then replace the bytecode of the current process_a_lot with the bytecode from the new function (I think via func.__code__ , or func_code in Python 2, or something along those lines). Make sure you do not change anything in the new function before the pdb lines, and this might work.
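A minimal sketch of that idea, assuming the rewritten function lives in a hypothetical module fixed_process and the original in a hypothetical module process_module, both with the same function name and signature:

    # From the (Pdb) prompt:
    import fixed_process      # hypothetical module containing the rewritten function
    import process_module     # hypothetical module containing the original

    # Splice the new bytecode into the existing function object.
    process_module.process_a_lot.__code__ = fixed_process.process_a_lot.__code__

    # Note: this only affects subsequent calls to process_a_lot. The frame that
    # is currently paused in pdb still points at the old code object (f_code on
    # a frame is read-only in CPython), which is why resuming "in the middle"
    # of the new code is the hard part.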

But even if that works, I would think it is a very fragile solution.

