Memory leak when running a Python script from C++

The following minimal example of calling a Python function from C++ has a memory leak on my system:

script.py :

    import tensorflow

    def foo(param):
        return "something"

main.cpp :

    #include "python3.5/Python.h"
    #include <iostream>
    #include <string>

    int main() {
        Py_Initialize();
        PyRun_SimpleString("import sys");
        PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
        PyRun_SimpleString("sys.path.append('./')");

        PyObject* moduleName = PyUnicode_FromString("script");
        PyObject* pModule = PyImport_Import(moduleName);
        PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
        PyObject* param = PyUnicode_FromString("dummy");
        PyObject* args = PyTuple_Pack(1, param);
        PyObject* result = PyObject_CallObject(fooFunc, args);

        Py_CLEAR(result);
        Py_CLEAR(args);
        Py_CLEAR(param);
        Py_CLEAR(fooFunc);
        Py_CLEAR(pModule);
        Py_CLEAR(moduleName);

        Py_Finalize();
    }

compiled with

    g++ -std=c++11 main.cpp $(python3-config --cflags) $(python3-config --ldflags) -o main

and run with valgrind

    valgrind --leak-check=yes ./main

displays the following summary

    LEAK SUMMARY:
    ==24155==    definitely lost: 161,840 bytes in 103 blocks
    ==24155==    indirectly lost: 33 bytes in 2 blocks
    ==24155==      possibly lost: 184,791 bytes in 132 blocks
    ==24155==    still reachable: 14,067,324 bytes in 130,118 blocks
    ==24155==                       of which reachable via heuristic:
    ==24155==                         stdstring: 2,273,096 bytes in 43,865 blocks
    ==24155==         suppressed: 0 bytes in 0 blocks

I am using Linux Mint 18.2 Sonya, g++ 5.4.0, Python 3.5.2 and TensorFlow 1.4.1.

Removing import tensorflow makes the leak disappear. Is this a bug in TensorFlow, or am I doing something wrong? (I expect the latter.)


Also, when I create a Keras layer in Python

    #script.py
    from keras.layers import Input

    def foo(param):
        a = Input(shape=(32,))
        return "str"

and call Python from C++ in a loop

    //main.cpp
    #include "python3.5/Python.h"
    #include <iostream>
    #include <string>

    int main() {
        Py_Initialize();
        PyRun_SimpleString("import sys");
        PyRun_SimpleString("if not hasattr(sys,'argv'): sys.argv = ['']");
        PyRun_SimpleString("sys.path.append('./')");

        PyObject* moduleName = PyUnicode_FromString("script");
        PyObject* pModule = PyImport_Import(moduleName);

        for (int i = 0; i < 10000000; ++i) {
            std::cout << i << std::endl;
            PyObject* fooFunc = PyObject_GetAttrString(pModule, "foo");
            PyObject* param = PyUnicode_FromString("dummy");
            PyObject* args = PyTuple_Pack(1, param);
            PyObject* result = PyObject_CallObject(fooFunc, args);
            Py_CLEAR(result);
            Py_CLEAR(args);
            Py_CLEAR(param);
            Py_CLEAR(fooFunc);
        }

        Py_CLEAR(pModule);
        Py_CLEAR(moduleName);
        Py_Finalize();
    }

the application's memory consumption grows without bound at runtime.

So I think there is something fundamentally wrong with the way I call the Python function from C++, but what is it?

c++ python memory-leaks tensorflow
1 answer

There are two different types of "memory leak" in your question.

Valgrind reports the first type of memory leak. For Python modules, however, it is quite common to "leak" memory this way: it is mostly global variables that are allocated/initialized when the module is loaded. And because a module is loaded only once in Python, this is not a big problem.

A well-known example is numpy's PyArray_API: it is initialized via _import_array, never deleted, and stays in memory until the Python interpreter shuts down.

So this is a "memory leak" by design; you can argue about whether it is good design or not, but at the end of the day there is nothing you can do about it.
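This one-time, by-design pattern can be sketched in pure Python. The module name fake_ext and the attribute API_TABLE below are made-up stand-ins for illustration, not numpy's actual mechanism:

```python
import sys
import types

# Simulate a C extension that allocates a global table once at import time:
mod = types.ModuleType("fake_ext")
mod.API_TABLE = [0] * 1024          # allocated on first import, never freed
sys.modules["fake_ext"] = mod       # register it as an importable module

import fake_ext
import fake_ext as again            # a second import reuses the cached module

# No new allocation happens: Python caches modules in sys.modules, so the
# "leaked" table exists exactly once until the interpreter exits.
print(fake_ext is again)                    # True
print(fake_ext.API_TABLE is mod.API_TABLE)  # True
```

Because the module cache guarantees a single copy, the allocation never grows with use, which is why valgrind flags it once but it is harmless in practice.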

I don't know the tensorflow module well enough to pinpoint where such allocations happen, but I'm sure it is nothing you need to worry about.


The second "memory leak" is more subtle.

You can see this by comparing the valgrind output for 10^4 and 10^5 iterations of the loop - there will be almost no difference! There is, however, a clear difference in the peak memory consumption.

Unlike C++, Python has a garbage collector - so you cannot know exactly when an object is destroyed. CPython uses reference counting: when an object's reference count reaches 0, the object is destroyed. However, when there is a reference cycle (for example, object A holds a reference to object B, and object B holds a reference to object A), it is not that simple: the garbage collector needs to iterate over all objects to find such unused cycles.
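A minimal demonstration of why cycles need the collector, using a hypothetical Node class:

```python
import gc

class Node:
    """Two of these referencing each other form a reference cycle."""
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a     # cycle: a -> b -> a

gc.collect()                    # start from a clean slate
del a, b                        # drop the external references

# Reference counting alone cannot free the pair: each Node still has a
# nonzero refcount (held by the other), so only the cyclic garbage
# collector can reclaim them:
collected = gc.collect()
print(collected >= 2)           # True: both Nodes were found unreachable
```

This is the deferred, "you cannot know exactly when" destruction the answer refers to: without the explicit gc.collect(), the pair lingers until the collector happens to run.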

You might think that keras.layers.Input has such a cycle somewhere (and it does), but that is not the reason for this "memory leak", which can be observed in pure Python as well.

We can use the objgraph package to inspect the references; let's run the following pure-Python script:

    #pure.py
    from keras.layers import Input
    import gc
    import sys
    import objgraph

    def foo(param):
        a = Input(shape=(1280,))
        return "str"

    ### MAIN:
    print("Counts at the beginning:")
    objgraph.show_most_common_types()
    objgraph.show_growth(limit=7)

    for i in range(int(sys.argv[1])):
        foo(" ")

    gc.collect()  # just to be sure

    print("\n\n\n Counts at the end")
    objgraph.show_most_common_types()
    objgraph.show_growth(limit=7)

    import random
    objgraph.show_chain(
        objgraph.find_backref_chain(
            random.choice(objgraph.by_type('Tensor')),  # take some random tensor
            objgraph.is_proper_module),
        filename='chain.png')

and run it:

    python pure.py 1000

We see that at the end there are exactly 1000 Tensors, which means that none of our created objects was deleted!

If we look at the chain that keeps a tensor object alive (visualized by objgraph.show_chain into chain.png), we see that there is a tensorflow Graph object in which all tensors are registered, and they stay there until the session is closed.

So much for the theory. However, closing the backend session:

    #close session and free resources:
    import keras
    keras.backend.get_session().close()  # free all resources

    print("\n\n\n Counts after session.close():")
    objgraph.show_most_common_types()

and the suggested workaround of scoping the graph:

    with tf.Graph().as_default(), tf.Session() as sess:
        for step in range(int(sys.argv[1])):
            foo(" ")

both failed to free the memory with the current tensorflow version. This is probably a bug.


In short: you are not doing anything wrong in your C++ code; there is no memory leak you are responsible for. In fact, you would see exactly the same memory consumption if you called the foo function repeatedly from a pure Python script.

All created tensors are registered in a Graph object and are not released automatically; you are supposed to free them by closing the backend session, which, however, does not work due to a bug in the current tensorflow version 1.4.0.

