NDB does not free memory during long request

Question

I am currently offloading a long-running job to a Task Queue to calculate the connections between NDB entities in the Datastore.

Basically, this queue processes several lists of entity keys that are to be related to another query by the node_in_connected_nodes function of the GetConnectedNodes class:

    from collections import deque

    from google.appengine.ext import ndb


    class GetConnectedNodes(object):
        """Class for getting the connected nodes from a list of nodes in a paged way"""

        def __init__(self, node_ids, query):
            # super(GetConnectedNodes, self).__init__()
            self.nodes = [ndb.model.Key('Node', '%s' % x) for x in node_ids]
            self.cursor = 0
            self.MAX_QUERY = 100  # number of entities fetched per batch
            # logging.info('Max query - %d' % self.MAX_QUERY)
            self.max_connections = len(node_ids)
            self.connections = deque()
            self.query = query

        def node_in_connected_nodes(self):
            """Checks if the query node exists in the connected nodes of each node
            in the node list.

            This is a generator: it yields (index in the node list, sources) for
            every node whose connections contain the query, and nothing otherwise.
            """
            while self.cursor < self.max_connections:
                if len(self.connections) == 0:
                    # Fetch the next batch of up to MAX_QUERY entities asynchronously.
                    end = min(self.MAX_QUERY, self.max_connections - self.cursor)
                    self.connections = deque(
                        ndb.model.get_multi_async(self.nodes[self.cursor:self.cursor + end]))

                connection = self.connections.popleft()
                node = connection.get_result()
                connection_nodes = node.connections

                if self.query in connection_nodes:
                    connection_sources = node.sources
                    # yields (current node index in the list, sources)
                    yield (self.cursor, connection_sources[connection_nodes.index(self.query)])
                self.cursor += 1
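As a plain-Python illustration of the same paging technique, here is a minimal sketch that runs without App Engine. The function connected_nodes and the fetch_batch callable are hypothetical: fetch_batch stands in for ndb.get_multi_async and is assumed to return one entity (or None) per requested ID, in order. Entities are modelled as plain dicts with parallel connections and sources lists, mirroring the Node properties described in the question.

```python
from collections import deque


def connected_nodes(node_ids, query, fetch_batch, batch_size=100):
    """Yield (index, sources) for every node whose connections contain `query`.

    `fetch_batch` is a hypothetical stand-in for ndb.get_multi_async: it takes a
    list of IDs and returns one entity (or None) per ID, in the same order.
    """
    pending = deque()
    for index, node_id in enumerate(node_ids):
        if not pending:
            # Fetch the next page of up to batch_size entities in one call.
            pending.extend(fetch_batch(node_ids[index:index + batch_size]))
        entity = pending.popleft()
        if entity is None:
            continue  # missing entity: skip it instead of raising AttributeError
        if query in entity["connections"]:
            position = entity["connections"].index(query)
            yield index, entity["sources"][position]


# Usage with an in-memory dict playing the role of the datastore.
store = {
    "a": {"connections": ["q", "x"], "sources": [["s1"], ["s2"]]},
    "b": {"connections": ["x"], "sources": [["s3"]]},
    "c": {"connections": ["y", "q"], "sources": [["s4"], ["s5"]]},
}


def fetch(ids):
    return [store.get(i) for i in ids]


print(list(connected_nodes(["a", "b", "c", "missing"], "q", fetch, batch_size=2)))
# [(0, ['s1']), (2, ['s5'])]
```

The generator only holds one batch of entities at a time, so it does not by itself accumulate memory; anything else that keeps references to the fetched entities will still keep them alive.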

Here, Node has a repeated connections property that holds the IDs of other Node keys, and a corresponding sources array holding the sources for each connection.

The yielded results are stored in the Blobstore.

Now the problem I am getting is that after iterating through node_in_connected_nodes, the memory is somehow not freed. The following log shows the memory used by App Engine just before each new instance of GetConnectedNodes is created:

    I 2012-08-23 16:58:01.643 Prioritizing HGNC:4839 - mem 32
    I 2012-08-23 16:59:21.819 Prioritizing HGNC:3003 - mem 380
    I 2012-08-23 17:00:00.918 Prioritizing HGNC:8932 - mem 468
    I 2012-08-23 17:00:01.424 Prioritizing HGNC:24771 - mem 435
    I 2012-08-23 17:00:20.334 Prioritizing HGNC:9300 - mem 417
    I 2012-08-23 17:00:48.476 Prioritizing HGNC:10545 - mem 447
    I 2012-08-23 17:01:01.489 Prioritizing HGNC:12775 - mem 485
    I 2012-08-23 17:01:46.084 Prioritizing HGNC:2001 - mem 564
    C 2012-08-23 17:02:18.028 Exceeded soft private memory limit with 628.609 MB after servicing 1 requests total
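To put numbers on the growth, this small sketch parses the "mem" readings from the log lines quoted above and computes the change between consecutive jobs. The unit is presumably MB, matching the 628.609 MB in the final line; memory_readings is a hypothetical helper.

```python
import re

# The log lines quoted above, copied verbatim.
LOG = """\
I 2012-08-23 16:58:01.643 Prioritizing HGNC:4839 - mem 32
I 2012-08-23 16:59:21.819 Prioritizing HGNC:3003 - mem 380
I 2012-08-23 17:00:00.918 Prioritizing HGNC:8932 - mem 468
I 2012-08-23 17:00:01.424 Prioritizing HGNC:24771 - mem 435
I 2012-08-23 17:00:20.334 Prioritizing HGNC:9300 - mem 417
I 2012-08-23 17:00:48.476 Prioritizing HGNC:10545 - mem 447
I 2012-08-23 17:01:01.489 Prioritizing HGNC:12775 - mem 485
I 2012-08-23 17:01:46.084 Prioritizing HGNC:2001 - mem 564
C 2012-08-23 17:02:18.028 Exceeded soft private memory limit with 628.609 MB after servicing 1 requests total
"""

READING = re.compile(r"Prioritizing (\S+) - mem (\d+)")


def memory_readings(log_text):
    """Return (identifier, mem) pairs from the 'Prioritizing ... - mem N' lines."""
    return [(m.group(1), int(m.group(2))) for m in READING.finditer(log_text)]


readings = memory_readings(LOG)
deltas = [later[1] - earlier[1] for earlier, later in zip(readings, readings[1:])]
print(len(readings), readings[-1][1] - readings[0][1])  # 8 532
print(deltas)  # [348, 88, -33, -18, 30, 38, 79]
```

Over eight jobs in a single request, memory grows by 532 with only two small dips, which matches the description that the memory keeps increasing apart from some fluctuations.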

Apart from some fluctuations, the memory just keeps increasing, even though none of the earlier values are used again. It has been quite hard to debug this or to work out whether there is a memory leak somewhere, but I seem to have traced it down to this class. Any help would be appreciated.

python google-app-engine memory-leaks app-engine-ndb task-queue

Francisco roque Aug 23 '12 at 15:35

3 answers

In our case, this was caused by having App Engine Appstats enabled.

After disabling Appstats, memory consumption returned to normal.
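For reference, Appstats on the Python 2.7 runtime was commonly enabled in appengine_config.py through the webapp_add_wsgi_middleware hook, by wrapping the app with recording.appstats_wsgi_middleware. Assuming it was switched on that way, here is a sketch of the file with the wrapping commented out, which disables recording; if Appstats was enabled by other means, the change belongs there instead.

```python
# appengine_config.py (sketch): Appstats recording switched off.
def webapp_add_wsgi_middleware(app):
    # Enabling Appstats wraps the WSGI app like this; with these two lines
    # commented out, requests are no longer recorded by Appstats.
    # from google.appengine.ext.appstats import recording
    # app = recording.appstats_wsgi_middleware(app)
    return app
```

With the wrapping removed, the hook simply returns the application unchanged.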

krtek Sep 27 '13 at 13:40

You could call gc.collect() at the beginning of each request.
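A minimal sketch of that suggestion as WSGI middleware, assuming the app is a plain WSGI callable; collect_before_each_request and hello_app are hypothetical names. The demo builds an unreachable reference cycle and checks that it is gone after one request.

```python
import gc
import weakref


def collect_before_each_request(app):
    """Wrap a WSGI app so that gc.collect() runs before every request is handled."""
    def wrapped(environ, start_response):
        # Force a full collection; this frees unreachable reference cycles only.
        gc.collect()
        return app(environ, start_response)
    return wrapped


def hello_app(environ, start_response):
    """Hypothetical minimal WSGI app used to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


app = collect_before_each_request(hello_app)


class _Cycle:
    """Two instances referencing each other form a reference cycle."""


a, b = _Cycle(), _Cycle()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b  # the cycle is now unreachable, but refcounting alone cannot free it

statuses = []
body = app({}, lambda status, headers: statuses.append(status))
print(body, statuses, cycle_ref() is None)  # [b'ok'] ['200 OK'] True
```

Note that gc.collect() can only reclaim objects that are no longer reachable; objects still referenced elsewhere, for example by a cache, survive it.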

Guido van Rossum Aug 23 '12 at 21:44

Lukas Šalkauskas · Accepted Answer · 2012-08-24T11:50:09+0000

We had similar problems (with long-running requests). We solved them by turning off the default NDB cache. You can read more about it here.
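To illustrate why a cache can produce the steady growth seen in the question's log, here is a simplified model, assuming the "default NDB cache" meant here is NDB's in-context cache, which keeps entities fetched during a request for as long as that request's context lives. RequestCache and Entity are hypothetical classes, not NDB's implementation. In NDB itself, the relevant switches include ndb.get_context().set_cache_policy(False), the use_cache=False option on individual calls, and ndb.get_context().clear_cache(); check the NDB caching documentation for their exact behavior. The printed counts rely on CPython's reference counting freeing objects immediately.

```python
import weakref


class Entity:
    """Hypothetical stand-in for a fetched NDB entity."""

    def __init__(self, key):
        self.key = key


class RequestCache:
    """Simplified model of a per-request entity cache (not NDB's real code)."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self._entities = {}

    def get(self, key):
        if key in self._entities:
            return self._entities[key]
        entity = Entity(key)  # pretend this is a datastore fetch
        if self.enabled:
            # Strong reference: the entity lives as long as the cache does.
            self._entities[key] = entity
        return entity

    def clear(self):
        self._entities.clear()


def fetch_all(cache, keys):
    """Fetch every key and keep only weak references to the results."""
    return [weakref.ref(cache.get(k)) for k in keys]


cached = RequestCache(enabled=True)
refs = fetch_all(cached, range(1000))
print(sum(r() is not None for r in refs))  # 1000: all entities still alive
cached.clear()
print(sum(r() is not None for r in refs))  # 0 after the cache is cleared

uncached = RequestCache(enabled=False)
refs = fetch_all(uncached, range(1000))
print(sum(r() is not None for r in refs))  # 0: nothing retains them
```

With the cache enabled, every entity fetched during the request stays in memory even though the calling code has discarded it, which is the pattern the question describes.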
