NDB does not free memory during long request

I am currently offloading a long-running job to the TaskQueue to calculate the connections between NDB entities in the Datastore.

Basically, this queue processes several lists of entity keys that have to be connected to a query node by the node_in_connected_nodes function of the GetConnectedNodes class:

    from collections import deque

    from google.appengine.ext import ndb


    class GetConnectedNodes(object):
        """Class for getting the connected nodes from a list of nodes in a paged way"""

        def __init__(self, list, query):
            # super(GetConnectedNodes, self).__init__()
            self.nodes = [ndb.model.Key('Node', '%s' % x) for x in list]
            self.cursor = 0
            self.MAX_QUERY = 100
            # logging.info('Max query - %d' % self.MAX_QUERY)
            self.max_connections = len(list)
            self.connections = deque()
            self.query = query

        def node_in_connected_nodes(self):
            """Checks if a node exists in the connected nodes of the next node in the
            node list.
            Will return False if it doesn't, or the list of evidences for the connection
            if it does.
            """
            while self.cursor < self.max_connections:
                if len(self.connections) == 0:
                    # Fetch the next page of at most MAX_QUERY nodes asynchronously.
                    end = self.MAX_QUERY
                    if self.max_connections - self.cursor < self.MAX_QUERY:
                        end = self.max_connections - self.cursor
                    self.connections.clear()
                    self.connections = deque(ndb.model.get_multi_async(self.nodes[self.cursor:self.cursor+end]))

                connection = self.connections.popleft()
                connection_nodes = connection.get_result().connections

                if self.query in connection_nodes:
                    connection_sources = connection.get_result().sources
                    # yields (current node index in the list, sources)
                    yield (self.cursor, connection_sources[connection_nodes.index(self.query)])
                self.cursor += 1

Here, Node has a repeated property connections that contains an array of other Node key ids, and a matching sources array for each connection.

The yielded results are stored in the Blobstore.

Now the problem I am getting is that after each iteration of the connection function, the memory is somehow not cleared. The following log shows the memory used by AppEngine just before creating a new instance of GetConnectedNodes:

    I 2012-08-23 16:58:01.643 Prioritizing HGNC:4839 - mem 32
    I 2012-08-23 16:59:21.819 Prioritizing HGNC:3003 - mem 380
    I 2012-08-23 17:00:00.918 Prioritizing HGNC:8932 - mem 468
    I 2012-08-23 17:00:01.424 Prioritizing HGNC:24771 - mem 435
    I 2012-08-23 17:00:20.334 Prioritizing HGNC:9300 - mem 417
    I 2012-08-23 17:00:48.476 Prioritizing HGNC:10545 - mem 447
    I 2012-08-23 17:01:01.489 Prioritizing HGNC:12775 - mem 485
    I 2012-08-23 17:01:46.084 Prioritizing HGNC:2001 - mem 564
    C 2012-08-23 17:02:18.028 Exceeded soft private memory limit with 628.609 MB after servicing 1 requests total

Apart from some fluctuations, the memory just keeps increasing, even though none of the previous values are accessed any more. I found it quite hard to debug this or to figure out whether I have a memory leak somewhere, but I seem to have traced it down to this class. I would appreciate any help.

3 answers

We had similar problems (with long-running requests). We solved them by turning off the default ndb cache. You can read about it here.
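The answer does not show how the cache was turned off. One possible way, as a minimal sketch: NDB keeps every entity it reads in an in-context cache for the lifetime of the request, so a long task that touches many Node entities keeps them all alive. The context object returned by ndb.get_context() has set_cache_policy() and clear_cache() methods; the helper name disable_in_context_cache below is made up for illustration, and it takes the context as an argument so that it can be tried outside App Engine.

```python
def disable_in_context_cache(ctx):
    """Stop NDB from keeping fetched entities in the per-request cache.

    `ctx` is assumed to be the object returned by ndb.get_context().
    The helper name itself is hypothetical.
    """
    # A policy of False means: never store entities in the in-context cache.
    ctx.set_cache_policy(False)
    # Drop whatever has already been cached during this request.
    ctx.clear_cache()


# Inside the task handler, before the long loop starts (App Engine only):
#
#     from google.appengine.ext import ndb
#     disable_in_context_cache(ndb.get_context())
```

Memcache lives outside the instance, so it should not count against the instance's private memory limit; only the in-context cache is addressed here.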

In our case, this was caused by having AppEngine Appstats enabled.

After turning it off, the memory consumption went back to normal.
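The simplest way to turn Appstats off is to remove its middleware from the app. If it should stay on for normal requests, a narrower option might look like the sketch below. It assumes that Appstats picks up an appstats_should_record(env) hook from appengine_config.py, as its sample configuration describes, and it relies on the X-AppEngine-QueueName header that App Engine adds to Task Queue requests. Treat it as a starting point, not as what the answerer actually did.

```python
# appengine_config.py (sketch)

def appstats_should_record(env):
    """Return False for Task Queue requests so Appstats skips them.

    `env` is the WSGI environ of the incoming request. Assumption: Appstats
    calls this hook to decide whether to record a request.
    """
    # The X-AppEngine-QueueName header shows up in the WSGI environ
    # under this key when the request was dispatched by a task queue.
    return 'HTTP_X_APPENGINE_QUEUENAME' not in env
```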

You can call gc.collect() at the beginning of each request.
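As a rough sketch of that suggestion, a decorator can force a collection before a handler runs. The name collect_before is made up for illustration. Note that gc.collect() only reclaims objects that are no longer reachable (such as reference cycles), so it will not help if something like a cache still holds references to the entities.

```python
import functools
import gc


def collect_before(handler):
    """Run a full garbage collection before each call to `handler`."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        # Frees unreachable reference cycles left over from earlier work.
        gc.collect()
        return handler(*args, **kwargs)
    return wrapper


@collect_before
def process_task(payload):
    # Placeholder for the real request or task handler body.
    return len(payload)
```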
