Why isn't memory garbage collected in App Engine (Python) while iterating over DB results?

I have code that iterates over DB entities and runs in a task - see below.

In the application, I get an "Exceeded soft private memory limit" error, and checking memory_usage().current() confirms the problem. See below the output from the logging statement. It seems that every time a batch of foos is fetched, memory usage rises.

My question is: why isn't the memory garbage collected? I would expect that on each iteration of the loops (the while and for loops, respectively), rebinding the names foos and foo would leave the objects they previously referred to unreachable, and therefore eligible for garbage collection, and that collection would happen as memory gets tight. But clearly this is not happening.
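For what it's worth, my mental model matches how CPython behaves in isolation; here is a minimal, self-contained sketch (Blob is a made-up stand-in, not an App Engine model):

```python
import gc
import weakref

class Blob(object):
    """Hypothetical stand-in for an entity, just to show rebinding behavior."""
    def __init__(self, n):
        self.data = bytearray(n)

freed = []
blob = Blob(1024)
# A weak reference with a callback lets us observe collection without
# keeping the object alive ourselves.
ref = weakref.ref(blob, lambda r: freed.append(True))

blob = Blob(1024)  # rebinding drops the last strong reference to the first Blob
gc.collect()       # CPython's refcounting frees it immediately; collect() is belt and braces

assert freed == [True]  # the old object really was reclaimed
```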

    from google.appengine.api.runtime import memory_usage

    batch_size = 10
    dict_of_results = {}
    results = 0
    cursor = None

    while True:
        foos = models.Foo.all().filter('status =', 6)
        if cursor:
            foos.with_cursor(cursor)
        for foo in foos.run(batch_size=batch_size):
            logging.debug('on result #{} used memory of {}'.format(
                results, memory_usage().current()))
            results += 1
            bar = some_module.get_bar(foo)
            if bar:
                try:
                    dict_of_results[bar.baz] += 1
                except KeyError:
                    dict_of_results[bar.baz] = 1
            if results >= batch_size:
                cursor = foos.cursor()
                break
        else:
            break

and in some_module.py

    def get_bar(foo):
        for bar in foo.bars:
            if bar.status == 10:
                return bar
        return None

logging.debug output (abbreviated):

    on result #1 used memory of 43
    on result #2 used memory of 43
    .....
    on result #20 used memory of 43
    on result #21 used memory of 49
    .....
    on result #32 used memory of 49
    on result #33 used memory of 54
    .....
    on result #44 used memory of 54
    on result #45 used memory of 59
    .....
    on result #55 used memory of 59
    .....
    on result #597 used memory of 284.3
    Exceeded soft private memory limit of 256 MB with 313 MB after servicing 1 requests total

It looks like your manual batching is conflicting with the db package's own batching, so extra batches end up held in memory.

When you call query.run(batch_size=batch_size), db runs the query until it is exhausted. When it reaches the end of a batch, db fetches the next batch. However, right after db does this, you break out of the loop and start a new query. This means batches 1 → n exist in memory twice: once fetched by the previous query's run, and once fetched again by the next query.

If you want to iterate over all your entities, just let db handle the batching:

    foos = models.Foo.all().filter('status =', 6)
    for foo in foos.run(batch_size=batch_size):
        results += 1
        bar = some_module.get_bar(foo)
        if bar:
            try:
                dict_of_results[bar.baz] += 1
            except KeyError:
                dict_of_results[bar.baz] = 1

Or, if you want to handle batch loading yourself, use fetch() so db doesn't do any batching of its own:

    while True:
        foo_query = models.Foo.all().filter('status =', 6)
        if cursor:
            foo_query.with_cursor(cursor)
        foos = foo_query.fetch(limit=batch_size)
        if not foos:
            break
        # ... process foos here ...
        cursor = foo_query.cursor()  # note: cursor() is on the query, not the result list

You may be looking in the wrong direction.

Take a look at this Q&A for ways to check whether garbage collection is actually the problem, and for possible alternative explanations: Using Query App Engine Query Memory
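One quick sanity check (a sketch; memory_usage() is App Engine-specific, so this only logs gc statistics): force a collection between batches and see whether usage still climbs. If it does even when nothing unreachable is found, the growth is live references (e.g. query/batch machinery or a cache holding entities), not uncollected garbage.

```python
import gc
import logging

logging.basicConfig(level=logging.DEBUG)

def check_collectable():
    """Force a full collection; return the number of unreachable objects found.

    Call this between batches, next to the memory_usage().current() logging.
    """
    unreachable = gc.collect()
    logging.debug('gc.collect() found %d unreachable objects', unreachable)
    return unreachable

check_collectable()
```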

