Updating for a large number of NDB objects fails

I have a very simple task. After a migration that added a new field (a repeated structured property) to an existing NDB model (~100K entities), I need to set a default value for it.

I tried this code first:

    q = dm.E.query(ancestor=dm.E.root_key)
    for user in q.iter(batch_size=500):
        user.field1 = [dm.E2()]
        user.put()

But it fails with errors like these:

    2015-04-25 20:41:44.792 /**** 500 599830ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=1-17-0
    W 2015-04-25 20:32:46.675 suspended generator run_to_queue(query.py:938) raised Timeout(The datastore operation timed out, or the data was temporarily unavailable.)
    W 2015-04-25 20:32:46.676 suspended generator helper(context.py:876) raised Timeout(The datastore operation timed out, or the data was temporarily unavailable.)
    E 2015-04-25 20:41:44.475 Traceback (most recent call last):
      File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in 

The task runs in a separate task queue, so it gets at least 10 minutes to complete, but that seems to be insufficient. Another strange thing is the Timeout warnings from NDB. Maybe there is contention caused by concurrent (user-initiated) updates to the same entities from other instances, but I'm not sure.

In any case, I'd like to know the best (and simplest) practices for such a task. I know about MapReduce, but it currently looks too complex for this task.

UPDATE:

I also tried using put_multi , collecting all the entities into a list first, but GAE kills the instance as soon as it exceeds ~600 MB of memory (the limit is 500 MB). There is simply not enough memory to hold all ~100K entities at once.
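Independent of App Engine, the memory blow-up can be avoided by writing in fixed-size chunks instead of accumulating everything in one list. A minimal framework-free sketch of that idea (the `save_batch` callback is hypothetical and stands in for something like `ndb.put_multi`):

```python
def process_in_chunks(entities, chunk_size, save_batch):
    """Consume an iterable lazily and persist it in bounded-size chunks.

    At most `chunk_size` entities are held in memory at a time, unlike
    collecting all ~100K objects before a single put_multi call.
    """
    chunk = []
    for entity in entities:
        chunk.append(entity)
        if len(chunk) == chunk_size:
            save_batch(chunk)  # e.g. ndb.put_multi(chunk) on App Engine
            chunk = []
    if chunk:  # flush the final partial chunk
        save_batch(chunk)
```

For example, feeding it ten items with `chunk_size=4` produces three batches of sizes 4, 4, and 2, so peak memory stays proportional to the chunk size rather than the dataset size.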

1 answer

Call _migrate_users() once: it processes 50 users and then creates another task to handle the next 50 users, and so on. You can use a batch size larger than 50, depending on the size of your entities.

    def _migrate_users(curs=None):
        users, next_curs, more = User.query().fetch_page(50, start_cursor=curs)
        for user in users:
            user.field1 = 'bla bla'
        ndb.put_multi(users)
        if more:
            deferred.defer(_migrate_users, next_curs, _queue='default')
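The cursor-chaining pattern itself is easy to sanity-check outside App Engine. This sketch fakes `fetch_page` over a plain list (all names here are hypothetical) and runs the "task chain" synchronously, showing that each step resumes exactly where the previous one stopped and that every item is visited once:

```python
def fetch_page(data, page_size, start_cursor=0):
    """Fake of NDB's fetch_page: returns (items, next_cursor, more)."""
    page = data[start_cursor:start_cursor + page_size]
    next_cursor = start_cursor + len(page)
    return page, next_cursor, next_cursor < len(data)

def migrate_all(data, page_size=50):
    """Each loop iteration models one deferred task in the chain."""
    processed, curs, more = [], 0, True
    while more:
        page, curs, more = fetch_page(data, page_size, curs)
        processed.extend(page)  # a real task would mutate and put_multi here
    return processed
```

With 120 items and a page size of 50 this runs three "tasks" (50 + 50 + 20) and stops, which is the same termination behavior the `if more: deferred.defer(...)` line relies on.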
