I'm facing the same problem. I went through the causes listed in the official docs. Memory consumption looks normal in the stats. My code also handles Datastore contention, and timeouts too. Changing the task mechanism to work in resumable chunks seems to be the only way out.
After chasing this error for a while, it seems the App Engine development paradigm revolves around URL handlers that run under limits on time, memory, and so on. That applies to long-running operations as well. I split my long-running task into small tasks. Task queues run the smaller tasks, and each one enqueues the next task just before it finishes. It has never failed so far!
The advantage is that task queues offer better fault tolerance and recovery than one huge cron job. A single failed task does not mean the rest of the huge list of tasks fails.
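To make the chaining idea concrete, here is a minimal sketch in plain Python, not my actual code. It uses an in-memory `deque` as a stand-in for the task queue so it runs anywhere; on App Engine the enqueue step would presumably go through the Task Queue API (for example `taskqueue.add`) instead. The names `CHUNK_SIZE`, `process_chunk`, `run_chain` and the `fail_once_at` parameter are all hypothetical, and the "work" is just a placeholder.

```python
from collections import deque

CHUNK_SIZE = 3  # hypothetical: small enough to finish well within a request limit


def process_chunk(items, cursor, results):
    """Handle one small slice and return the cursor for the next task, or None."""
    chunk = items[cursor:cursor + CHUNK_SIZE]
    results.extend(item * 2 for item in chunk)  # placeholder for the real work
    next_cursor = cursor + len(chunk)
    return next_cursor if next_cursor < len(items) else None


def run_chain(items, fail_once_at=None):
    """Simulate a queue in which each task enqueues its successor before ending.

    fail_once_at is a cursor at which a one-time failure is simulated, to show
    that only that small task is retried rather than the whole job.
    """
    queue = deque([0])   # stand-in for the real task queue; holds cursors
    results = []
    attempts = 0
    already_failed = set()
    while queue:
        cursor = queue.popleft()
        attempts += 1
        try:
            if cursor == fail_once_at and cursor not in already_failed:
                already_failed.add(cursor)
                raise RuntimeError("simulated failure")
            next_cursor = process_chunk(items, cursor, results)
        except RuntimeError:
            queue.append(cursor)  # retry only the chunk that failed
            continue
        if next_cursor is not None:
            queue.append(next_cursor)  # chain: enqueue the next small task
    return results, attempts


if __name__ == "__main__":
    print(run_chain(list(range(7))))
    print(run_chain(list(range(7)), fail_once_at=3))
```

The cursor is the only state passed from one task to the next, which is what makes the job resumable: a failure costs one retry of one chunk, and everything already processed stays done.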
Harisankar krishna swamy