Our customers are having problems with our App Engine Python application, which relies on task queues to generate reports and display them as soon as they are complete. This workaround for GAE's well-known request deadlines and timeouts worked well for us until recently.
Last week, customers started complaining about how long they had to wait for reports. It used to take no more than a minute, but now it can take more than 10 minutes.
I cannot reproduce the problem myself, but when I look at the task queues, I can see that these tasks simply do not start.
Below is a screenshot of one of the queues (not the one that generates reports, but the problem occurs in all queues).
http://www.clipular.com/c/4829223501430784.png?k=QaP2kedZm6NcvrKzwVSJqq2YI1g
As you can see, there are no running tasks, yet the only task in the queue did not start until it had waited 7 minutes. Look at the ETA as well: it says the task should already have started in the past. Eventually the task did run, but why didn't it start earlier?
Causes I have already ruled out:
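To measure this gap across all tasks rather than relying on screenshots, one option is to log, at the start of each task handler, how late the task started relative to its ETA. App Engine passes the target execution time to push-task handlers in the `X-AppEngine-TaskETA` header (seconds since the epoch). The helper names `task_start_delay` and `log_if_late` and the 60-second threshold below are hypothetical; this is a minimal sketch assuming the handler can pass its request headers as a dict-like object.

```python
import logging
import time


def task_start_delay(headers, now=None):
    """Return how many seconds after its ETA a push task actually started.

    Assumes `headers` is a dict-like mapping of the request headers that
    App Engine sends to a push-task handler. Returns None if the request
    did not come from a push queue (no X-AppEngine-TaskETA header).
    """
    eta = headers.get("X-AppEngine-TaskETA")
    if eta is None:
        return None
    if now is None:
        now = time.time()
    return now - float(eta)


def log_if_late(headers, threshold=60.0, now=None):
    """Log a warning when a task starts more than `threshold` seconds late.

    `threshold` is an arbitrary, hypothetical default.
    """
    delay = task_start_delay(headers, now)
    if delay is not None and delay > threshold:
        logging.warning(
            "Task %s started %.0f s after its ETA",
            headers.get("X-AppEngine-TaskName"),
            delay,
        )
    return delay
```

In a webapp2 task handler, calling `log_if_late(self.request.headers)` as the first statement would record every late start, so the delays can be correlated with time of day and queue.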
- Not enough resources or instances: this happens even after midnight, when we receive only a few requests.
- Bad queue configuration: we have a variety of queue configurations, and it happens with all of them. For example, one queue has max rate = 350/s, bucket size = 400, max concurrent requests = 400.
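App Engine rate-limits push queues with a token bucket controlled by the `rate` and `bucket_size` settings. The simplified model below is my own approximation, not App Engine's actual scheduler (the `TokenBucket` class is hypothetical, and it ignores `max_concurrent_requests` entirely). It illustrates why settings like the ones above should not hold back a lone task for minutes: with a full bucket of 400 tokens refilled at 350 per second, a single task gets a token immediately, and even the 401st task of an instantaneous burst waits only about 1/350 of a second.

```python
class TokenBucket:
    """Simplified token-bucket model of a push queue's rate limit.

    rate: tokens added per second (the queue's `rate`, e.g. 350/s).
    bucket_size: maximum number of stored tokens (the queue's `bucket_size`).
    Assumes the bucket starts full and calls use non-decreasing times.
    """

    def __init__(self, rate, bucket_size):
        self.rate = float(rate)
        self.capacity = float(bucket_size)
        self.tokens = float(bucket_size)  # start with a full bucket
        self.last = 0.0  # time of the last refill, in seconds

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the capacity.
        # If `now` is earlier than a previously scheduled dispatch, elapsed is
        # negative and the token count goes negative, modelling a backlog.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def take(self, now):
        """Dispatch one task arriving at `now`; return how long it must wait."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return 0.0
        # Not enough tokens: wait until exactly one token has accumulated,
        # then consume it at that future time.
        wait = (1 - self.tokens) / self.rate
        self.tokens = 0.0
        self.last = now + wait
        return wait


bucket = TokenBucket(rate=350, bucket_size=400)
waits = [bucket.take(0.0) for _ in range(401)]  # a burst of 401 tasks at t=0
```

Under this model, a queue that is idle overnight always has a full bucket, so a 7-minute delay for a single queued task points to something other than the rate settings.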