Retrying failed jobs in RQ

We use RQ with our WSGI application. We run several worker processes on different servers, which execute tasks enqueued to (possibly) several different task servers. To make this setup easier to configure, we have a custom control layer in our system that takes care of launching workers, setting up the task queues, and so on.

When a job fails, we would like to retry it several times with an increasing delay, and in the end either have it succeed or give up and record an error in our logging system. However, I am not sure how this should be implemented. I have already created a custom worker script that lets us log errors to our database, and my first attempt at retrying looked something like this:

```python
# This handler would ideally wait some time, then requeue the job.
def worker_retry_handler(job, exc_type, exc_value, tb):
    print 'Doing retry handler.'
    current_retry = job.meta[attr.retry] or 2
    if current_retry >= 129600:
        log_error_message('Job catastrophic failure.', ...)
    else:
        current_retry *= 2
        log_retry_notification(current_retry)
        job.meta[attr.retry] = current_retry
        job.save()
        time.sleep(current_retry)
        job.perform()
    return False
```

As I mentioned, we also have a function in the worker script that resolves the correct server to connect to and can enqueue jobs. So the problem is not how to publish the job, but what to do with the job instance you receive in the exception handler.

Any help would be greatly appreciated. Suggestions or pointers on the best way to do this would also be great. Thanks!

1 answer

I see two possible problems:

  • Your handler must return a value. Returning False prevents the default exception handling from running on the job (see the last section of this page: http://python-rq.org/docs/exceptions/).

  • I think that by the time your handler is called, the job is no longer on the queue. I am not 100% sure (especially given the docs I pointed to above), but if it has landed on the failed queue you can call requeue_job(job.id) to retry it. If it has not (and apparently it has not), you could probably grab the proper queue and push the job onto it directly.
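Putting those two points together, a handler along these lines might work. This is an untested sketch: the names `next_retry_delay` and `retry_handler` are made up, the backoff values (start at 2 seconds, double each time, give up at 129600) are taken from the question's code, and sleeping inside the handler blocks the worker, which is kept here only to mirror the question (a scheduler such as rq-scheduler would be a better fit for real delays).

```python
import time

MAX_DELAY = 129600  # give-up threshold from the question, in seconds

def next_retry_delay(current_delay, max_delay=MAX_DELAY):
    """Double the backoff delay; return None once the cap is reached."""
    if current_delay >= max_delay:
        return None
    return current_delay * 2

def retry_handler(job, exc_type, exc_value, traceback):
    """RQ exception handler: re-enqueue the failed job with backoff."""
    from rq import Queue  # imported lazily so the helper above stands alone

    delay = job.meta.get('retry_delay', 2)
    new_delay = next_retry_delay(delay)
    if new_delay is None:
        # Out of retries: return True so the next (default) handler runs
        # and the job ends up on the failed queue.
        return True
    job.meta['retry_delay'] = new_delay
    job.save()
    time.sleep(new_delay)  # blocks this worker; a scheduler avoids that
    # job.origin holds the name of the queue the job originally came from,
    # so we can push it back onto the same queue.
    Queue(job.origin, connection=job.connection).enqueue_job(job)
    return False  # a falsy return stops the exception-handler chain
```

The handler would be attached when constructing the worker, e.g. `Worker(['default'], exception_handlers=[retry_handler], ...)`, so it runs before RQ's default handler moves the job to the failed queue.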
