I'm trying to diagnose a problem where some of my workflows in the celery seem to hang for a few minutes. I have many tasks that make several I / O calls (usually these are third-party APIs). In any given job, I can make several thousand requests to various APIs. I looked at the logs and they all have in common: they hang after urllib3 connects to the remote url.
At the end of my work (it takes ~ 30 minutes), several tasks are usually performed.
Here is an example of the logs I use to conclude urllib3 is the culprit:
Jul 08 04:46:26 app/worker.1: [INFO/MainProcess] [???(???)] celery.worker.strategy: Received task: my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] Jul 08 04:46:28 app/worker.1: [DEBUG/Worker-11] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] src.aggregates.prospect.services.prospect_service: Beginning: Get social account data. provider_name: twitter, account_uid: some_user Jul 08 04:46:28 app/worker.1: [INFO/Worker-11] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.some_api.com
And then that. Behind the Starting new HTTPS connection instruction nothing is written.
Here I restarted the worker:
Jul 08 05:09:18 app/worker.1: [INFO/MainProcess] [???(???)] celery.worker.strategy: Received task: my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] Jul 08 05:09:19 app/worker.1: [DEBUG/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] src.aggregates.prospect.services.prospect_service: Beginning: Get social account data. provider_name: twitter, account_uid: some_user Jul 08 05:09:19 app/worker.1: [DEBUG/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: "GET /v2/api_call.json?username=some_user HTTP/1.1" 403 170 Jul 08 05:09:19 app/worker.1: [INFO/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.some_api.com Jul 08 05:09:19 app/worker.1: [INFO/MainProcess] [???(???)] celery.worker.job: Task my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] succeeded in 2.265356543008238s: 32345
In this case, I received a status code of 403 . So this could be a potential criminal. However, I saw in the logs that this happened with 200 status codes.
FWIW, I also often see Resetting dropped connection: api.twitter.com in magazines.
I always provided a timeout 10 seconds when I make a request using the requests library.
So it seems that the request is made and then just freezes. Remote servers may be reacting at a very slow pace, and so a timeout never occurs, but I find this unlikely, since my problem does not occur with just one specific domain.
I use rabbit 3.1.3 celery: 3.1.11 (Cipater) kombu: 3.0.16 py: 3.4.0 billiards: 3.3.0.17 py-amqp: 1.4.5. I am using prefork.
I use queries == 2.3.0.
So, why does my tasks seem to hang after urllib3 logs the Starting new HTTPS connection statement?
EDIT: I added a lot of protocols and provided the following context below
2014-07-16T02:43:53.140381+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.strategy: Received task: [973c1361-43c3-41a2-9bcd-6ef850c41fcc] 2014-07-16T02:43:56.951211+00:00 app[worker.1]: [1;34m[DEBUG/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] : twitter, account_uid: some_user[0m 2014-07-16T02:43:56.951876+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to get request 2014-07-16T02:43:56.952005+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to prep request 2014-07-16T02:43:56.952897+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to prep request 2014-07-16T02:43:56.954133+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get response not chunked 2014-07-16T02:43:56.954249+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get a conn from the pool 2014-07-16T02:43:56.954360+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _get_conn: about to get a conn from the pool 2014-07-16T02:43:56.954482+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _get_conn: about to get a conn from the pool 2014-07-16T02:43:56.954587+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: returning conn or self new conn 2014-07-16T02:43:56.951725+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to get request 2014-07-16T02:43:56.954692+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.fullcontact.com 2014-07-16T02:43:56.954817+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: create connection class 2014-07-16T02:43:56.954948+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: just about to create connection class 2014-07-16T02:43:56.955062+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: returning conn or self new conn 2014-07-16T02:43:56.955166+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get a conn from the pool 2014-07-16T02:43:56.955290+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: about to call request 2014-07-16T02:43:56.955396+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send req 2014-07-16T02:43:56.953410+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to send 2014-07-16T02:43:56.953554+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get conn 2014-07-16T02:43:56.953986+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get conn 2014-07-16T02:43:56.956014+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: _send_output 2014-07-16T02:43:56.955517+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: put req 2014-07-16T02:43:56.955715+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: put req 2014-07-16T02:43:56.956146+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send 2014-07-16T02:43:56.956309+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified conection: connect 2014-07-16T02:43:56.959157+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: set socket 2014-07-16T02:43:56.959045+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: set socket 2014-07-16T02:43:56.959295+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve cert reqs 2014-07-16T02:43:56.959402+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: resolve cert reqs 2014-07-16T02:43:56.959508+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve ssl ver 2014-07-16T02:43:56.959613+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: resolve ssl ver 2014-07-16T02:43:56.959715+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: ssl_wrap_socket 2014-07-16T02:43:56.959827+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: about to get ssl context 2014-07-16T02:43:56.960012+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: about to get ssl context 2014-07-16T02:43:56.960119+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations
Last registered post
2014-07-16T02:43:56.960119+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations
According to my log statement, the culprit is load_verify_locations .
This is the corresponding code in the requests library:
if ca_certs: try: context.load_verify_locations(ca_certs) # Py32 raises IOError # Py33 raises FileNotFoundError except Exception as e: # Reraise as SSLError raise SSLError(e)
So, it looks like context.load_verify_locations(ca_certs) is causing a hang. Why?