Intermittent 502 with nginx + gunicorn + django

Over the past few weeks we have been seeing more and more 502 errors. Our stack is currently nginx + gunicorn + django on an m1.large EC2 instance, backed by a small RDS instance.

They seem to become more frequent as the request load increases. I see random 502s when using a browser, but our command-line scripts that hit the API (Tastypie) usually fail on the second or third request. However, if I add a sleep to the script just before it makes a request, that request succeeds and the next one gets a 502. Note that we are using digest auth with the requests library and the slumber wrapper, hence the 401, 200 pattern in the access log.

To make debugging even more difficult, the problem goes away when gunicorn is started with the --debug option. The error is still present if I remove the --debug option but explicitly limit gunicorn to 1 worker.

My nginx.conf:

    user www-data;
    worker_processes 4;
    pid /var/run/nginx.pid;

    events {
        worker_connections 768;
        # multi_accept on;
    }

    http {

        ##
        # Basic Settings
        ##

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        # server_tokens off;

        # server_names_hash_bucket_size 64;
        # server_name_in_redirect off;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;
        gzip_disable "msie6";
        gzip_proxied any;
        gzip_types application/x-ghi-packedschemafeatures-v1;
        gzip_http_version 1.1;
        gzip_comp_level 1;
        gzip_min_length 500;

        proxy_buffering on;
        proxy_http_version 1.1;

        ##
        # Virtual Host Configs
        ##

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
    }

Virtual host file:

    server {
        listen 80;
        server_name pipeline.ourdomain.com;

        location / {
            rewrite ^ https://$server_name$request_uri permanent;
        }
    }

    server {
        listen 443;
        server_name pipeline.ourdomain.com;

        ssl on;
        ssl_protocols SSLv3 TLSv1;
        ssl_ciphers ALL:-ADH:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP;
        ssl_session_cache shared:SSL:10m;
        ssl_certificate /etc/ssl/certs/ourdomain.com.combined.crt;
        ssl_certificate_key /etc/ssl/private/ourdomain.com.key;

        root /var/www/;

        location /static/ {
            alias /var/www/production/pipeline/public/;
        }

        location / {
            proxy_pass_header Server;
            proxy_set_header Host $http_host;
            proxy_redirect off;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Scheme $scheme;
            proxy_set_header X-Forwarded-Protocol https;
            proxy_connect_timeout 240;
            proxy_read_timeout 280;
            proxy_pass http://localhost:8000/;
        }

        error_page 500 502 503 504 /static/50x.html;
    }

Gunicorn command:

    #!/bin/bash
    set -e
    LOGFILE=/var/log/gunicorn/ea_pipeline.log
    LOGDIR=$(dirname $LOGFILE)
    SETTINGS=production_settings
    # user/group to run as
    USER=ubuntu
    GROUP=ubuntu
    DJANGO_PATH=$(dirname $(readlink -f $0))/../
    cd $DJANGO_PATH
    echo $(pwd)
    . ../env/bin/activate
    test -d $LOGDIR || mkdir -p $LOGDIR
    exec ../env/bin/gunicorn_django \
        --user=$USER --group=$GROUP --log-level=debug \
        --preload \
        --workers=4 \
        --timeout=90 \
        --settings=$SETTINGS \
        --limit-request-line=8190 \
        --limit-request-field_size 0 \
        --pythonpath=$DJANGO_PATH \
        --log-file=$LOGFILE production_settings.py 2>>$LOGFILE

Access log example:

    67.134.170.194 - - [24/Aug/2012:00:28:17 +0000] "GET /api/v1/storage/ HTTP/1.1" 401 5 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"
    67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/storage/ HTTP/1.1" 200 326 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"
    67.134.170.194 - - [24/Aug/2012:00:28:18 +0000] "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1" 502 18 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"
    67.134.170.194 - - [24/Aug/2012:00:29:41 +0000] "GET /api/v1/storage/ HTTP/1.1" 502 18 "-" "python-requests/0.13.8 CPython/2.7.3 Linux/3.2.0-29-generic"

Nginx Error Log:

    2012/08/24 00:28:18 [error] 16490#0: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 67.134.170.194, server: pipeline.ourdomain.com, request: "GET /api/v1/customer/?client_id=lamb_01 HTTP/1.1", upstream: "http://127.0.0.1:8000/api/v1/customer/?client_id=lamb_01", host: "pipeline.ourdomain.com"
    2012/08/24 00:29:41 [error] 16490#0: *7 connect() failed (111: Connection refused) while connecting to upstream, client: 67.134.170.194, server: pipeline.ourdomain.com, request: "GET /api/v1/storage/ HTTP/1.1", upstream: "http://127.0.0.1:8000/api/v1/storage/", host: "pipeline.ourdomain.com"

Sample gunicorn log:

    2012-08-24 17:03:13 [8716] [INFO] Starting gunicorn 0.14.3
    2012-08-24 17:03:13 [8716] [DEBUG] Arbiter booted
    2012-08-24 17:03:13 [8716] [INFO] Listening at: http://127.0.0.1:8000 (8716)
    2012-08-24 17:03:13 [8716] [INFO] Using worker: sync
    2012-08-24 17:03:13 [8735] [INFO] Booting worker with pid: 8735
    2012-08-24 17:03:13 [8736] [INFO] Booting worker with pid: 8736
    2012-08-24 17:03:13 [8737] [INFO] Booting worker with pid: 8737
    2012-08-24 17:03:13 [8738] [INFO] Booting worker with pid: 8738
    2012-08-24 17:03:21 [8738] [DEBUG] GET /api/v1/storage/
    Assertion failed: ok (mailbox.cpp:84)
    2012-08-24 17:03:21 [8738] [INFO] Parent changed, shutting down: <Worker 8738>
    2012-08-24 17:03:21 [8738] [INFO] Worker exiting (pid: 8738)
    Error in sys.exitfunc:
    2012-08-24 17:03:21 [8737] [DEBUG] GET /api/v1/storage/
    2012-08-24 17:03:22 [8838] [INFO] Starting gunicorn 0.14.3
    2012-08-24 17:03:22 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
    2012-08-24 17:03:22 [8838] [ERROR] Retrying in 1 second.
    2012-08-24 17:03:22 [8737] [INFO] Parent changed, shutting down: <Worker 8737>
    2012-08-24 17:03:22 [8737] [INFO] Worker exiting (pid: 8737)
    Error in sys.exitfunc:
    2012-08-24 17:03:22 [8736] [DEBUG] GET /api/v1/customer/
    2012-08-24 17:03:23 [8736] [INFO] Parent changed, shutting down: <Worker 8736>
    2012-08-24 17:03:23 [8736] [INFO] Worker exiting (pid: 8736)
    Error in sys.exitfunc:
    2012-08-24 17:03:23 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
    2012-08-24 17:03:23 [8838] [ERROR] Retrying in 1 second.
    2012-08-24 17:03:24 [8735] [DEBUG] GET /api/v1/upload_action/
    2012-08-24 17:03:24 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
    2012-08-24 17:03:24 [8838] [ERROR] Retrying in 1 second.
    2012-08-24 17:03:24 [8735] [INFO] Parent changed, shutting down: <Worker 8735>
    2012-08-24 17:03:24 [8735] [INFO] Worker exiting (pid: 8735)
    Error in sys.exitfunc:
    2012-08-24 17:03:25 [8838] [DEBUG] Arbiter booted
    2012-08-24 17:03:25 [8838] [INFO] Listening at: http://127.0.0.1:8000 (8838)
    2012-08-24 17:03:25 [8838] [INFO] Using worker: sync
    2012-08-24 17:03:25 [8907] [INFO] Booting worker with pid: 8907
    2012-08-24 17:03:25 [8908] [INFO] Booting worker with pid: 8908
    2012-08-24 17:03:25 [8909] [INFO] Booting worker with pid: 8909
    2012-08-24 17:03:25 [8910] [INFO] Booting worker with pid: 8910
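
A log in this format can be summarized with a few lines of Python. The sketch below is only an illustration: the function name `summarize` and the shortened sample are assumptions, not part of the original setup. It counts how many times a gunicorn master started, how many workers reported "Parent changed", and how many bind retries occurred, which may help when comparing a gunicorn log against the nginx error log.

```python
import re
from collections import Counter

# Matches gunicorn lines such as:
# "2012-08-24 17:03:13 [8716] [INFO] Starting gunicorn 0.14.3"
LINE_RE = re.compile(r"^(\S+ \S+) \[(\d+)\] \[(\w+)\] (.*)$")


def summarize(log_text):
    """Return (master_pids, counts) for a gunicorn log given as a string.

    master_pids lists the pid of every "Starting gunicorn" line, in order.
    counts tallies workers that lost their parent and failed bind attempts.
    """
    masters = []
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip non-gunicorn lines, e.g. "Error in sys.exitfunc:"
        _timestamp, pid, _level, message = match.groups()
        if message.startswith("Starting gunicorn"):
            masters.append(int(pid))
        elif message.startswith("Parent changed"):
            counts["orphaned_workers"] += 1
        elif message.startswith("Connection in use"):
            counts["bind_retries"] += 1
    return masters, counts


# Shortened excerpt of the log above, used only to exercise the function.
sample = """\
2012-08-24 17:03:13 [8716] [INFO] Starting gunicorn 0.14.3
2012-08-24 17:03:21 [8738] [INFO] Parent changed, shutting down: <Worker 8738>
Error in sys.exitfunc:
2012-08-24 17:03:22 [8838] [INFO] Starting gunicorn 0.14.3
2012-08-24 17:03:22 [8838] [ERROR] Connection in use: ('127.0.0.1', 8000)
2012-08-24 17:03:22 [8737] [INFO] Parent changed, shutting down: <Worker 8737>
"""

masters, counts = summarize(sample)
print(masters, dict(counts))
```

More than one pid in `masters` means the master process was started again during the period the log covers.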
1 answer

This is a very old post, but I had the same problem with an nginx + gunicorn + Flask setup. I was also getting a 502 error, with the same log as yours, on every 300th request. Changing the gunicorn worker type to an asynchronous one solved the problem for me (I chose gthread). Hope this answer helps someone.

How to change the setting: http://docs.gunicorn.org/en/stable/settings.html#worker-class

How to choose a worker type: http://docs.gunicorn.org/en/latest/design.html#choosing-a-worker-type

And here is a good explanation why: How many simultaneous requests are processed by one Flask process?
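
For illustration, the worker-type change described above could look like this in a gunicorn configuration file, which is ordinary Python. This is a minimal sketch, assuming a gunicorn version that ships the gthread worker (the 0.14.3 release in the question's log predates it). The file name and all values are assumptions to adapt to your own setup.

```python
# gunicorn.conf.py (hypothetical file name; values are illustrative)

# Address gunicorn listens on; matches the upstream nginx proxies to
# in the question.
bind = "127.0.0.1:8000"

# Number of worker processes.
workers = 4

# Use the threaded worker instead of the default "sync" worker.
worker_class = "gthread"

# Threads per worker process; this setting applies to the gthread worker.
threads = 2

# Seconds before a silent worker is killed and restarted.
timeout = 90
```

A file like this is passed with the `-c` option, for example `gunicorn -c gunicorn.conf.py myapp.wsgi:application`, where `myapp.wsgi:application` stands in for your own WSGI entry point. The same change can also be made on the command line with `--worker-class gthread --threads 2`.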
