Bad Gateway (502) errors under load with nginx + Unicorn (Rails 3 application)

I have a Rails (3.2) application that runs behind nginx and Unicorn on a cloud platform. The box runs Ubuntu 12.04.

When the system load rises to about 70% or higher, nginx suddenly (and seemingly at random) starts throwing 502 Bad Gateway errors; at lower load there is nothing of the kind. I have experimented with different numbers of cores (4, 6, 10 — being on a cloud platform, I can "change the hardware"), and the situation is always the same. (CPU usage roughly tracks the system load: around 55% in user space, the rest split between system and steal time, with plenty of free memory and no swapping.)

The 502s usually come in batches, but not always.

(I run one Unicorn worker per core plus one or two nginx workers. The relevant parts of the configuration below are for the 10-core setup.)

I really don't know how to track down the cause of these errors. I suspect it may have something to do with the Unicorn workers being unable to serve requests (in time?), but it looks odd because they do not seem to saturate the CPU and I see no reason why they would be waiting on I/O (though I don't know how to verify that either).

Could you help me find the reason?


Unicorn configuration (unicorn.rb):

    worker_processes 10
    working_directory "/var/www/app/current"

    listen "/var/www/app/current/tmp/sockets/unicorn.sock", :backlog => 64
    listen 2007, :tcp_nopush => true

    timeout 90
    pid "/var/www/app/current/tmp/pids/unicorn.pid"
    stderr_path "/var/www/app/shared/log/unicorn.stderr.log"
    stdout_path "/var/www/app/shared/log/unicorn.stdout.log"

    preload_app true
    GC.respond_to?(:copy_on_write_friendly=) and GC.copy_on_write_friendly = true
    check_client_connection false

    before_fork do |server, worker|
      # ... I believe the stuff here is irrelevant ...
    end

    after_fork do |server, worker|
      # ... I believe the stuff here is irrelevant ...
    end

And the nginx configuration:

/etc/nginx/nginx.conf:

    worker_processes 2;
    worker_rlimit_nofile 2048;

    user www-data www-admin;
    pid /var/run/nginx.pid;
    error_log /var/log/nginx/nginx.error.log info;

    events {
      worker_connections 2048;
      accept_mutex on; # "on" if nginx worker_processes > 1
      use epoll;
    }

    http {
      include /etc/nginx/mime.types;
      default_type application/octet-stream;

      log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
      access_log /var/log/nginx/access.log main;

      # optimization efforts
      client_max_body_size 2m;
      client_body_buffer_size 128k;
      client_header_buffer_size 4k;
      large_client_header_buffers 10 4k; # one for each core or one for each unicorn worker?
      client_body_temp_path /tmp/nginx/client_body_temp;

      include /etc/nginx/conf.d/*.conf;
    }

/etc/nginx/conf.d/app.conf:

    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;

    gzip on;
    gzip_http_version 1.0;
    gzip_proxied any;
    gzip_min_length 500;
    gzip_disable "MSIE [1-6]\.";
    gzip_types text/plain text/css text/javascript application/x-javascript;

    upstream app_server {
      # fail_timeout=0 means we always retry an upstream even if it failed
      # to return a good HTTP response (in case the Unicorn master nukes a
      # single worker for timing out).
      server unix:/var/www/app/current/tmp/sockets/unicorn.sock fail_timeout=0;
    }

    server {
      listen 80 default deferred;
      server_name _;

      client_max_body_size 1G;
      keepalive_timeout 5;

      root /var/www/app/current/public;

      location ~ "^/assets/.*" {
        ...
      }

      # Prefer to serve static files directly from nginx to avoid unnecessary
      # data copies from the application server.
      try_files $uri/index.html $uri.html $uri @app;

      location @app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app_server;

        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;

        proxy_buffer_size 128k;
        proxy_buffers 10 256k; # one per core or one per unicorn worker?
        proxy_busy_buffers_size 256k;
        proxy_temp_file_write_size 256k;
        proxy_max_temp_file_size 512k;
        proxy_temp_path /mnt/data/tmp/nginx/proxy_temp;

        open_file_cache max=1000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;
      }
    }
1 answer

After searching for the messages found in the nginx error log, this turned out to be a known problem that has nothing to do with nginx, only a little to do with Unicorn, and is rooted in the OS (Linux) settings.
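For reference, this is roughly what to look for — a hedged illustration, with the timestamp and request details as placeholders rather than lines copied from my actual log:

    # Search the nginx error log for failed connections to the Unicorn socket:
    grep "connect() to unix" /var/log/nginx/nginx.error.log

    # When the socket's listen backlog overflows, the entries typically look
    # something like this (illustrative):
    #   [error] ...: *1234 connect() to unix:/var/www/app/current/tmp/sockets/unicorn.sock
    #   failed (11: Resource temporarily unavailable) while connecting to upstream ...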

The core of the problem is that the socket backlog is too short. There are various considerations about how long it should be (whether you want to detect the failure of a cluster member as soon as possible, or let the application push against its load limits), but in any case listen :backlog needs tweaking.

I found that in my case listen ... :backlog => 2048 was sufficient. (I have not experimented much, although there is a nice hack for doing so if you like: have nginx and Unicorn communicate over two sockets with different backlogs, the longer one as a backup, and then watch the nginx error log to see how often the shorter queue overflows.) Please note that this figure is not the result of any scientific calculation and YMMV.
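A minimal sketch of both pieces, in case it helps — the second socket path, the 64/2048 split, and the upstream layout are illustrative assumptions, not copied from my configuration:

    # unicorn.rb — raise the backlog on the main socket; optionally add a second,
    # short-backlog socket so overflows become visible (paths/values illustrative):
    listen "/var/www/app/current/tmp/sockets/unicorn.sock",       :backlog => 2048
    listen "/var/www/app/current/tmp/sockets/unicorn-short.sock", :backlog => 64

    # nginx upstream — try the short-backlog socket first and fall back to the
    # long one marked "backup"; each fallback leaves a trace in the error log:
    upstream app_server {
      server unix:/var/www/app/current/tmp/sockets/unicorn-short.sock fail_timeout=0;
      server unix:/var/www/app/current/tmp/sockets/unicorn.sock       fail_timeout=0 backup;
    }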

Note, however, that many OSes (most Linux distributions, including Ubuntu 12.04) have much lower default OS-level caps on socket backlog sizes (as low as 128), which will silently cap whatever you pass to listen :backlog.

You can raise the OS limits as follows (as root):

    sysctl -w net.core.somaxconn=2048
    sysctl -w net.core.netdev_max_backlog=2048

Add them to /etc/sysctl.conf to make the changes permanent. (/etc/sysctl.conf can be reloaded without rebooting via sysctl -p.)
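The corresponding /etc/sysctl.conf entries would look like this (the 2048 values simply mirror the commands above):

    # /etc/sysctl.conf — persist the larger socket backlog limits
    net.core.somaxconn = 2048
    net.core.netdev_max_backlog = 2048

    # Apply without rebooting, then verify:
    #   sysctl -p
    #   sysctl net.core.somaxconn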

There are also reports that you may need to raise the maximum number of files a process can open (see ulimit -n and /etc/security/limits.conf). I had already done this for other reasons, so I can't say whether it matters here or not.
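For completeness, a hedged sketch of that change — the user name and the 65536 limit are illustrative assumptions, not values from my setup:

    # Check the current per-process open-file limit:
    ulimit -n

    # To raise it, add entries to /etc/security/limits.conf for the user that
    # runs nginx/Unicorn (user name and limit below are illustrative):
    #   www-data  soft  nofile  65536
    #   www-data  hard  nofile  65536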
