That is a lot of questions! :-)
Q. Is this a safe approach?
At first glance, I would say so.
As a rule, caching on a news site, where there is a lot of traffic and rapidly changing content, can be a problem.
A really good way to check is to set up a single Varnish box, give it direct access to your cluster (not through a load balancer) and give it a temporary public IP address. That gives you the opportunity to test VCL changes: you can test comments, logins (if any) and everything else to make sure there are no surprises.
Q. Will Google still track properly, including repeat visitors?
Yes. Cookies are used only on the client side.
One thing you should watch for is that Varnish will not cache content when the backend sends cookies. You will need to remove any cookies that are not required in vcl_fetch. This can be a problem if cookies are used to track user state.
Q. Is there anything else that I need to watch for in my policies for phase1?
You will need to disable Rack::Cache in Rails and set your own headers. Keep in mind that if you ever remove Varnish, Rails will run with no caching at all and will probably crash and burn!
This is what I have in my production.rb:
    # We do not use Rack::Cache but rely on Varnish instead
    config.middleware.delete Rack::Cache
    # varnish does not support etags or conditional gets
    # to the backend (which is this app) so remove them too
    config.middleware.delete Rack::ETag
    config.middleware.delete Rack::ConditionalGet
And in my application_controller, I have this private method:
    def set_public_cache_control(duration)
      if current_user
        response.headers["Cache-Control"] = "max-age=0, private, must-revalidate"
      else
        expires_in duration, :public => true
        response.headers["Expires"] = CGI.rfc1123_date(Time.now + duration)
      end
    end
This is called in my other controllers, so I have very fine control over how much caching applies to different parts of the site. I use a setup method in each controller, which is run as a before_filter:
    def setup
      set_public_cache_control 10.minutes
    end
(The application_controller has the filter and an empty setup method, so defining setup is optional in the other controllers.)
If you have a part of the site that does not require cookies, you can strip them based on the URL in VCL and apply the headers.
You can set the cache time for your static assets in your Apache config like this (assuming you use the default asset path):
    <LocationMatch "^/assets/.*$">
      Header unset ETag
      FileETag None
      # RFC says only cache for 1 year
      ExpiresActive On
      ExpiresDefault "access plus 1 year"
      Header append Cache-Control "public"
    </LocationMatch>

    <LocationMatch "^/favicon\.(ico|png)$">
      Header unset ETag
      FileETag None
      ExpiresActive On
      ExpiresDefault "access plus 1 day"
      Header append Cache-Control "public"
    </LocationMatch>

    <LocationMatch "^/robots.txt$">
      Header unset ETag
      FileETag None
      ExpiresActive On
      ExpiresDefault "access plus 1 hour"
      Header append Cache-Control "public"
    </LocationMatch>
These headers will also be sent to your CDN, which will cache the assets for much longer. If you watch Varnish, you will still see requests coming in, but at a decreasing rate.
I would also set a very short cache time for pages that do not need cookies but change often. In my case, I set the cache time to 10 seconds for the home page. What this means for Varnish is that only one user request reaches the backend every 10 seconds.
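
To put a number on that, here is a minimal sketch of a TTL cache that counts backend fetches. It is only an illustration of the arithmetic, not how Varnish is implemented; the TtlCache class and the traffic figure of 100 requests per second are made up for the example.

```ruby
# Minimal TTL cache that counts backend fetches (illustration only).
class TtlCache
  attr_reader :backend_hits

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @expires_at = nil
    @body = nil
    @backend_hits = 0
  end

  # `now` is passed in explicitly so the example is deterministic.
  # The block stands in for a request to the Rails backend.
  def fetch(now)
    if @expires_at.nil? || now >= @expires_at
      @backend_hits += 1
      @body = yield
      @expires_at = now + @ttl
    end
    @body
  end
end

cache = TtlCache.new(10)

# Simulate 100 requests per second for 60 seconds (6000 requests in total).
(0...60).each do |second|
  100.times { cache.fetch(second) { "home page" } }
end

puts cache.backend_hits # prints 6
```

Out of 6000 simulated requests, only 6 reach the backend (at seconds 0, 10, 20, 30, 40 and 50).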
You should also consider using Varnish's grace mode. This lets it serve slightly stale content from the cache in preference to exposing visitors to a slow backend response for items that have just expired.
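
As a rough mental model of that trade-off, the decision can be sketched like this. It is a simplification, not Varnish's actual logic; the cache_decision function, its parameters and the numbers are hypothetical.

```ruby
# Simplified model of a grace-style decision; not Varnish's actual implementation.
# age, ttl and grace are in seconds. refresh_in_progress says whether another
# request is already fetching a fresh copy from the backend.
def cache_decision(age:, ttl:, grace:, refresh_in_progress:)
  return :serve_fresh if age < ttl
  return :serve_stale if refresh_in_progress && age < ttl + grace
  :fetch_from_backend
end

puts cache_decision(age: 5,  ttl: 10, grace: 30, refresh_in_progress: false) # serve_fresh
puts cache_decision(age: 12, ttl: 10, grace: 30, refresh_in_progress: true)  # serve_stale
puts cache_decision(age: 12, ttl: 10, grace: 30, refresh_in_progress: false) # fetch_from_backend
puts cache_decision(age: 60, ttl: 10, grace: 30, refresh_in_progress: true)  # fetch_from_backend
```

In this model the first request after expiry still goes to the backend, and everyone else gets the slightly stale copy until the fresh one arrives.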
Q. There are plenty of archived articles that don't get updated, is it safe to cache them forever?
To do this, you will need to change your application to send different headers for the articles that are in the archive. This assumes that they will not have cookies. Based on what I do on my site, I would do it as follows:
In the setup method above, add a conditional to change the cache time:
    def setup
      # check if it is old. This code could be anything
      if news.last_updated_at < 1.months.ago
        set_public_cache_control 1.year
      else
        set_public_cache_control 10.minutes
      end
    end
This sets a public header, so Varnish will cache it (if there are no cookies), as will any remote caches (at ISPs or on corporate gateways).
The problem comes when you want to delete a story or update it (say, for legal reasons).
In that case, you need to send Varnish a private header that sets the TTL for that URL, while sending a shorter public header to everyone else.
This lets you configure Varnish to serve the content for (say) 1 year, while the headers it sends out tell clients to come back every 10 minutes.
You will also need to add a mechanism to purge Varnish in these cases.
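
The purge mechanism is not shown in this answer. One common approach is to send Varnish an HTTP PURGE request for the URL; the sketch below assumes your VCL has been written to accept the PURGE method (it does not by default). The function names, host, port and path are all hypothetical.

```ruby
require "net/http"

# Build a PURGE request for one path. Varnish only honours this if the VCL
# has been written to handle the PURGE method (an assumption here).
def build_purge_request(path, host)
  request = Net::HTTPGenericRequest.new("PURGE", false, true, path)
  request["Host"] = host
  request
end

# Send the request to a Varnish instance. varnish_host and varnish_port are
# hypothetical values for illustration.
def purge(path, host:, varnish_host: "127.0.0.1", varnish_port: 6081)
  Net::HTTP.start(varnish_host, varnish_port) do |http|
    http.request(build_purge_request(path, host))
  end
end

# Only build the request here, so the example runs without a Varnish server.
request = build_purge_request("/news/123", "www.example.com")
puts "#{request.method} #{request.path}"
```

You would call something like purge from the code path that deletes or updates a story, and restrict PURGE in VCL to trusted addresses.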
To start with, I have a second method in the application_controller:
    def set_private_cache_control(duration=5.seconds)
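
The body of that method is not included above. As a guess at its shape, a minimal sketch would write the duration in seconds to the X-Varnish-TTL header that the VCL reads. This is an assumption, not the original code; FakeResponse and SketchController are stand-ins so the example runs outside Rails, and plain integers replace 5.seconds.

```ruby
# Stand-in for the Rails response object; only the headers hash is needed here.
FakeResponse = Struct.new(:headers)

# Hypothetical controller used only to make the sketch runnable outside Rails.
class SketchController
  attr_reader :response

  def initialize
    @response = FakeResponse.new({})
  end

  # Assumption: the real method is not shown in the answer. This version
  # stores the TTL, in whole seconds, in the X-Varnish-TTL header.
  def set_private_cache_control(duration = 5)
    response.headers["X-Varnish-TTL"] = duration.to_i.to_s
  end
end

controller = SketchController.new
controller.set_private_cache_control(365 * 24 * 60 * 60) # one year in seconds
puts controller.response.headers["X-Varnish-TTL"] # prints 31536000
```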
And in my vcl_fetch, I have this:
    call set_varnish_ttl_from_header;
and the VCL subroutine is as follows:
    sub set_varnish_ttl_from_header {
      if (beresp.http.X-Varnish-TTL) {
        C{
          char *x_end = 0;
          const char *x_hdr_val = VRT_GetHdr(sp, HDR_BERESP, "\016X-Varnish-TTL:");
          if (x_hdr_val) {
            long x_cache_ttl = strtol(x_hdr_val, &x_end, 0);
            if (ERANGE != errno && x_end != x_hdr_val && x_cache_ttl >= 0 && x_cache_ttl < INT_MAX) {
              VRT_l_beresp_ttl(sp, (x_cache_ttl * 1));
            }
          }
        }C
        remove beresp.http.X-Varnish-TTL;
      }
    }
This way, the header is NOT passed on to any caches between Varnish and the client (which s-maxage would be).
The setup method then looks like this:
    def setup
      # check if it is old. This code could be anything
      if news.last_updated_at < 1.months.ago
        set_public_cache_control 10.minutes
        set_private_cache_control 1.year
      else
        set_public_cache_control 10.minutes
      end
    end
Feel free to ask any further questions and I will update this answer!