How to deal with cookies on a varnish stack

Due to poor site performance, I started looking into Varnish as a caching solution, and I have some questions about Google Analytics.

When there are about 5K active users on the site (according to the GA real-time report), load on the app servers climbs to 30-40+, requests start piling up, and the site becomes almost unusable. I know the slow queries and database layer need better performance, but at the moment I don't have the resources to optimize queries, DB schemas, indexes, etc. So the option is to add Varnish.

I created a diagram to better illustrate the stack; this is what it currently looks like (the site already serves images/CSS/JS from a CDN - Akamai):

[diagram of the current stack]

I would like to add two Varnish instances on the front-end servers to cache articles, so the stack would look like this:

[diagram of the proposed stack with Varnish added]

The site is a news site, and I'm looking for advice on how to handle cookies and caching correctly. For the first phase I would just like to completely exclude authenticated users from caching and serve them dynamic content, since there are not many logged-in users at the same time.

My confusion is with the Google Analytics cookies. As far as I understand, Google sets its cookies on the client using JavaScript and the client communicates directly with Google, so the backend does not need the GA cookies sent by the client, and it should be safe to strip them in vcl_recv:

    sub vcl_recv {
        // Remove has_js and Google Analytics __* cookies.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
        // Remove a ";" prefix, if present.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    }

Questions

  • Is this a safe approach?
  • Will Google track correctly, including repeat visitors?
  • Is there anything else I need to watch out for in my policies for phase 1?

Since Varnish by default does NOT cache anything that has cookies set, is it safe to implement the stack described above by simply adding the GA cookie-stripping policy? I understand that without fine-tuning the VCL policies I won't get a high hit rate, but in my tests it turned out that even a default Varnish on a front server got around a 30% hit rate, and analysing the hits I see most of them are JS/CSS files and images. So some of these static files are not being served by Akamai or even Apache; instead they are passed down to Passenger/Rails to serve the static file. This certainly needs to be fixed.

  • Will Varnish improve performance even with the default configuration?

I am new to Varnish, so any additional details/recommendations on Varnish or the stack I suggested are greatly appreciated.

For phase 2+

Since the content gets updated, I plan to have the backend servers issue purges to both Varnish servers when something changes, such as user comments, page view counts, etc.

There are many archived articles that are never updated - is it safe to cache them forever?

Since I plan to use RAM as the Varnish storage, should I have an additional (third) Varnish instance that uses disk storage, specifically for these archived pages? Perhaps adding nginx in front of the Varnish servers to route traffic for archived content to that specific Varnish instance? Load balancer → a pair of nginx reverse proxies → a pair of Varnish servers → (Varnish load-balances across the 8 app servers).

I would also appreciate any advice on the architecture. If you need more information to give better advice, please let me know and I will be happy to provide more details.

1 answer

That's a lot of questions! :-)

 Q. Is this a safe approach? 

At first glance, I would say so.

As a general rule, having logins on a news site with a lot of traffic and rapidly changing content can be a problem.

A really good way to check is to set up a single Varnish box with direct access to your cluster (not through the load balancer) and give it a temporary public IP address. This will let you test VCL changes live. You can test commenting, logging in (if applicable) and everything else to make sure there are no surprises.

 Q. Will Google still track properly, including repeat visitors? 

Yes. Cookies are used only on the client side.

One thing to watch for is that Varnish will also not cache content when the backend sends cookies. You will need to remove any Set-Cookie headers that are not required in vcl_fetch. This can be a problem if cookies are used to track user state.
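
As a rough sketch of that idea (the URL patterns here are placeholders, not anything from your setup - adjust them to the paths that genuinely need session state):

    sub vcl_fetch {
        # Drop backend cookies for anonymous page views so the response
        # stays cacheable. Do NOT do this for paths that genuinely need
        # session state (login, comment posting, admin, etc.).
        if (req.url !~ "^/(login|users|admin)") {
            remove beresp.http.Set-Cookie;
        }
    }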

 Q. Is there anything else that I need to watch for in my policies for phase 1? 

You will need to disable Rack::Cache in Rails and set your own headers. Keep in mind that if you ever remove Varnish, Rails will be running without any caching and will probably melt!

This is what I have in my production.rb:

    # We do not use Rack::Cache but rely on Varnish instead
    config.middleware.delete Rack::Cache
    # Varnish does not support etags or conditional gets
    # to the backend (which is this app) so remove them too
    config.middleware.delete Rack::ETag
    config.middleware.delete Rack::ConditionalGet

And in my application_controller, I have this private method:

    def set_public_cache_control(duration)
      if current_user
        response.headers["Cache-Control"] = "max-age=0, private, must-revalidate"
      else
        expires_in duration, :public => true
        response.headers["Expires"] = CGI.rfc1123_date(Time.now + duration)
      end
    end

This is called from my other controllers, so I have very fine control over how much caching applies to different parts of the site. I use a setup method in each controller, which runs as a before_filter:

    def setup
      set_public_cache_control 10.minutes
    end

(application_controller has the before_filter and an empty setup method, so it is optional in the other controllers.)

If you have parts of the site that do not require cookies, you can strip them based on the URL in VCL and apply the cache headers there.
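
A minimal sketch of that, assuming an /assets/ prefix and a typical set of static file extensions (adjust to your own URL scheme):

    sub vcl_recv {
        # Strip all client cookies for static assets so Varnish can cache
        # them even when the browser sends cookies along.
        if (req.url ~ "^/assets/" || req.url ~ "\.(css|js|png|jpg|jpeg|gif|ico)$") {
            unset req.http.Cookie;
        }
    }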

You can set the cache time for your static assets in your Apache config like this (assuming you use the default assets path):

    <LocationMatch "^/assets/.*$">
        Header unset ETag
        FileETag None
        # RFC says only cache for 1 year
        ExpiresActive On
        ExpiresDefault "access plus 1 year"
        Header append Cache-Control "public"
    </LocationMatch>

    <LocationMatch "^/favicon\.(ico|png)$">
        Header unset ETag
        FileETag None
        ExpiresActive On
        ExpiresDefault "access plus 1 day"
        Header append Cache-Control "public"
    </LocationMatch>

    <LocationMatch "^/robots.txt$">
        Header unset ETag
        FileETag None
        ExpiresActive On
        ExpiresDefault "access plus 1 hour"
        Header append Cache-Control "public"
    </LocationMatch>

These headers will be picked up by your CDN, which will then cache the assets for much longer. Watching Varnish, you will see requests for them still arrive, but at a decreasing rate.

I would also set very short cache times on any content where the pages do not need cookies but change frequently. In my case I set the cache time for the home page to 10 seconds. What this means for Varnish is that only one user request is passed to the backend every 10 seconds.

You should also consider configuring Varnish to use grace mode. This allows it to serve slightly stale content from the cache rather than exposing visitors to a slow backend response for items that have just expired.
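
A minimal grace-mode sketch (Varnish 3 style syntax, with arbitrary example durations; req.backend.healthy only works if you have a health probe defined on the backend):

    sub vcl_recv {
        # Accept slightly stale objects normally, and much staler ones
        # when the backend is unhealthy (requires a backend probe).
        if (req.backend.healthy) {
            set req.grace = 2m;
        } else {
            set req.grace = 1h;
        }
    }

    sub vcl_fetch {
        # Keep expired objects around so grace mode has something to serve.
        set beresp.grace = 1h;
    }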

 Q. There are plenty of archived articles that don't get updated, is it safe to cache them forever? 

To do this, you will need to change your application to send different headers for articles that count as archived. This assumes they will not have cookies. Based on what I do on my site, I would do it as follows:

In the setup method above, add a conditional to change the cache time:

    def setup
      # check if it is old. This code could be anything
      if news.last_updated_at < 1.months.ago
        set_public_cache_control 1.year
      else
        set_public_cache_control 10.minutes
      end
    end

This sets a public header, so Varnish will cache it (if there are no cookies), and so will any intermediate caches (at ISPs or corporate gateways).

The problem comes when you want to delete a story, or update it (say, for legal reasons).

In that case you can send Varnish a private header to change the TTL for that URL, while sending a shorter public header to everyone else.

This allows Varnish to cache the content for (say) 1 year, while the headers sent to clients make them come back every 10 minutes.

You will also need to add a purge mechanism to Varnish for these cases.
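
One way to do that (a sketch in Varnish 3 syntax; the purgers ACL entries are placeholders) is to accept an HTTP PURGE request for the article URL from trusted hosts, issued by your backends whenever content changes:

    acl purgers {
        "127.0.0.1";
        # add your backend/app servers here
    }

    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purgers) {
                error 405 "Not allowed.";
            }
            return (lookup);
        }
    }

    sub vcl_hit {
        if (req.request == "PURGE") {
            purge;
            error 200 "Purged.";
        }
    }

    sub vcl_miss {
        if (req.request == "PURGE") {
            purge;
            error 200 "Purged.";
        }
    }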

To set the private header, I have a second method in my application_controller:

    def set_private_cache_control(duration=5.seconds)
      # logged in users never have cached content so no TTL allowed
      if ! current_user
        # This header MUST be a string or the app will crash
        if duration
          response.headers["X-Varnish-TTL"] = duration.to_s
        end
      end
    end

And in my vcl_fetch, I have this:

    call set_varnish_ttl_from_header;

and the vcl function is as follows:

    sub set_varnish_ttl_from_header {
        if (beresp.http.X-Varnish-TTL) {
            C{
                char *x_end = 0;
                const char *x_hdr_val = VRT_GetHdr(sp, HDR_BERESP, "\016X-Varnish-TTL:");
                /* "\016" is the length of the header name plus colon, in octal */
                if (x_hdr_val) {
                    long x_cache_ttl = strtol(x_hdr_val, &x_end, 0);
                    if (ERANGE != errno && x_end != x_hdr_val && x_cache_ttl >= 0 && x_cache_ttl < INT_MAX) {
                        VRT_l_beresp_ttl(sp, (x_cache_ttl * 1));
                    }
                }
            }C
            remove beresp.http.X-Varnish-TTL;
        }
    }

This way, the header is NOT passed on to any intermediate caches (which s-maxage would be).

The setup method would then look like this:

    def setup
      # check if it is old. This code could be anything
      if news.last_updated_at < 1.months.ago
        set_public_cache_control 10.minutes
        set_private_cache_control 1.year
      else
        set_public_cache_control 10.minutes
      end
    end

Feel free to ask any further questions and I will update this answer!
