We currently use CloudFront, with its many edge locations, to serve product images (about half a million) that are dynamically resized to different sizes. Our CloudFront distribution uses a PHP script on EC2 as its origin. The script fetches the source image from S3, transforms it according to the request parameters (width, height, cropping, etc.) and streams it back to CloudFront, which caches it at the edge location.
However, when a website visitor is the first to request an image that is not yet cached, they have to wait for this rather heavy conversion.
We would like to be able to “pre-cache” our images (using a batch job requesting each image URL) so that end users are not the first to hit an image of a certain size, etc.
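A minimal sketch of what such a batch job could look like, assuming a hypothetical distribution host name, hypothetical `w`/`h` query parameters and a made-up list of size presets (none of these come from our real setup). The `fetch` argument is whatever function performs the HTTP GET:

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple
from urllib.parse import urlencode

# Hypothetical values: host name, query parameter names and size presets
# are placeholders for illustration only.
BASE_URL = "https://dxxxxxxxx.cloudfront.net"
SIZES: Sequence[Tuple[int, int]] = [(100, 100), (300, 300), (800, 600)]


def image_urls(
    image_ids: Iterable[int],
    sizes: Sequence[Tuple[int, int]] = SIZES,
    base_url: str = BASE_URL,
) -> Iterator[str]:
    """Yield one URL per (image, size) combination."""
    for image_id, (width, height) in product(image_ids, sizes):
        query = urlencode({"w": width, "h": height})
        yield f"{base_url}/image/stream/id/{image_id}?{query}"


def warm(urls: Iterable[str], fetch: Callable[[str], object]) -> List[str]:
    """Request every URL once and return the URLs whose fetch failed."""
    failed = []
    for url in urls:
        try:
            fetch(url)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(url)
    return failed
```

In a real run, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`, and the returned list could be retried later.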
Unfortunately, since images are only cached at the edge location that the pre-caching service happens to be routed to, site visitors served by a different edge location will not receive the cached image and will still trigger the heavy resizing script on the origin server.
The only solution we have come up with that lets every edge location get an image within a reasonable load time is this:
We have a CloudFront distribution whose origin is a PHP script on EC2. But instead of doing the image conversion described above, this script forwards the request and its query string parameters to another CloudFront distribution. That second distribution has as its origin an EC2 PHP script that performs the image conversion. Thus, the converted image is always cached at the edge location nearest to our EC2 instance (Ireland), which avoids another transformation when the image is requested through a different edge location.
So, for example, a request from Sweden hits /image/stream/id/12345. The Swedish edge location does not have it cached, so it sends the request to the origin, which is the EC2 machine in Ireland. The EC2 "stream" page then loads /image/size/id/12345 from the other CloudFront distribution. That request lands at the Irish edge location, which does not have it cached either, so it sends a request to its origin: again the same EC2 machine, but this time the "size" page, which performs the resizing. After that, the image is cached at both the Swedish and the Irish edge locations.
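The forwarding step in the "stream" page can be sketched as a simple URL rewrite. This is only an illustration in Python (our real script is PHP), and the host name of the second distribution is a hypothetical placeholder:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host name of the second ("size") distribution.
SIZE_DISTRIBUTION_HOST = "dyyyyyyyy.cloudfront.net"


def forward_url(request_url: str, host: str = SIZE_DISTRIBUTION_HOST) -> str:
    """Map an incoming "stream" URL to the matching "size" URL on the
    second distribution, keeping the query string unchanged."""
    parts = urlsplit(request_url)
    path = parts.path.replace("/image/stream/", "/image/size/", 1)
    return urlunsplit(("https", host, path, parts.query, ""))
```

Keeping the query string byte-for-byte identical matters here, because a differently ordered or formatted query string would presumably be treated as a different cache entry by the second distribution.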
Now a request for the same image comes from France. The French edge location does not have it cached, so it calls the origin, which is the EC2 machine in Ireland, which calls the second CloudFront distribution, which again lands at the Irish edge location. This time the Irish edge location has the image cached and can return it immediately. Now the French edge location also has the image cached, but without the "size" page having to be called: only the "stream" page ran, and it got the cached image from Ireland.
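To check the reasoning, the two requests can be modelled with a toy in-memory cache. This is a simplified sketch, not CloudFront's actual behaviour: it assumes every request from the EC2 machine reaches the same Irish edge cache and that nothing is ever evicted. All class and function names are made up for the illustration:

```python
from typing import Callable, Dict, List


class Cache:
    """A cache in front of an origin: serve from memory, else ask the origin."""

    def __init__(self, name: str, origin: Callable[[str], bytes]) -> None:
        self.name = name
        self.origin = origin
        self.store: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path not in self.store:
            self.store[path] = self.origin(path)
        return self.store[path]


resize_calls: List[str] = []


def size_page(path: str) -> bytes:
    """Stand-in for the expensive "size" page on EC2."""
    resize_calls.append(path)
    return f"resized:{path}".encode()


# Second distribution: the Irish edge location in front of the "size" page.
ireland = Cache("ireland", size_page)


def stream_page(path: str) -> bytes:
    """Stand-in for the "stream" page: forward to the second distribution."""
    return ireland.get(path.replace("/image/stream/", "/image/size/", 1))


# First distribution: one cache per visitor-facing edge location.
sweden = Cache("sweden", stream_page)
france = Cache("france", stream_page)

path = "/image/stream/id/12345"
sweden.get(path)  # miss in Sweden, miss in Ireland: one resize
france.get(path)  # miss in France, hit in Ireland: no resize
print(len(resize_calls))  # prints 1
```

Under these assumptions the resize runs once, however many visitor-facing edge locations request the image afterwards.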
It also means that our “pre-cache” batch service, running in Ireland, can request images through the Irish edge location and so pre-cache them before our site visitors request them.
We know that this looks a little absurd, but given our requirement that the end user should never have to wait long for an image to be resized, this seems like the only workable solution.
Have we missed another / better solution? Any comments on the above?