One of the most annoying things about using GAE for a brand-new application is dealing with instances that are shut down if no one hits your servers for 15 minutes. Because the application is new or simply has few users, there will be periods of high latency for some users, who have no idea that instances are being "deployed".
As far as I understand from the docs, these are the options:
Use manual-scaling and set the number of instances to 1.
With manual-scaling, the number of instances you configure is exactly what you get: no more, no less. This is clearly inefficient, since you may pay for unused instances, and instances are not automatically added or removed as traffic increases or decreases.
Use basic-scaling and set the idle-timeout to something like 24 or 48 hours.
This keeps your instance running as long as someone calls your API at least once within that period.
Use automatic-scaling with min-idle-instances and enable warm-ups.
This does not work as hoped. According to the docs:
If your app is not serving any traffic, the first request to the app will always be a loading request, not a warmup request.
So this does not solve the problem: if zero instances are running, there is nothing to warm up in the first place, and you still get latency on the first request.
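For reference, here is roughly how I understand the three options would look in `app.yaml`. This is only a sketch: the underscore names are the `app.yaml` spellings of the hyphenated settings above, the `max_instances` value and the `1440m` timeout are values I picked for illustration, and I have not confirmed that such a long idle timeout is actually accepted. Only one scaling block can appear in a real file.

```yaml
# Option 1: manual scaling, exactly one instance at all times.
manual_scaling:
  instances: 1

# Option 2: basic scaling with a long idle timeout.
# 1440m (24 hours) is an assumed value; the docs' examples use minutes.
basic_scaling:
  max_instances: 1
  idle_timeout: 1440m

# Option 3: automatic scaling with an idle instance and warmup requests enabled.
automatic_scaling:
  min_idle_instances: 1
inbound_services:
- warmup
```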
What I would like is to always have one instance running, and then scale up from there as traffic increases (and, of course, scale back down, but never below one instance). It would be like automatic-scaling, but with one instance that is always up.
Is this possible in GAE? Or am I missing something?
Currently my temporary solution is to deploy my application with manual-scaling and 1 instance, so that the application is at least responsive for new users.