Today we are experiencing a very serious unplanned outage of our Azure application, now approaching 9 hours. We have raised a support ticket with Azure, and I have no doubt the ops team is actively working on the problem. We managed to bring the application up on another "test" hosted service and redirected our CNAME to point at that instance, so our customers are being served, but the "main" hosted service is still unavailable.
My own "finger in the air" instinct is that the problem is with the networking in our data center (Western Europe), and indeed later in the day the service dashboard showed an advisory for this region to that effect. (Our application shows as "Healthy" in the portal, but it is not reachable through our cloudapp.net URL. In addition, threads in our application are logging SQL connection exceptions to our storage account because they cannot contact the database.)
Strangely, though, the "test" instance I mentioned above is hosted in the same data center, has no trouble contacting the database, and its external endpoint is fully reachable.
I would like to ask the community: is there anything I could have done better to avoid this downtime? I lobbied management to run at least 2 instances per role, but was turned down. Should I move to a more reliable data center? Should I deploy my application across multiple data centers? How can I make sure my SQL Azure database lives in the same data center as my application?
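For context, the change I proposed internally was simply bumping the instance count in our service configuration. A minimal sketch of what that looks like for a classic cloud service (the service and role names below are placeholders, not our real ones):

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- ServiceConfiguration.Cloud.cscfg: run two instances of the web role so the
     fabric controller can keep one up during updates or single-instance failures.
     "MyCloudService" and "WebRole1" are placeholder names. -->
<ServiceConfiguration serviceName="MyCloudService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="2" />
  </Role>
</ServiceConfiguration>
```

Of course, two instances of the same role still sit in the same data center, so I suspect it would not have helped with today's region-wide issue, which is partly why I'm asking about multi-data-center deployment.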
Any constructive guidance would be appreciated. As a technical person, I have never had a more unpleasant day than one spent unable to do anything to help solve the problem.