On October 21st, around 6pm Pacific time, our Azure CDN was gone! Instead of serving content, it was serving 400s (Bad Request). It came back a few hours later, but I started digging into a solution that would help mitigate this in the future, which is what this post is about. When something like this happens, we call it an outage, though that isn’t always the case.
There are millions of dependencies in technology, and if just one breaks, it can cause catastrophic damage down the line. So our goal in cloud and scalable software is to allow for failure, understand where and when it can happen to the best of our ability, and build backup and fallback routines to handle it. These can be automated or manual, but the more you have in place, the faster your app will come back online when a failure happens. Notice I said when, not if. Plan for failure sooner rather than later, and you’ll be better prepared.
Technical folks can skip to Technical Problem and Solution.
What could make our CDN just go away? More on that later. First, how do we solve the problem? We put our thinking caps on and … come up with a solution.