Sunday, February 24, 2013

When Clouds Go Thump

Almost a year ago, Microsoft's Azure product had a global service failure.

It's happened again.

On the evening of February 22, 2013, Microsoft's Azure Service Dashboard had the following notice:

Storage is currently experiencing a worldwide outage impacting HTTPS operations (SSL traffic) due to an expired certificate. HTTP traffic is not impacted. We are validating the recovery options before implementing them. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

The color is Microsoft's.

Most of the time, when I went to the Azure Service Dashboard, this is what I got.

Here's what triggered it.

Source: Windows Azure Forum on MSDN

MJFara, one of the posters (identified as a "Partner"), hit the nail on the head:

This is unacceptable, I'm supposed to release an enterprise app on this platform?
Imagine how many phone calls I would have gotten by now from very angry customers.

The Register's summary:

It is the opinion of The Register that to have a core service fail in every data center across the world simultaneously is an extremely bad thing to happen to a cloud provider.

From my posting last year:

What I want to highlight is that cloud providers are not immune from service failures. They are likely capable of providing more redundant and resilient services than many organizations can provide.

Unfortunately, this failure suggests that Microsoft is not "capable of providing more redundant and resilient services."

Go back and reread MJFara's comments. Think about it.
