On February 29, 2012, Microsoft's Azure product suffered a global service failure. Microsoft's Azure Service Dashboard was overwhelmed and largely unavailable for hours on end. When it was accessible, this was one of the updates posted approximately 8 hours into the outage:
The restoration steps to mitigate the issue are still underway. This incident impacts Access Control 2.0, Marketplace, Service Bus and the Access Control & Caching Portal in the same regions where Windows Azure Compute is impacted. As a result affected customers may experience a loss of application functionality. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
This is not Microsoft bashing. In April 2011, Amazon also suffered a broad failure.
What I want to highlight is that cloud providers are not immune to service failures, even though they can likely provide more redundant and resilient services than many organizations could build on their own.
My question is this: as a CIO, are you comfortable with this level of opacity around service failures? Are you willing to answer your users and executives with a one-way flow of information from a (sometimes available) web page? Would a 10-day service credit make your CEO happy?