Sunday, April 14, 2019

Outage Communication

This post isn't about bashing cloud providers, although they're an easy target.

Instead, it looks at examples of outage communication from different providers. And yes, Google and Facebook are in different sectors, but the wide differences in how they communicated about their outages are still interesting.

On March 12, 2019, Google suffered an outage that impacted Gmail and a variety of their other services that depended on their file system. Over the next several hours they posted three updates on their G Suite Status Dashboard. The first noted that they were having an outage. The second, posted less than two hours later, stated that they were continuing to investigate and enumerated the services that were impacted. The final update came two and a half hours after that and said that the issue was resolved.


But Google didn't stop there. Two days later they posted a thorough postmortem (archive.is) that identified the root cause and described both remediation and prevention.

That's the way to communicate.

On March 13, 2019, Facebook had a 14-hour outage that took down the Facebook social media service, its Messenger and WhatsApp apps, Instagram, and Oculus.

Here's Facebook's communication on that outage.


Yes, that's it.

Which of these would you prefer from your service provider? Ask about that before you sign a contract, and consider writing a requirement for outage communication and follow-up into the contract.

