Google Inc (GOOG) explains reason for outage

Posted Jan 25, 2014

Google Inc (NASDAQ:GOOG) went offline for around 25 minutes. The outage affected users of Gmail, Google+, Google Calendar, and Google Docs. For around 10% of Google’s users, the outage persisted for around 30 minutes. Google apologized for the issue and said it is focused on fixing the bug that caused the outage.

“At 10:55 a.m. PST this morning, an internal system that generates configurations (essentially, information that tells other systems how to behave) encountered a software bug and generated an incorrect configuration. The incorrect configuration was sent to live services over the next 15 minutes, caused users’ requests for their data to be ignored, and those services, in turn, generated errors. Users began seeing these errors on affected services at 11:02 a.m., and at that time our internal monitoring alerted Google’s Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new correct configuration at 11:14 a.m. and began sending it; errors subsided rapidly starting at this time. By 11:30 a.m. the correct configuration was live everywhere and almost all users’ service was restored,” said the company in a blog post.

To prevent a similar outage in the future, Google is adding input validation checks for configurations so that a bad configuration generated in the future does not result in service disruption. It is also adding targeted monitoring to detect and diagnose the cause of service failures more quickly.