Google’s email service crumbled yesterday for about 40 minutes, leaving millions of enterprise and consumer users without access to their cloud-stored email.
Gmail did not go down because of a denial-of-service attack, as some initial reports yesterday claimed despite having no evidence to suggest it; those reports were quickly amended. The search giant said on its status dashboard: “Although our engineering team is still fully engaged on investigation, we are confident we have established the root cause of the event and corrected it.”
At around the same time, millions of Google Chrome browsers crashed. In some cases, Chrome crashed multiple times within a short period. (It happened to me. Chrome crashed about three times in the space of 20 minutes, annoyingly, as I was, ironically, writing about the Gmail outage and Chrome crashes.)
Google Chrome’s sandboxing feature runs each tab and plug-in in a separate process, which is meant to prevent the browser from fully crashing if a plug-in or a bad bit of Web site code causes issues. In spite of that, the entire browser crashed, taking any unsaved work with it.
Google engineer Tim Steele took to the firm’s developer forums to confirm that, despite the apparent link between Gmail’s outage and the Chrome crashes, it was Google Sync that was causing the browser to crash worldwide. That ultimately had a knock-on effect on other Google services, including but not limited to Gmail, Google Docs, Drive and Apps.
Google Sync keeps a user’s Chrome data in sync across machines. When a user logs in to Chrome on another machine, their bookmarks, extensions, apps and settings are transferred across to that browser.
But this back-end service’s failure had a knock-on effect on Chrome browsers. (Presumably, browsers that aren’t set up to synchronize settings were not affected.) Steele noted that Google’s Sync Server relies on a component to enforce quotas on per-datatype sync traffic, and that component failed. The quota service “experienced traffic problems today due to a faulty load balancing configuration change.”
He added: “That change was to a core piece of infrastructure that many services at Google depend on. This means other services may have been affected at the same time, leading to the confounding original title of this bug.”
As a result, Google’s Sync Server “reacted too conservatively” by telling the Chrome browser to “throttle ‘all’ data types,” without taking into account the fact that the browser doesn’t support all of those data types. This caused Chrome to crash en masse around the world.
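For readers who want to picture that failure mode, here is a rough Python sketch of how a client could tolerate a throttle instruction that names data types it doesn't recognize. It is purely illustrative and is not Chrome's or Google Sync's actual code: the function name `apply_throttle`, the `KNOWN_TYPES` set and the `future_type_*` names are all hypothetical, and the four known types are simply borrowed from the list of things Sync transfers.

```python
# Hypothetical sketch only: none of these names come from Chrome or Google Sync.
# The set of data types this (imaginary) client knows how to handle.
KNOWN_TYPES = {"bookmarks", "extensions", "apps", "settings"}


def apply_throttle(requested, known=KNOWN_TYPES):
    """Split a throttle request into types to throttle and types to ignore.

    Unrecognized types are skipped and reported back instead of being
    treated as a fatal error.
    """
    requested = set(requested)
    throttled = requested & known   # types the client understands
    ignored = requested - known     # types the client has never heard of
    return throttled, ignored


# An imaginary server reply that throttles every type the server knows about,
# including two that this client does not support.
server_reply = ["bookmarks", "settings", "apps", "extensions",
                "future_type_a", "future_type_b"]
throttled, ignored = apply_throttle(server_reply)
print("throttled:", sorted(throttled))
print("ignored:", sorted(ignored))
```

The design choice in this sketch is simply to treat anything unexpected from the server as something to skip and log, so that a mismatch between server and client becomes a no-op instead of a crash.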
The ‘too-long, didn’t-read’ version is that Google changed something, it didn’t work, and it caused the crashes. No hackers were involved, and the outage and crashes certainly were not a result of a denial-of-service attack.