Google’s Gmail service went down for about 20 minutes on Monday. That was annoying, but not exactly unprecedented. These sorts of outages happen all the time. What was strange is that the Gmail outage coincided with widespread reports that Google’s Chrome browser was also crashing.
Late Monday, Google engineer Tim Steele confirmed what developers had been suspecting. He said that the crashes were affecting Chrome users who were using another Google web service known as Sync, and that Sync and other Google services — presumably Gmail too — were clobbered Monday when Google misconfigured its load-balancing servers.
Sync is essentially Google’s answer to Apple’s iCloud. It’s a software service built by Google to unshackle web surfers from their own desktops. It works in the background, shuttling information between the Chrome browser and Google’s servers, so that users who log into Google can get at their bookmarks, extensions, and apps — no matter what computer they’re using to surf the web.
But on Monday, Steele wrote in a developer discussion forum, a problem with Google’s Sync servers kicked off an error in the browser, which made Chrome abruptly shut down on the desktop.
“It’s due to a backend service that sync servers depend on becoming overwhelmed, and sync servers responding to that by telling all clients to throttle all data types,” Steele said. That “throttling” messed up things in the browser, causing it to crash.
The problems were short-lived, but widespread. Over at Hacker News — a news discussion site that tends to attract Silicon Valley’s most knowledgeable software developers — a long thread quickly filled up with dozens of crash reports. “My Chrome has been crashing every ten minutes for the last half hour,” wrote one poster.
This may be a first. Bad webpage coding can often cause a browser to crash, but yesterday’s crash looks like something different: widespread crashing kicked off by a web service designed to help drive your browser.
Think of it as the flip side of cloud computing. Google’s pitch has always been that its servers are easier to use and less error-prone than buggy desktop software. But the Sync problem shows that when Google goes down, it can not only keep you from getting your e-mail — it can knock desktop software such as a browser offline too.
Chrome prides itself on “sandboxing,” isolating pages from one another so that a problem with a single webpage can only crash a tab in the browser, and not bring down the entire program. But that’s just what happened with Monday’s bug. It clobbered the entire browser.
“That’s definitely a big and unusual problem because if the browser shuts down, that’s a failure of the whole model of Chromium itself,” says Kevin Quennesson, CTO of online photo service Everpix.
“When you bridge authentication and identity and the cloud to a desktop application, you then get occasionally these very weird failures,” says David Ulevitch, the founder of OpenDNS, a cloud-based infrastructure services company.
It’s the kind of issue that could pop up more often as developers work to build browsers such as Rockmelt that do more than simply surf the web, says Michael Mahemoff, a former Google Chrome team member who is now the founder of podcast app-maker Player FM. “People are trying to integrate more identity and these kind of sync service and social services,” he says.
It’s also something that cloud service providers are going to have to worry about more and more, as services such as Apple’s iCloud and Windows Live get more closely intertwined with our phones and PCs.
“As you centralize things like authentication and identity to one provider, then when that one provider has a hiccup the impact can be far-reaching,” says Ulevitch. “Imagine a scenario where you can’t even open your Android phone or you can’t get phone calls on Google Voice. It’s not just your browser.”
Article source: http://www.wired.com/wiredenterprise/2012/12/google-bug/
Google’s email service crumbled yesterday for about 40 minutes, leaving millions of enterprise and consumer users without access to their cloud-stored email.
Gmail didn’t fall over because of a denial-of-service attack, as was initially reported yesterday despite no evidence to suggest it (the report was quickly amended). The search giant said on its dashboard status pages: “Although our engineering team is still fully engaged on investigation, we are confident we have established the root cause of the event and corrected it.”
At around the same time, millions of Google Chrome browsers crashed. In some cases, Chrome crashed multiple times within a short period. (It happened to me. Chrome crashed about three times in the space of 20 minutes, annoyingly, as I was — ironically — writing about the Gmail outage and Chrome crashes.)
However, despite Google Chrome’s sandboxing feature, which runs tabs and plug-ins in separate processes so that a plug-in or a bad bit of Web site code cannot bring down the whole browser, the entire browser crashed, losing any unsaved work at the same time.
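The idea behind that isolation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Chrome’s code (Chrome is written in C++): each hypothetical “tab” is a tiny program run in its own interpreter process, and one of them simulates a hard crash with `os._exit`. The parent only sees a non-zero exit code and carries on.

```python
import subprocess
import sys

# Hypothetical "tabs": each value is Python source run in its own OS process.
# "bad-page" simulates a hard crash by exiting immediately with status 3.
TABS = {
    "news": "print('rendered news')",
    "bad-page": "import os; os._exit(3)",
    "mail": "print('rendered mail')",
}


def run_tabs(tabs: dict[str, str]) -> dict[str, int]:
    """Launch each tab in a separate process and collect its exit code."""
    exit_codes = {}
    for name, source in tabs.items():
        # A crash in the child ends only that child; this parent loop continues.
        done = subprocess.run([sys.executable, "-c", source], capture_output=True)
        exit_codes[name] = done.returncode
    return exit_codes


if __name__ == "__main__":
    print(run_tabs(TABS))  # {'news': 0, 'bad-page': 3, 'mail': 0}
```

The catch, which Monday’s bug exposed, is that this kind of isolation only protects the parent from its children. If the fault sits in the parent process itself, presumably where the crashing sync code was running, there is no boundary left to contain it.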
Google engineer Tim Steele took to the firm’s developer forums to confirm that, despite the apparent link between Gmail’s outage and the Chrome crashes, it was Google Sync that was causing the browser to crash worldwide. The underlying fault also had a knock-on effect on other Google services, including but not limited to Gmail, Google Docs, Drive and Apps.
Google Sync keeps a user’s Chrome browser in sync across machines once they sign in. Bookmarks, extensions, apps and settings are carried over to Chrome on another machine when the user logs in there.
But this back-end service’s failure had a knock-on effect on Chrome browsers. (Presumably, browsers that aren’t set up to synchronize settings were not affected.) Steele noted that Google’s Sync Server relies on a component that enforces quotas on per-datatype sync traffic, and that component failed. The quota service “experienced traffic problems today due to a faulty load balancing configuration change.”
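A per-datatype quota check with the kind of “throttle everything” fallback described in Steele’s post might look roughly like this. It is a hypothetical sketch, not Google’s Sync Server: the class and function names, the data-type list and the limits are all invented for illustration, and the fallback simply throttles every type whenever the quota backend cannot answer.

```python
from dataclasses import dataclass, field

# Hypothetical data-type names; Chrome's real list is different.
ALL_DATA_TYPES = frozenset({"bookmarks", "extensions", "apps", "preferences", "themes"})


class QuotaServiceUnavailable(Exception):
    """Raised when the (hypothetical) quota backend cannot answer."""


@dataclass
class QuotaService:
    """Toy per-datatype quota tracker standing in for the real backend."""
    limits: dict[str, int]
    healthy: bool = True
    used: dict[str, int] = field(default_factory=dict)

    def consume(self, data_type: str) -> bool:
        """Count one commit of data_type; return True while it is under quota."""
        if not self.healthy:
            raise QuotaServiceUnavailable(data_type)
        self.used[data_type] = self.used.get(data_type, 0) + 1
        return self.used[data_type] <= self.limits.get(data_type, 0)


def types_to_throttle(quota: QuotaService, commits: list[str]) -> set[str]:
    """Return the data types the server would tell the client to throttle."""
    throttled: set[str] = set()
    for data_type in commits:
        try:
            if not quota.consume(data_type):
                throttled.add(data_type)
        except QuotaServiceUnavailable:
            # Conservative fallback: with no quota answer, throttle every type,
            # including ones a particular client may not support.
            return set(ALL_DATA_TYPES)
    return throttled


if __name__ == "__main__":
    quota = QuotaService(limits={"bookmarks": 1, "preferences": 10})
    print(types_to_throttle(quota, ["bookmarks", "bookmarks", "preferences"]))
    # {'bookmarks'}
    quota.healthy = False  # simulate the overloaded quota service
    print(sorted(types_to_throttle(quota, ["preferences"])))
    # ['apps', 'bookmarks', 'extensions', 'preferences', 'themes']
```

In normal operation only the type that blew its quota is throttled; once the backend stops answering, the fallback fans out to every type at once.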
He added: “That change was to a core piece of infrastructure that many services at Google depend on. This means other services may have been affected at the same time, leading to the confounding original title of this bug.”
As a result, Google’s Sync Server “reacted too conservatively” by telling the Chrome browser to “throttle ‘all’ data types,” without taking into account the fact that the browser doesn’t support all these data types. This caused Chrome to crash en masse around the world.
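The client side of the failure follows a familiar pattern: a message lists values the receiver does not recognize. The sketch below contrasts a strict handler that fails on an unknown data type with a tolerant one that skips it. It is hypothetical Python, not Chrome’s C++ code; the `DataType` names are invented, and the article only says the real client crashed, not exactly how.

```python
from enum import Enum


class DataType(Enum):
    """Hypothetical subset of data types this client version understands."""
    BOOKMARKS = "bookmarks"
    EXTENSIONS = "extensions"
    APPS = "apps"


def apply_throttle_strict(names: list[str]) -> set[DataType]:
    # DataType(name) raises ValueError for any name it does not know, so one
    # unexpected entry aborts the whole update (a stand-in for the crash).
    return {DataType(name) for name in names}


def apply_throttle_tolerant(names: list[str]) -> set[DataType]:
    # Keep the types this client recognizes and silently skip the rest.
    known = {t.value for t in DataType}
    return {DataType(name) for name in names if name in known}


if __name__ == "__main__":
    server_says = ["bookmarks", "apps", "themes", "passwords"]  # "all" types
    print(sorted(t.value for t in apply_throttle_tolerant(server_says)))
    # ['apps', 'bookmarks']
    try:
        apply_throttle_strict(server_says)
    except ValueError as err:
        print("strict client failed:", err)
```

Skipping unknown entries is the usual defensive choice for a client that may be older than the server it talks to, since the server can always learn about a new data type before every client does.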
The ‘too-long, didn’t-read’ version is that Google changed something, it didn’t work, and it caused the crashes. No hackers were involved, and the outage and crashes certainly were not a result of a denial-of-service attack.