Google Cloud suffers new outage in Europe
Google suffered a major cloud outage last week thanks to an errant peering advertisement by an unnamed European network carrier that caused its europe-west1 region to slip off the grid for about 70 minutes.
According to Google’s blog post on the matter, the network carrier mistakenly brought up a new peering link to Google that suggested it could handle far more traffic than it was actually capable of. As a result, a number of Internet regions became unreachable from Google Compute Engine’s europe-west1 region.
“On this link, the peer’s network signaled that it could route traffic to many more destinations than Google engineers had anticipated, and more than the link had capacity for,” Google explained in the blog post. “Google’s network responded accordingly by routing a large volume of traffic to the link. At 11:55, the link saturated and began dropping the majority of its traffic.”
Google said that this kind of error is normally caught by automated safety checks. On this occasion, though, it slipped through the net because “the automation was not operational due to an unrelated failure, and the link was brought online manually, so the automation’s safety checks did not occur”.
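Google has not published how its automation works, but the kind of pre-activation check described here can be illustrated with a minimal sketch. Everything in it is hypothetical: the PeeringLink fields, the thresholds and the function names are assumptions made for illustration, not Google’s tooling. The idea is simply to refuse to bring a link online when the peer announces far more routes than expected, or when the traffic those routes would attract exceeds what the link can carry.

```python
from dataclasses import dataclass


@dataclass
class PeeringLink:
    """Hypothetical description of a peering link awaiting activation."""
    name: str
    capacity_gbps: float           # physical capacity of the link
    expected_prefixes: int         # routes engineers expect the peer to announce
    advertised_prefixes: int       # routes the peer actually announces
    projected_traffic_gbps: float  # traffic the network would send over those routes


def activation_problems(link, prefix_tolerance=1.1, max_utilization=0.8):
    """Return the reasons (if any) why the link should not be activated.

    The tolerance and utilization thresholds are illustrative assumptions.
    """
    problems = []
    if link.advertised_prefixes > link.expected_prefixes * prefix_tolerance:
        problems.append("peer advertises more routes than expected")
    if link.projected_traffic_gbps > link.capacity_gbps * max_utilization:
        problems.append("projected traffic exceeds safe link capacity")
    return problems


def can_activate(link):
    """A link may go live only when no safety check has failed."""
    return not activation_problems(link)
```

In this sketch, a 10 Gbps link whose peer announces 50,000 routes when 1,000 were expected would fail both checks and stay offline, which is the outcome a manual activation bypasses.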
To stop this kind of problem from occurring again, Google’s network engineers have taken steps to disallow manual link activation, the cloud giant added.
Similar route announcement errors have occurred in the past, one of the most recent being last June, when Telekom Malaysia accidentally routed traffic from U.S. provider Level 3 Communications, LLC to a network that couldn’t handle the load and dropped all of its packets. As a result, thousands of Level 3 customers were left without internet access for several hours.