
What Retailers Can Learn from Google and Amazon Outages

Google and Amazon recently suffered outages. Learn how retailers and ecommerce businesses can handle outages that they can't control.

Two of the world’s largest tech companies, Google and Amazon, have experienced outages in the last couple of weeks. While these events are unrelated, they offer some actionable lessons for retailers and ecommerce businesses.

Here’s what happened:

Amazon Outage

  • The outage began around 3pm ET on July 16, Amazon Prime Day, as traffic spiked.
  • Prime Day’s landing page didn’t work and directed users to error pages.
  • Amazon’s deal page (full of discounted items) and product search bar didn’t work.
  • Checkout had errors too: for example, discounted items reverted to full price once they were added to a user’s cart.
  • Homepage load times reached 10 seconds on desktop and up to 30 seconds on mobile.
  • Amazon didn’t secure enough servers to handle the traffic surge. In response, they launched a backup page and temporarily cut off international traffic. For more on how Amazon handled the crisis, including their 300-person crisis call, check out this CNBC post.

Google Outages

  • YouTube TV: A traffic surge caused a YouTube TV outage during the World Cup semifinal between England and Croatia. Google hasn’t published details on the underlying cause.
  • Google Cloud: On July 17, Google Cloud Platform went down, affecting many sites and apps that run on it, such as Snapchat, Breitbart, and Spotify, as well as applications like Agile CRM and Pokémon Go. The outage stemmed from a faulty IP configuration across Google’s network of data centers. Engineers tried to cut off a block of IP addresses, which backfired, and a previously undetected error kept them from rolling back to a good configuration.

What Does This Mean for Retailers?

YouTube TV and Amazon went down because they couldn’t handle traffic spikes. The Google Cloud outage took down many of the customers that rely on its services. Together, these are good examples of how your own system can cause an outage (YouTube TV and Amazon) and how a third party you can’t control can cause one (Google Cloud).

Retailers need a plan of action for big events, like Black Friday, that bring heavy traffic. Preparation should cover not only the increased load on infrastructure and applications but also what to do when unpredictable failures occur. In other words: the outage has happened, now what do you do?

You don’t want users abandoning their carts or unable to load your pages at all. Let’s look at how you can get through a messy outage as gracefully as possible.

1. Scale and Money Help but They Won’t Eliminate Outages

Even Google and Amazon, two of the world’s largest tech companies, with vast numbers of data centers and enormous technology budgets, still go down. No matter the size of your ecommerce site or organization, you need to plan for outages.

You’re probably familiar with the ways to prevent outages, so here’s how we recommend planning for the ones you can’t avoid:

Know What Can Fail

Part of being prepared is knowing every piece of your digital delivery chain that might break: network, servers, load balancers, applications, DNS, CDN, acceleration services, mobile optimization vendors, application code, databases, and other third-party vendors.

For example, when Google Cloud went down, many of its clients, like Agile CRM, Snapchat, and Pokémon Go, went down with it. The outage didn’t just affect Google Cloud’s customers (like Snapchat); it also affected those customers’ customers (like Snapchat’s users). As you might imagine, they weren’t happy:

[Image: Snapchat users reacting to the Google Cloud outage]

Know what might fail so that you can monitor each of those pieces and troubleshoot faster. Identify your weak points and have a plan for what to do when each one fails.
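One lightweight way to act on this is to keep the delivery chain as a machine-readable inventory, with each entry tagged as in-house or third party, paired with who to contact and a health check. The sketch below assumes that structure; the `Component` class, the `failing_components` helper, and the owner names are hypothetical, and the probes are stand-ins for whatever real checks (HTTP requests, DNS lookups, synthetic transactions) you already run.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Component:
    """One link in the digital delivery chain (hypothetical structure)."""
    name: str                  # e.g. "DNS", "CDN", "checkout-api"
    third_party: bool          # True if a vendor owns it and you can't fix it yourself
    owner: str                 # who to contact when it breaks
    probe: Callable[[], bool]  # returns True when the component looks healthy


def failing_components(inventory: list[Component]) -> list[Component]:
    """Run every probe and return the components that look unhealthy."""
    failed = []
    for component in inventory:
        try:
            healthy = bool(component.probe())
        except Exception:
            healthy = False  # a probe that crashes is treated as a failure
        if not healthy:
            failed.append(component)
    return failed


if __name__ == "__main__":
    # Stand-in probes; replace with real HTTP, DNS, or synthetic checks.
    inventory = [
        Component("DNS", True, "dns-vendor-rep", lambda: True),
        Component("CDN", True, "cdn-vendor-rep", lambda: False),
        Component("checkout-api", False, "payments-team", lambda: True),
    ]
    for c in failing_components(inventory):
        kind = "third party" if c.third_party else "in-house"
        print(f"{c.name} ({kind}) is failing: contact {c.owner}")
```

Keeping the owner next to each component means the same list that drives monitoring also answers "who do we call?" when something breaks.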

Have One or More Backups

You know you’re going to have outages, so prepare your retail business for downtime.

Create a light site that you can switch to during application or code outages. For example, Amazon brought up a backup page that listed their top 10 items for sale. It was far lighter than the full site, but 10 items for sale beats an error page.

You should also have a nonfunctional site to switch to during a total outage. Sending users to a bare error page here is useless; instead, set up a basic page that redirects them to something that works, or one that lets them know you’re working on a solution.
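The choice between the full site, a light site, and a nonfunctional page can be written down as an explicit rule so nobody has to improvise it mid-outage. This is a minimal sketch that assumes two hypothetical health signals, one for the main application (search, deals, personalization) and one for checkout; the `SiteMode` names are illustrative, and in practice the switch would likely be applied through your CDN, load balancer, or DNS configuration rather than in application code.

```python
from enum import Enum


class SiteMode(Enum):
    FULL = "full"      # normal site: search, deals, cart, checkout
    LIGHT = "light"    # trimmed site, e.g. a short list of top sellers that can still be bought
    STATIC = "static"  # nonfunctional page: status message, links that work, email capture


def choose_mode(app_ok: bool, checkout_ok: bool) -> SiteMode:
    """Return the richest site version the current health signals can support."""
    if app_ok and checkout_ok:
        return SiteMode.FULL
    if checkout_ok:
        # The main app is struggling, but orders can still be placed,
        # so serve a lightweight page of top items that links straight to checkout.
        return SiteMode.LIGHT
    # Nothing can be sold right now: show the static page instead of an error page.
    return SiteMode.STATIC


if __name__ == "__main__":
    print(choose_mode(app_ok=False, checkout_ok=True).value)  # light
```

Ordering the checks from richest to barest means the site degrades one step at a time instead of jumping straight to the static page.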

Check out this error page from Spotify, which makes a joke with Justin Bieber’s song, “Sorry.”

[Image: Spotify error page referencing Justin Bieber’s “Sorry”]

Plan Your Internal Communication

Know who is responsible for what. Plan for quick, effective communication so that you don’t lose time to miscommunication, to hunting for the right person, or to twiddling your thumbs because you aren’t sure what to do or whom to call.

It’s important to have an internal plan for outages within your control, like Amazon’s 300-person conference call. But it’s just as important to have a plan for communicating with third parties.

Communicate with Third Parties ASAP

When it comes to third parties, focus on what you can control rather than what you can’t. You can control how you communicate with them, so plan your third-party communication the same way you plan your internal communication.

Make sure your team knows who is responsible for contacting vendors during an outage, and that this person has the contact information for each vendor rep.

Monitor third parties so that you can nip issues in the bud, and follow their status communications as well. Your own monitoring may spot a problem first: two years ago, our monitoring software showed that Dyn was down well before Dyn announced the outage on Twitter. We immediately stopped using the Dyn DNS service for our system and avoided a giant headache that, unfortunately, many other companies didn’t act quickly enough to avoid. As a result, only the first wave of the outage affected our system and users.
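Acting quickly on a failing vendor, as in the Dyn example, is easier when the switch is automated. The sketch below is a hypothetical failover tracker and an illustration only, not a description of our own setup: it assumes you already probe the primary provider on a schedule, and it moves traffic to a backup only after several consecutive failed probes, so a single blip doesn’t trigger a switch. The provider names and the threshold are placeholders.

```python
class ProviderFailover:
    """Switch from a primary to a backup provider after N consecutive failed probes.

    Hypothetical sketch: provider names and the threshold are placeholders.
    """

    def __init__(self, primary: str, backup: str, threshold: int = 3):
        self.primary = primary
        self.backup = backup
        self.threshold = threshold
        self.active = primary
        self._consecutive_failures = 0

    def record(self, probe_ok: bool) -> str:
        """Record one probe result for the primary and return the provider to use."""
        if probe_ok:
            self._consecutive_failures = 0
        else:
            self._consecutive_failures += 1
            if self.active == self.primary and self._consecutive_failures >= self.threshold:
                # Stop sending traffic to the failing vendor.
                self.active = self.backup
        return self.active


if __name__ == "__main__":
    fo = ProviderFailover("dns-primary.example", "dns-backup.example", threshold=3)
    for ok in [True, False, False, False, True]:
        active = fo.record(ok)
    print(active)  # dns-backup.example
```

Failing back to the primary is deliberately left as a manual decision here, since a vendor that has just recovered may go down again in a second wave, as Dyn did.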

2. Have a Public Communication Plan Ready

It didn’t take the media or the general public long to voice their opinions once Google and Amazon went down. If you can’t prevent an outage, sometimes the best thing you can do is be transparent about what’s going on.

Google tweeted within the first hour that YouTube TV was down:

Pros of this YouTube TV tweet

  • They’re extremely apologetic – admitting that they know the outage sucks for fans.
  • They’re attempting to empathize with and validate users by recognizing that the timing is horrible.
  • They responded quickly – within the hour.

Amazon’s outage happened shortly after the start of Prime Day at 3pm ET. They tweeted around 2 hours later:

Pros of Amazon’s tweet

  • The first thing they mention is that they’re working on a resolution – which is what users really care about.
  • They try to soften the blow by reminding users that there are plenty of hours left to shop and many more deals coming.

Amazon also used clever redirects and pictures of “the dogs of Amazon” to try to reduce users’ frustration.

Users and the media are going to voice their own opinions, so be as transparent as possible, as early as possible. Don’t get caught up in the flurry of negativity; instead, focus on what you can do:

  • Stay active on your social channels: keep the press and users in the know, be transparent, and inform them of the problem ASAP.
  • Redirect traffic to something that’s working. Amazon put up their light page where users could shop for the 10 most popular items.
  • Inform your marketing and PR teams of outages immediately. They’ll be the ones that get the message out to press contacts and on social media. So, work with Marketing/PR to make a customer communication plan for outages.
  • Use humor to ease tension, and don’t get angry when customers do. Make sure your team doesn’t take frustrated customers personally. Users are going to get pissed; the only thing you can do is let them know you’re sorry and that you’re working on a solution.
  • Create a template for an apology tweet/social post in advance. It might look like this:

Sentence 1: State that we are working to solve the problem.

Sentence 2: Remind users of other features they can use.

Sentence 3: Lessen the blow with humor, a gift, or discount link.
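If you keep this template in a shared tool, a small helper can assemble the three sentences and check that the result fits in a single tweet. This is a sketch assuming a 280-character limit; `len()` only approximates how the platform counts characters (links, for example, are counted differently), and the function name and example sentences, including the promo code, are hypothetical.

```python
def outage_post(working_on: str, alternative: str, sweetener: str, limit: int = 280) -> str:
    """Join the three template sentences and make sure the post fits in one tweet.

    Assumes a 280-character limit; len() only approximates the platform's count.
    """
    # Sentence 1: we're fixing it. Sentence 2: what still works. Sentence 3: soften the blow.
    post = " ".join(part.strip() for part in (working_on, alternative, sweetener))
    if len(post) > limit:
        raise ValueError(f"post is {len(post)} characters; trim it to {limit} or fewer")
    return post


if __name__ == "__main__":
    # Hypothetical wording for illustration only.
    print(outage_post(
        "We know the site is down, and our team is working on a fix right now.",
        "You can still track existing orders in our app.",
        "Sorry for the hassle! Use code SORRY10 for 10% off your next order.",
    ))
```

Writing and length-checking the post ahead of time leaves only the specifics to fill in when the outage actually hits.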

3. Rebuild Trust and Give Your Customers Something Nice

If you’re going to set up error pages and redirects anyway, you might as well have something good to tell customers while you’re delivering the news about failure.

Customers understand that stuff happens, and they appreciate a token of something free. Most retailers offer a promo code to newsletter subscribers, so why not offer one when you fail a customer?

Plus, a gift like that can decrease stress and increase feel-good chemicals, primarily oxytocin. According to a study led by Claremont Graduate University with Coupons.com, a $10 voucher increased oxytocin levels in recipients by 38%.

Here’s a great example of an error page that Bonobos set up where they gave users a “rain check.” The genius of this page is that it’s humorous, collects a user’s email, and gives a gift at the same time.

[Image: Bonobos backup site offering a “rain check”]

This is an example of a backup site with minimal functionality: all it does is collect a visitor’s email address.

Set up your error page like the Bonobos one to help you retain new customers. If I’ve never purchased anything from your store and you’re having an outage when I visit, make sure you collect my contact information so that you don’t lose a future customer.

Instead of losing the sale, you’ll have customers who appreciate the gesture and are more likely to return once your site is back up.

When YouTube TV went down, Google made amends by giving users who were watching the World Cup match a free week of their subscription.

Conclusion

Even tech giants aren’t immune to outages. The best thing any retailer can do is prepare a contingency plan.

When outages occur, stay transparent with the public. Let your users get some use out of your error pages: give customers a gift, a coupon, or a discount so that they don’t just abandon your site.

Map out your infrastructure, monitor it, and make sure you’re as prepared as possible for traffic surges.
