This post was published as a bylined article by Mehdi Daoudi on Forbes.com.
There’s no doubting how much the cloud has grown in just a few years. Having evolved from basic browser-based applications and simple IaaS deployments, it has reshaped today’s IT. Most modern websites and back-end technologies rely on cloud-hosted third-party services because they are easier and less expensive to deploy than in-house alternatives.
Though the cloud has brought more flexibility in this respect, depending on cloud providers to deliver the stable, reliable, and scalable performance that a good end-user experience requires is risky. In just the past year, we’ve seen several high-profile cloud outages that have made CTOs and CIOs of major companies think twice about how much they depend on these services.
Some of the biggest recent outages include:
- Black Friday weekend 2016: Last year’s peak shopping period avoided major, lengthy outages among online retailers, but a number of sites did experience a series of brief outages. While not headline-grabbing, these regional or partial-site glitches and slowdowns still hurt businesses. One large retailer had very long response times in a few cities. Notably, most of these problems were caused by third-party, cloud-based services that could not keep up with heavy traffic loads.
- Amazon S3 outage: Amazon’s popular S3 cloud storage service is used by many thousands of companies. On February 28, 2017, its U.S. East region experienced widespread S3 problems that lasted for over three hours. S3 is so widely used that chances are your company was affected. Since then, a number of smaller S3 glitches have occurred, with frustration spilling over into complaints on Twitter.
- It was a tough June for IBM’s cloud business when the CEO of Lululemon publicly blamed it for the retailer’s 20-hour site outage. Such public criticism from a CEO is rare, but so is an outage of that length for a major brand.
- Shortly thereafter, Facebook announced it was removing its WhatsApp service from IBM’s cloud and placing it within its own data center.
- Dyn DNS attack: When a DDoS attack took down this critical domain name resolution service a year ago, it took many sites down with it. DNS is easy to forget because you simply expect it to work, but this outage showed that it is another external dependency you must monitor.
Below are five steps you can take to reduce the impact of an outage:
- Monitor every element of your IT infrastructure as if it were inside your own firewall.
- Make sure you build in redundancy for when a third party fails.
- Limit the number of externally sourced elements in your systems.
- Spread your cloud-based infrastructure across different regions.
- Focus on actionable data.
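As a rough illustration of the redundancy, multi-region, and actionable-data steps above, the Python sketch below tries a list of providers in order and falls back to the next region when one fails, recording the outcome and latency of every attempt so you can see exactly which dependency broke. It is a minimal sketch under stated assumptions, not any vendor’s API: the `fetch_with_failover` and `http_fetcher` helpers, the region names, and the example.com URLs are hypothetical placeholders, and in practice failover is often handled at the DNS, CDN, or load-balancer layer as well.

```python
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Attempt:
    """One try against one provider/region, kept as monitoring data."""
    region: str
    ok: bool
    seconds: float
    error: Optional[str] = None


@dataclass
class FailoverResult:
    value: Optional[object]   # payload from the first healthy provider, or None
    region: Optional[str]     # region that served it, or None if all failed
    attempts: List[Attempt] = field(default_factory=list)


def http_fetcher(url: str, timeout: float = 2.0) -> Callable[[], bytes]:
    """Hypothetical helper: build a fetcher that GETs `url` with a socket timeout."""
    def fetch() -> bytes:
        # urlopen raises on HTTP error statuses, timeouts, and connection failures.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    return fetch


def fetch_with_failover(
    providers: Sequence[Tuple[str, Callable[[], object]]],
) -> FailoverResult:
    """Try each (region, fetcher) pair in order and return the first success."""
    attempts: List[Attempt] = []
    for region, fetch in providers:
        start = time.monotonic()
        try:
            value = fetch()
        except Exception as exc:  # timeouts, connection errors, HTTP errors, ...
            attempts.append(Attempt(region, False, time.monotonic() - start, repr(exc)))
            continue  # fall through to the next region
        attempts.append(Attempt(region, True, time.monotonic() - start))
        return FailoverResult(value, region, attempts)
    # Every provider failed: the caller should degrade gracefully and alert.
    return FailoverResult(None, None, attempts)


if __name__ == "__main__":
    # Hypothetical endpoints: replace with your own primary and backup regions.
    providers = [
        ("us-east", http_fetcher("https://us-east.example.com/health")),
        ("us-west", http_fetcher("https://us-west.example.com/health")),
    ]
    result = fetch_with_failover(providers)
    for a in result.attempts:
        print(f"{a.region}: ok={a.ok} {a.seconds:.3f}s {a.error or ''}")
    print("served by:", result.region)
```

Because each fetcher is just a callable, the wrapper can be tested with stubbed functions instead of live services, and the per-attempt records can be forwarded to whatever monitoring or alerting system you already use.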
There is no such thing as infallible technology, and that applies to the cloud as well. Assume that something will eventually break, and have a plan ready for when it does so you can recover quickly and protect your end-user experience.