
Is your cloud provider telling you everything, everywhere, all at once?

Published August 15, 2023

Mark Towler

Today the Internet IS the new enterprise network your organization relies on. Most of your key applications and systems have been outsourced to the cloud. In fact, huge parts of your Internet Stack are outsourced either to cloud providers or to third parties that themselves rely on the cloud. That's a problem, because if any of those cloud-based services goes down, your network is going to be impacted. Fortunately, cloud providers are focused on maintaining uptime and have status pages where they quickly post outage information.

Or do they?

Jeff Martens of Metrist recently wrote an interesting blog post analyzing the average time it takes for a SaaS status page to be updated during an outage, and, as he puts it, "the results aren't pretty." According to Metrist's research, the average delay between an outage occurring and a status page being updated is 29 minutes. There are many reasons for this (Jeff outlines them in his post, and it's well worth a read), but it really comes down to the same questions every organization faces during an outage. What's gone wrong? Whose responsibility is it to fix it? What needs to be done to fix it? And what is the impact on our users? To these we can add one more source of delay: does the incident meet the cloud provider's threshold for reporting an outage?

Every cloud provider has its own process and requirements for reporting an outage on its status page. That makes perfect sense: they're reporting an error that can impact their business, so they have sound business reasons to be careful. However, this caution not only adds to the delay, it also means they may not post anything at all:

"Interestingly, the majority of outages detected by Metrist never appear on the third-party cloud vendor’s status page." - Jeff Martens, Metrist

Note that we're only talking about outages here: performance degradations and slowdowns are extremely unlikely ever to be noted on a cloud provider's status page, yet they can severely impact your SaaS applications, systems, and users. Considering that we've already established that the average cost of an outage is $6,700 per minute, a 29-minute delay can be incredibly expensive. And that delay is likely to be even longer if you're relying on a cloud provider to tell you when something's down.
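To put those two averages side by side, here is a back-of-the-envelope calculation. It is a minimal sketch that assumes outage cost accrues linearly at the average per-minute rate for the whole status-page delay; real incidents vary, and the function name `delay_cost` is purely illustrative.

```python
# Averages cited in this post (assumption: cost accrues linearly per minute).
COST_PER_MINUTE_USD = 6_700   # average cost of an outage, per minute
STATUS_PAGE_DELAY_MIN = 29    # average delay before a status page is updated (Metrist)


def delay_cost(cost_per_minute: float, delay_minutes: float) -> float:
    """Hypothetical helper: cost accrued while waiting for a status-page update."""
    return cost_per_minute * delay_minutes


cost = delay_cost(COST_PER_MINUTE_USD, STATUS_PAGE_DELAY_MIN)
print(f"${cost:,.0f}")  # prints "$194,300"
```

In other words, waiting out an average status-page delay during an average outage represents roughly $194,300, before counting any outage the provider never posts at all.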

That's why you should never rely on someone else to tell you when there's an issue with a cloud-based service you provide. Your users are going to blame you when there's a problem, not your cloud provider, especially when that provider hasn't acknowledged (and may never acknowledge) the issue in the first place. Even if the provider admits fault at a later date, that won't mitigate the immediate impact of a poor digital experience for your users.

And bear in mind that your cloud providers aren't deliberately lying to you about any of this! Their IT teams must comply with specific requirements detailing what gets posted and when. More to the point, providers may not even be aware of an outage or slowdown until they start hearing customer complaints. They can't begin to fix a problem until they know about it, and if the problem is a slowdown or performance degradation, they may simply hope it didn't impact anyone. They may not even be able to determine who, if anyone, was impacted.

All of which means you should be using an Internet Performance Monitoring solution that covers your entire Internet Stack. Beyond letting you know whenever one of the services you rely on has an issue, such a solution should ideally provide several other advantages:

Monitoring that is independent of the public cloud, so you don't lose visibility when the cloud goes down
Monitoring where it matters so you see issues from your workers' perspective no matter where they are
Real-time data so your response isn't further delayed while you wait for data sources to refresh
Granular, non-aggregated data that can be retained for years, because the devil is usually in the details and you may need to compare archived data to determine trends or changes from original baselines
Alerts you can trust, from stateless nodes that provide valid data and eliminate time-wasting false positives

Catchpoint can help you mitigate the cost of outages with all the capabilities described above.

Catchpoint is the Internet Resilience Company™, and our Internet Performance Monitoring (IPM) platform provides observability into every part of the Internet Stack. This means that when there are issues with any system, even 3rd-party ones you don’t control, you can identify them quickly and resolve them before they impact your users. The key advantage Catchpoint provides is the world’s largest, independent, and most reliable global observability network. With over 2500 nodes in 89 countries, this network ensures you can monitor from your users' point of view no matter where they are. Catchpoint is also cloud-independent, whereas other solutions are hosted on the public cloud; if those clouds are down, so are those solutions. Catchpoint provides a more comprehensive, independent observability solution with far more options for interoperability, integration, and customizability. In short, Catchpoint makes it easy to find and fix issues fast. The result is a better digital experience for everyone, higher customer satisfaction, higher productivity, and more revenue.

See Catchpoint in action or get in touch with us for details.
