Website uptime monitoring
Website uptime monitoring is the practice of tracking whether a site or application is up, reachable, and performing as expected.
Website uptime, or uptime, is a metric used in performance monitoring to measure the amount of time that a site is “up,” “live,” or “available”: essentially, the amount of time a website is working and accessible to users.
Website uptime and website availability are often used interchangeably, but they are distinct metrics: uptime is the amount of time a system is operational, while availability is the percentage of time that the system is operational.
If a website isn’t up, it’s down: there’s an outage or another issue, and customers can’t access the business’ website, application, or online store. That makes uptime a business-critical metric, as it directly impacts a business’ productivity and profit.
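The uptime-versus-availability distinction can be made concrete with a quick calculation: availability is simply uptime divided by the total observed time. A minimal sketch in Python (the function name and the one-hour-of-downtime figure are illustrative, not from any standard):

```python
def availability_pct(uptime_seconds: float, total_seconds: float) -> float:
    """Availability: the percentage of observed time the system was up."""
    return 100.0 * uptime_seconds / total_seconds

# A site up for 719 of the 720 hours in a 30-day month, i.e. one
# hour of downtime, is 99.86% available.
month = 720 * 3600
print(round(availability_pct(month - 3600, month), 2))  # 99.86
```

So a system's raw uptime (719 hours) and its availability (99.86%) describe the same behavior from two angles.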
Why website uptime monitoring matters to businesses
If a website isn’t live, customers can’t visit it. It’s easy to understand how important that is to an ecommerce site like Amazon, but downtime is just as detrimental to the revenue of other kinds of companies.
Software as a Service (SaaS)
SaaS companies, for example, need their applications to stay live; otherwise, they risk breaching their service level agreements. When a service level agreement is broken, the company in breach must pay a penalty. In the 2017 State of SaaS report, these penalties typically exceeded $350k.
Ecommerce
Ecommerce stores are essentially “closed” when they’re down. So, it’s vital that ecommerce businesses aim for as much uptime as possible.
Travel
Huge travel sites like Priceline, Booking.com, and Expedia rely directly on uptime for their revenue streams. They depend not only on the uptime of their own sites, but also on the uptime of the partner sites from which they compile the best deals.
What to monitor to improve uptime
It takes more than just a home page to keep a website live, or “up,” and working. Today’s Internet is complex: the cloud and many interconnected structures and components contribute to the performance of sites and apps, and each component is as critical to uptime as the next.
Domain Name System (DNS)
The Domain Name System (DNS) turns a website URL, like example.com, into the numeric address that computers and servers use to communicate. DNS is the first step in the journey of information from one machine to another: it’s literally the moment the user types a domain into the browser.
Monitoring DNS should be a priority when improving uptime because, without functioning DNS, users can’t reach a site or application at all. Imagine a customer who needs to physically travel to the business but can’t make it further than their own driveway because they don’t have directions.
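A basic DNS health probe can be as simple as timing a resolution attempt and recording whether it succeeds. A hedged sketch using Python's standard library (the injectable `resolver` parameter and the fake resolver are illustrative testing conveniences, not part of any monitoring product):

```python
import socket
import time

def check_dns(domain: str, resolver=socket.getaddrinfo):
    """Return (ok, elapsed_seconds, addresses) for one DNS lookup."""
    start = time.monotonic()
    try:
        results = resolver(domain, 443)
        # Each result's last element is a sockaddr tuple; index 0 is the IP.
        addresses = sorted({r[4][0] for r in results})
        return True, time.monotonic() - start, addresses
    except socket.gaierror:
        # Resolution failed: users could not reach the site at all.
        return False, time.monotonic() - start, []

# A fake resolver standing in for a live lookup (the address is arbitrary).
fake = lambda domain, port: [(None, None, None, None, ("93.184.216.34", 443))]
ok, elapsed, addrs = check_dns("example.com", resolver=fake)
```

In a real monitor, this check would run on a schedule from several locations, with alerts on failures or on resolution times above a chosen threshold.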
Content Delivery Networks (CDNs)
Content Delivery Networks (CDNs) deliver content to a user’s computer or phone, so the proximity of a CDN to a user impacts download speed.
CDNs stage content closer to end users, which reduces latency and improves performance: the closer the content is, the faster it downloads. A multi-CDN strategy provides coverage across multiple geographies for improved speed and performance.
Proximity to users is important, but the chief benefit of multiple CDNs is having a failover in case one has an outage.
Monitoring CDNs is vital to uptime. With multiple CDNs in place across various locations, I.T. teams must know which location is experiencing downtime. In the analogy, this time the user was given directions to a store 30 miles away when there’s a store only 5 miles away.
The user would usually go to the closer store, but if a road is closed, they can trek out to the farther store for what they need.
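The failover logic in the analogy can be sketched as: probe each CDN endpoint, discard the ones that are down, and prefer the closest (lowest-latency) healthy one. The endpoint names and the probe interface here are hypothetical:

```python
def pick_cdn(endpoints, probe):
    """Return the healthy endpoint with the lowest probe latency, or None.

    `probe(endpoint)` returns latency in seconds, or None if the endpoint
    is down -- the "road closed" case that forces a detour.
    """
    measured = [(probe(e), e) for e in endpoints]
    healthy = [(lat, e) for lat, e in measured if lat is not None]
    return min(healthy)[1] if healthy else None

# Hypothetical probe results: the nearest CDN is down, so traffic
# fails over to the next-fastest healthy endpoint.
latencies = {"cdn-us-east": None, "cdn-us-west": 0.082, "cdn-eu": 0.140}
chosen = pick_cdn(latencies, latencies.get)  # "cdn-us-west"
```

Real multi-CDN routing is usually done via DNS or a traffic-steering service, but the selection principle is the same: serve from the fastest location that is actually up.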
Servers
Servers store all of the information on a website. If enough servers go down, the CDNs have no content to deliver, and the site will likely go down with them.
I.T. teams need to monitor their servers so that they can switch over to backups in the event of an outage. Again, imagine the user physically journeying to the business: now, instead of being stuck in their driveway, they’ve made it to the parking lot, only to find that the company’s building has vanished.
Third parties and cloud providers
Third parties are the components of a site or application that the company doesn’t own: an analytics tool, a marketing platform that integrates with the site or application, or any other externally owned software.
Third parties hosted via the cloud are usually SaaS (software as a service), PaaS (platform as a service), or IaaS (infrastructure as a service).
• Uptime can depend on SaaS components, or software hosted via cloud, like the analytics tools mentioned above or display ad software.
• PaaS companies host hardware and software via cloud that developers can use to develop and deploy code. If a PaaS goes down, then certain elements of code may not be deployed to the website or other components of the website or application’s infrastructure.
• IaaS companies provide virtualized infrastructure components via cloud. If an IaaS provider such as Amazon Web Services (AWS) goes down, an entire site that depends on its many intertwined cloud services can go down with it.
Other uptime metrics to monitor
Downtime isn’t the only metric that negatively impacts end users. Modern sites and applications contain multiple functioning parts, like third-party credit card processors, analytics tools, microservices, and more.
Today’s users often feel that if a site is slow, it may as well be down. In response to that, some companies are putting latency into their service level agreements (SLAs).
Just because a site is up doesn’t mean it works. For example, a site might work perfectly until someone accesses it on mobile; if the site isn’t mobile responsive, users can’t add items to their shopping carts. Is that considered down? Or what if the “add to cart” button isn’t working? If users can’t make a purchase, is the online store down?
Sites and applications require routine maintenance. Some companies count this scheduled maintenance as actual downtime; others omit it from the metric objectives in their service agreements, since it’s part of keeping an application working properly.
The basics of a website uptime monitoring strategy
If a business can get ahead of a potential issue, or catch it early, it can prevent the issue from affecting users, or at least from affecting many of them. Uptime monitoring is essential both to improve user experience and to ensure third parties meet their SLAs.
An uptime monitoring strategy must include synthetic monitoring, real user monitoring, and a plan for managing service level agreements.
Synthetic monitoring
Synthetic monitoring utilizes agents, or nodes, that mimic user behavior in scripted tests. These tests run 24/7 to detect downtime, outages, latency, and other important metrics.
Because the tests run 24/7, they help companies get ahead of potential threats to user experience. I.T. teams can set alerts to be notified when important thresholds are crossed: for example, if many users are experiencing latency greater than 5 seconds, or if downtime lasts more than X seconds.
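The threshold-based alerting described here amounts to a simple rule check applied to each synthetic test result. A minimal sketch; the field names and the 5-second default are illustrative, not a fixed standard:

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    up: bool            # did the synthetic test reach the site?
    latency_s: float    # measured response time of the test

def alerts(result: CheckResult, latency_threshold_s: float = 5.0):
    """Return the list of alert conditions a synthetic check has tripped."""
    tripped = []
    if not result.up:
        tripped.append("downtime")
    elif result.latency_s > latency_threshold_s:
        tripped.append("high-latency")
    return tripped

# A check that succeeded but took 7.2 seconds trips the latency alert.
print(alerts(CheckResult(up=True, latency_s=7.2)))  # ['high-latency']
```

In practice a monitoring platform would also require the condition to persist across several consecutive checks before paging anyone, to avoid alerting on one-off blips.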
For example, if a server is experiencing high response times, a business would want to pinpoint which server it is. While RUM can surface the high response times, synthetic monitoring can dig deeper and identify the exact server responsible for the delays. Finding the specific server reduces downtime and helps ensure availability.
Real User Monitoring (RUM)
Real user monitoring, also called passive monitoring, collects data from real users of an application or website via performance monitoring software. RUM allows a business to catch uptime issues early by detecting when a portion of users is experiencing problems.
Service level agreements (SLAs)
SaaS, PaaS, and IaaS companies have legal contracts with their customers. These contracts, called service level agreements (SLAs), outline the level of performance the application must meet.
If the promised levels of performance are not met, the SaaS company is in breach of its contract. A breach results in fines paid to the customer.
These uptime guarantees must be met; otherwise, businesses typically pay fines that often exceed $350k. Companies need to utilize both synthetic and real user monitoring to ensure they’re meeting their SLAs, particularly the agreed-upon uptime percentage.
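An SLA's uptime percentage translates directly into a downtime "budget" for the period it covers. A quick calculation (99.9% and 99.99% are common SLA tiers, used here purely as examples):

```python
def downtime_budget_minutes(sla_pct: float, days: float = 30.0) -> float:
    """Minutes of downtime allowed per period under an uptime SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - sla_pct / 100.0)

# A 99.9% monthly SLA allows about 43 minutes of downtime in a
# 30-day month; 99.99% allows only about 4 minutes.
print(round(downtime_budget_minutes(99.9), 1))   # 43.2
print(round(downtime_budget_minutes(99.99), 1))  # 4.3
```

Framing the SLA as a minutes-per-month budget makes it easier to judge whether a given incident, or a planned maintenance window, puts the agreement at risk.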
Companies should also monitor their third-party applications so that they can hold those vendors accountable to their SLAs. For example, if a business application relies on AWS, it’s important for the business to monitor AWS both to determine whether AWS is the source of a problem and to ensure AWS meets its SLA.
Website uptime monitoring is a vital piece of a company’s monitoring strategy, for SaaS companies and for any business with a web presence. Companies must monitor with both synthetic monitoring and RUM to get the most accurate view of user experience. Only an accurate view allows companies to get ahead of issues and fix them faster.