The Challenge Of SLA Management
SLAs can be complicated to manage. They may even lead to legal disputes between the vendor and the customer.
Typical reasons for SLA disputes can include:
- Discrepancies over the time-frame being addressed.
- Biased sources for the performance metrics.
- Unknown accuracy of the data.
However, there are several actions that both vendors and customers can take to ensure accurate and verifiable performance data, as it pertains to their SLAs.
1. Determine SLA Parameters
The first and most critical step for reducing SLA disputes is to define the parameters for SLA data collection and tracking. This typically includes both parties agreeing on two major components:
- The data collection method (often a neutral third party)
- The time and locations for conducting performance analysis
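As a rough illustration, the agreed parameters can be written down as a small data structure that both parties apply to every measurement. This is a minimal sketch: the class, field names, location labels, and dates are hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class SlaParameters:
    """Measurement terms both parties sign off on (all fields are illustrative)."""
    data_source: str             # the agreed neutral monitoring provider
    locations: Tuple[str, ...]   # vantage points that count toward the SLA
    window_start: datetime       # inclusive start of the reporting period
    window_end: datetime         # exclusive end of the reporting period

    def covers(self, location: str, timestamp: datetime) -> bool:
        """Return True if a measurement counts toward the SLA."""
        return (
            location in self.locations
            and self.window_start <= timestamp < self.window_end
        )


# Hypothetical agreement: two vantage points, one calendar month (UTC).
params = SlaParameters(
    data_source="neutral-third-party",
    locations=("backbone-nyc", "last-mile-lon"),
    window_start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    window_end=datetime(2024, 2, 1, tzinfo=timezone.utc),
)
```

Writing the reporting window and locations down this explicitly addresses the first dispute cause listed earlier: a measurement either falls inside the agreed scope or it does not.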
Select a Data Collection Method
Agreeing on a data collection method is critical. If vendors and customers are using different monitoring tools to measure Service Level Indicators (SLIs), there will inevitably be disagreements on the validity of the data and whether the Service Level Objective (SLO) was reached or not.
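To see how two monitoring tools can lead to opposite conclusions, consider the sketch below. It assumes availability is the SLI, measured as the percentage of successful checks, and uses an illustrative SLO target of 99.5%. The function names and the sample counts are made up for the example.

```python
def availability_sli(results):
    """Percentage of successful checks; `results` is a list of booleans."""
    if not results:
        raise ValueError("no measurements in the reporting window")
    return 100.0 * sum(results) / len(results)


def slo_met(sli, slo_target):
    """The SLO is met when the measured SLI reaches the agreed target."""
    return sli >= slo_target


SLO_TARGET = 99.5  # illustrative target, in percent

# Hypothetical data: each party's own tool saw a different number of failures.
vendor_checks = [True] * 998 + [False] * 2
customer_checks = [True] * 992 + [False] * 8

vendor_sli = availability_sli(vendor_checks)      # 998 of 1000 checks succeeded
customer_sli = availability_sli(customer_checks)  # 992 of 1000 checks succeeded

vendor_view = slo_met(vendor_sli, SLO_TARGET)
customer_view = slo_met(customer_sli, SLO_TARGET)
```

Here the vendor's data says the SLO was met and the customer's data says it was breached, which is the kind of disagreement a single shared data source avoids.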
Selecting the appropriate neutral third party depends on a few factors, such as the number of users being served and where those users are located. An international business must also look for a solution with ample geographic coverage, or else it risks 1) SLI blind spots and 2) relying on multiple SLA monitoring systems (and data sets) with varying collection methods.
Flashtalking, an international online advertising company, relies on Catchpoint for unbiased SLA monitoring. This is because Catchpoint’s Global Node Network spans more than 850 nodes across 80 countries, providing diverse performance data for accurate and objective accountability.
Define Time And Location Requirements
Measurement locations from the backbone and the last mile, as well as any relevant cloud provider, should be included for a full view. Requiring a diversity of analysis locations also ensures both parties can retain visibility and reporting capabilities should the cloud provider itself experience an outage. For example:
- Use backbone testing to eliminate noise and validate Service Level Objectives (SLOs).
- Use last mile testing to replicate the end-user experience.
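One way to keep the two views separate is to summarize response times per vantage type rather than pooling them. The sketch below assumes measurements arrive as (vantage type, response time in milliseconds) pairs; the labels and numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical samples: (vantage type, response time in milliseconds).
measurements = [
    ("backbone", 120), ("backbone", 130), ("backbone", 125),
    ("last_mile", 180), ("last_mile", 420), ("last_mile", 210),
]


def median_by_vantage(samples):
    """Group response times by vantage type and return the median of each group."""
    grouped = defaultdict(list)
    for vantage, response_ms in samples:
        grouped[vantage].append(response_ms)
    return {vantage: median(values) for vantage, values in grouped.items()}


summary = median_by_vantage(measurements)
# summary == {"backbone": 125, "last_mile": 210}
```

The backbone figure is the cleaner signal for judging the SLO, while the wider spread in the last mile samples reflects what end users may actually experience.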
2. Set Up Alerts Around SLA Thresholds
If you’ve followed the advice above, you’ve completed step one of effective SLA management: establishing the criteria for objective performance analysis. Creating alerts to catch degrading service, breach thresholds, and disruptions is step two.
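A simple way to catch degrading service before it becomes a breach is to set a warning threshold below the contractual one. The following is a minimal sketch with made-up threshold values and function names; it does not represent any vendor's alerting API.

```python
def classify(response_ms, warning_ms, breach_ms):
    """Map one measurement to an alert level using two thresholds."""
    if response_ms >= breach_ms:
        return "breach"
    if response_ms >= warning_ms:
        return "warning"
    return "ok"


def alerts_for(samples, warning_ms, breach_ms):
    """Return (sample index, level) for every sample that is not "ok"."""
    alerts = []
    for index, response_ms in enumerate(samples):
        level = classify(response_ms, warning_ms, breach_ms)
        if level != "ok":
            alerts.append((index, level))
    return alerts


# Hypothetical thresholds: warn at 400 ms, SLA breach at 500 ms.
response_times_ms = [210, 390, 410, 480, 520, 300]
alerts = alerts_for(response_times_ms, warning_ms=400, breach_ms=500)
# alerts == [(2, "warning"), (3, "warning"), (4, "breach")]
```

The two warnings arrive before the breach, which gives the operations team time to act while the service is still within the agreed threshold.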
The Cloud Ops team at iSeatz, a loyalty tech company enabling travel & lifestyle bookings, relies on Catchpoint’s alerts to monitor the performance of their CDNs. “Every region uses a different edge making it complicated to test,” says Warshawski, VP of Cloud Operations and Information Security at iSeatz. “By setting up Catchpoint nodes in different locations where our customers exist, we can evaluate performance and take action such as spin up a new environment.”
Fast and reliable alerting is essential to quick remediation and transparent communication, regardless of which side of the SLA you’re on.