App rage is real. It’s what we feel when a cloud service we depend on is slow or unavailable, as Salesforce.com (SFDC) was yesterday morning. It’s also what a SaaS provider feels when customers vent their frustrations during an outage. Catchpoint has been on both ends of app rage, and as we monitored SFDC’s performance yesterday we saw a number of lessons that vendors and clients can take away from outages like this one.
As a provider ourselves, we saw three lessons jump out as we drilled into SFDC performance to find out what was causing the slowdown:
1. Your service will fail. It’s Murphy’s Law: Whatever can go wrong will. And whenever multiple things can go wrong at once, they will. Every provider knows this all too well.
2. Proactive, hypervigilant monitoring is the best defense against major performance issues. Catching problems at their earliest stages is essential to limit their impact on your customers and your business.
3. You need a plan for when the inevitable happens. That includes how you’ll support your customers, and their customers, when your service is down, and how you’ll communicate while you’re troubleshooting so that responding to customers doesn’t compound the problem or distract your teams. Use SFDC’s response as a guide to talking openly and transparently about an issue: trust.salesforce.com kept clients around the world continually updated as the company diagnosed and solved yesterday’s problem.
As a client, you’ll find an important lesson in patience at the top of the list:
1. Your vendor is working hard to address the issue, and calling to vent your app rage is not going to help find a resolution any faster. Trust the providers that support you to fix the problem as fast as possible…
2. …but also establish SLAs with providers of business-critical services, and make sure you can monitor and enforce them. (See this post for an interesting example of SLA monitoring.)
3. And just as your provider should, make sure that you have a Plan B in place and ready to roll out when a critical cloud service goes down – because sooner or later it will. Whether that means failing over to a backup system or writing orders in pencil, know what to communicate to your internal and external customers before an outage happens.
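To make the second client lesson (monitoring and enforcing SLAs) a bit more concrete, here is a minimal sketch in Python of how measured availability might be checked against an SLA target. Everything in it is an assumption for illustration: the `Probe` record, the 5-second slowness cutoff, and the 99.9% target are hypothetical and not taken from any real contract or from Catchpoint’s product.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One synthetic check against the service (hypothetical record shape)."""
    ok: bool            # did the request succeed?
    response_ms: float  # how long the request took, in milliseconds


def availability(probes, slow_ms=5000.0):
    """Fraction of probes that succeeded within slow_ms.

    A response slower than slow_ms is counted as unavailable, since a
    very slow service is effectively down for its users (an assumption).
    """
    if not probes:
        raise ValueError("no probes to evaluate")
    good = sum(1 for p in probes if p.ok and p.response_ms <= slow_ms)
    return good / len(probes)


def meets_sla(probes, target=0.999, slow_ms=5000.0):
    """True if measured availability reaches the (assumed) SLA target."""
    return availability(probes, slow_ms) >= target


# Made-up sample: 997 healthy checks, 2 failures, 1 success that was too slow.
probes = (
    [Probe(True, 420.0)] * 997
    + [Probe(False, 0.0)] * 2
    + [Probe(True, 9000.0)]
)
print(f"availability: {availability(probes):.3%}")
print("SLA met" if meets_sla(probes) else "SLA breached")
```

Counting slow responses as downtime is a design choice in this sketch; whether it applies depends on how your own SLA defines availability.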