In my first blog about my learnings from the Gartner ITOSS conference in Orlando, I shared my take on how cloud, the API economy, and IoT are driving significant changes in monitoring requirements. For infrastructure and operations (I&O), this means shifting your tooling strategy from an inward-looking approach to a customer- or user-centric approach to monitoring your services, and building telemetry on external elements that can severely degrade your services, such as APIs, network protocols, CDNs, SaaS providers, and even DNS.
While Gartner predicts that by 2020 more than 50% of enterprises will replace core IT operations management tools entirely, retooling by itself is not the answer. In the presentations “The Cloud Computing Scenario – The Last and the Next 10 Years” and “Digital Business Platforms: The I&O Perspective,” Gartner’s Milind Govenkar and Hank Marquis admonished I&O leaders to “focus on outcomes”: to change from “an inward, building, and running infrastructure to focusing on facilitating transactions,” spending the bulk of their time and resources on delivering successful customer (or, in the case of internal services, “end user”) experiences. This suggests shifting to a higher I&O maturity level in both capabilities and metrics.
Newer Monitoring Tools Require Newer Organizational Capabilities
First, I&O leaders need to extend monitoring telemetry and processes to managing third-party service providers such as SaaS and DNS services. Or as Milind says, “become the broker and spend more time on governance [of cloud-based services/providers].” Unlike with internally developed apps, the source code is controlled by the SaaS provider, rendering traditional code-level monitoring tools ineffectual. But having the right tooling to monitor third-party providers is only the first step. I&O leaders need to build the skills and processes to fully leverage new tooling to solve these newer monitoring challenges, including using the monitoring insights to understand how external providers are impacting your business and holding them accountable when they breach your service level agreements (SLAs). SLAs are virtually useless unless there is the capability (tools/skills/processes) to monitor and govern them. In his blog A Practical Guide to SLAs, Mehdi Daoudi urges I&O leaders to focus on four key pillars for building an effective SLA practice.
Similarly, when it comes to monitoring APIs, I&O leaders need to build the skills and processes to fully leverage new telemetry to drill down into such questions as:
- Is the problem with your web page?
- Is the problem with your web service or a third-party web service?
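One way to start answering these questions is to attribute the time spent on a page load to the hosts that served each request. The sketch below is only an illustration: the function name, the sample URLs, and the timings are all hypothetical, and it assumes you already collect per-request durations from some monitoring source.

```python
from collections import defaultdict
from urllib.parse import urlparse


def attribute_latency(requests, first_party_hosts):
    """Sum request durations (ms) per host and label each host's ownership.

    `requests` is a list of (url, duration_ms) pairs; `first_party_hosts`
    is the set of hostnames you operate yourself. Both are assumptions
    about how your telemetry is shaped.
    """
    totals = defaultdict(float)
    for url, duration_ms in requests:
        host = urlparse(url).hostname
        totals[host] += duration_ms

    report = []
    for host, total in totals.items():
        owner = "first-party" if host in first_party_hosts else "third-party"
        report.append((host, owner, total))

    # Slowest host first, so the likeliest culprit is at the top.
    report.sort(key=lambda row: row[2], reverse=True)
    return report


# Hypothetical timings for a single page load.
sample = [
    ("https://www.example.com/index.html", 120.0),
    ("https://api.example.com/cart", 310.0),
    ("https://cdn.vendor.example.net/lib.js", 95.0),
    ("https://payments.partner.example.org/v1/token", 1840.0),
]
report = attribute_latency(sample, {"www.example.com", "api.example.com"})
for host, owner, total in report:
    print(f"{host:35s} {owner:12s} {total:8.1f} ms")
```

In this made-up sample, the third-party payment host dominates the load time, which points the investigation at the provider rather than at your own page or web service.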
Regarding network components, as network infrastructures get more complex, with multiple cloud vendors responsible for delivering services, the risks to businesses increase. I&O leaders should build the know-how in their organizations to verify that network elements such as BGP, TCP, DNS, and NTP are responding within appropriate parameters, and to quickly drill down to identify causes by asking questions such as:
- How long does it take to reach a location?
- What is the path to reach a location?
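The first of these questions can be approximated with a simple TCP connect timer. This is a minimal sketch, not a substitute for a real monitoring product: the function name is hypothetical, the measurement includes DNS resolution when a hostname is passed, and it says nothing about the path taken (that would need a traceroute-style tool).

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time to open a TCP connection in milliseconds.

    Returns None if the connection fails or times out. When `host` is a
    name rather than an IP address, DNS lookup time is included.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


if __name__ == "__main__":
    # Hypothetical target; replace with an endpoint you are allowed to probe.
    elapsed = tcp_connect_ms("example.com", 443)
    print("unreachable" if elapsed is None else f"{elapsed:.1f} ms")
```

Running the same probe from several locations and comparing the results is what turns a single number into an answer about a specific location.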
Focus on Outcomes
Now back to the central tenet of my learning, which is that I&O leaders need to leverage their newer monitoring capabilities (newer tools, skills, processes) and tie them to business-critical metrics. Milind said, “The only right service is the one with business value.” To me, this means two inextricably linked but often conflated things. One is focusing on the dev and ops lifecycle with metrics like reduced mean time to detect (MTTD), improved service performance (speed, availability, reliability), reduced “noise” (i.e., false positives, war room calls), and reduced dev and ops workload (due to continuous dev testing, proactive ops monitoring). In the Cisco Systems Panel Discussion “Managing Business and Application Performance in Digital Enterprises” (moderated by former Gartner analyst Jonah Kowall), United Airlines’ Director of Application Performance Management Bill Hineline reported that they experienced a “40% drop in diagnosis time” compared with pre-APM levels by eliminating “server hokey pokey” and similar practices through using APM in the application lifecycle (QA, load test, etc.) of their apps.
But given the massive shift to the cloud-only era (and throw in some API economy and IoT “lighter fluid”), with both digital innovation AND what I call digital renovation (where core services like Office, CRM, etc. are rapidly moving to the cloud), I&O leaders need to instrument tooling and change capabilities to increasingly connect dev and ops metrics to business value. Both Hank and Milind said to “change focus from (only) inward to customer experience success.” There are many high-profile public examples of the negative business impact of poor online service, such as the AWS S3 outage, which one source estimated cost S&P 500 companies $150 million and U.S. financial-service companies $160 million in lost revenue. And Macy’s intermittent Black Friday and Cyber Monday “mini-outages” clearly showed that the business impact goes beyond revenue to negative brand sentiment. Also consider the impact of poor internal delivery of business-critical services like CRM, web conferencing, call centers, collaboration, and the list goes on.
According to a study conducted by Visions Solutions, there are two types of costs associated with downtime and performance issues: direct and indirect. The direct cost category includes lost transaction revenue, wages, and inventory, as well as legal penalties incurred from an SLA breach. The loss of customer goodwill and the loss of business to competitors are considered indirect costs. All of these costs add up to actual dollar amounts that easily reach into the millions. I&O leaders need to instrument monitoring tools to provide the capabilities to correlate customer or end-user experience directly to business metrics such as conversions, customer retention, cost, and revenue.
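To make the direct-cost category concrete, here is a rough back-of-the-envelope sketch. The formula, the function name, and every number in it are my own illustrative assumptions, not figures from the study, and it leaves out inventory losses and all indirect costs.

```python
def direct_downtime_cost(minutes_down, revenue_per_minute,
                         staff_count, hourly_wage, sla_penalty=0.0):
    """Estimate the direct cost of an outage.

    Adds up lost transaction revenue, wages paid to staff who could not
    work during the outage, and any SLA penalty. Inventory losses and
    indirect costs (goodwill, lost customers) are deliberately excluded.
    """
    lost_revenue = minutes_down * revenue_per_minute
    idle_wages = staff_count * hourly_wage * (minutes_down / 60.0)
    return lost_revenue + idle_wages + sla_penalty


# Hypothetical outage: 90 minutes, $2,000/minute in transactions,
# 40 idle staff at $50/hour, and a $25,000 SLA penalty.
cost = direct_downtime_cost(90, 2000.0, 40, 50.0, sla_penalty=25000.0)
print(f"Estimated direct cost: ${cost:,.0f}")
```

Even a crude estimate like this gives monitoring data a dollar figure that business stakeholders can weigh against the cost of prevention.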
I learned a lot in my two days at Gartner’s ITOSS conference. The I&O industry is in a period of rapid change and disruption, and for businesses to survive and prosper in the digital era, I&O leaders need to continuously innovate and learn about cloud, the API economy, and new tooling and monitoring capabilities, while keeping an obsessive focus on business outcomes. Milind admonished us all with William Pollard’s enduring quote, “The arrogance of success is to think that what you did yesterday will be sufficient for tomorrow.”
On behalf of Mehdi Daoudi and myself, I’d like to thank all the Gartner analysts and support staff for delivering an amazing conference experience, and we look forward to continuing our relationship.