Incident Review - Rolling Comcast Outage Disrupts Work from Home for Millions of Users Across the U.S.
Examining the impact of the Comcast outage on work from home users.
The rolling Comcast outage on Monday, November 8th and Tuesday, November 9th affected customers across the U.S., knocking users offline around the country. The first wave took place Monday evening in the San Francisco Bay Area. The second, which had a wider geographic impact, occurred Tuesday morning, primarily affecting broad swathes of the Midwest, Southeast, and East Coast.
With more than 31 million broadband Internet subscribers, the outage of the country’s most popular cable network had a serious knock-on effect for remote workers. Millions were unable to go online and access business applications just when they would normally have been jumping into their morning calls.
Many took to Twitter (presumably via their cell networks) to voice their frustration.
Users across the country continued to experience sporadic outages the rest of the day.
In the past, an Xfinity outage wouldn’t, of course, have had much impact on the business world. However, with a sizeable proportion of Americans continuing to work from home, an outage that takes down Comcast’s network backbone on this kind of scale can quickly become a Priority 1 (P1) incident, and will have severely disrupted business continuity for hundreds of businesses around the country.
Catchpoint detected the Comcast outage in real time
Catchpoint’s observability network, widely distributed across the country, detected the severity of the outage’s impact on availability and reliability in real time. Through our varied observer types, we were uniquely placed to triangulate the issue from multiple vantage points: backbone, last mile, and endpoint observers.
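To make the triangulation idea concrete, here is a minimal sketch in Python of how failures reported by the three observer types could be combined to guess where a fault sits. It is an illustration only, not Catchpoint's implementation or API: the `Observation` type, the `localize` function, the 50% threshold, and the result labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One test result from a single observer (hypothetical structure)."""
    observer_type: str  # "backbone", "last_mile", or "endpoint"
    isp: str
    ok: bool


def failure_rate(observations: List[Observation], observer_type: str) -> Optional[float]:
    """Fraction of failed tests for one observer type, or None if it has no data."""
    relevant = [o for o in observations if o.observer_type == observer_type]
    if not relevant:
        return None
    return sum(not o.ok for o in relevant) / len(relevant)


def localize(observations: List[Observation], threshold: float = 0.5) -> str:
    """Guess the fault location by checking vantage points from the widest inward.

    The threshold is an assumed value chosen for illustration.
    """
    def failing(observer_type: str) -> bool:
        rate = failure_rate(observations, observer_type)
        return rate is not None and rate >= threshold

    if failing("backbone"):
        return "isp_backbone"
    if failing("last_mile"):
        return "last_mile"
    if failing("endpoint"):
        return "endpoint_or_home_network"
    return "no_fault_detected"


if __name__ == "__main__":
    sample = [
        Observation("backbone", "Comcast", ok=False),
        Observation("backbone", "Comcast", ok=False),
        Observation("endpoint", "Comcast", ok=False),
    ]
    print(localize(sample))
```

Checking the backbone first reflects the reasoning that if the widest vantage point is failing, failures seen further downstream are most likely symptoms rather than separate faults.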
Businesses that have integrated Catchpoint at the heart of their Service Desk strategy were able to alert their remote workers to the problem, quickly advise them of failover options (whether connecting to a different ISP, 4G router or other alternative), and immediately alert them when service was restored so business could return to normal as quickly as possible, therefore avoiding two days of business disruption.
Catchpoint’s Employee Experience solution shows the impact on work from home users
Catchpoint’s Endpoint observers, which we use to ensure quality of service for our own distributed workforce, detected issues across several of our employees’ endpoints.
Let’s look at three employees, distributed around the country, all Xfinity customers, who each had challenges caused by the outage: Amy T. (MA), Jeena J. (CA), and Greg C. (NY). Each one experienced performance issues, of varying severity, with Microsoft Teams, a crucial collaboration tool that we, like so many others, rely on daily for smooth business operations.
As you can see from the following dashboard, we saw 1% packet loss on average toward Microsoft Teams on Tuesday, with Amy seeing 65% packet loss at its peak.
In terms of being able to reach Microsoft Teams, our employees’ experiences were varied. As we can see from the screenshot below, video streaming quality was affected: Jeena's hop count changed, Greg was impacted by high RTT, and Amy experienced a high level of end-to-end packet loss.
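The three symptoms above (a changed hop count, high RTT, and packet loss) can each be flagged by comparing a current network path sample against a baseline. The Python sketch below shows one way to do that. It is a simplified illustration, not how Catchpoint's Endpoint observers work: the `PathSample` type, the `classify` function, and both thresholds are hypothetical, and every sample value is made up for the example except Amy's 65% peak packet loss, which comes from the dashboard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PathSample:
    """A single measurement toward a target such as Microsoft Teams (hypothetical)."""
    hop_count: int
    rtt_ms: float
    packet_loss_pct: float


def classify(
    baseline: PathSample,
    current: PathSample,
    rtt_factor: float = 2.0,          # assumed: flag RTT above 2x baseline
    loss_threshold_pct: float = 5.0,  # assumed: flag loss at or above 5%
) -> List[str]:
    """Return the list of symptoms seen in `current` relative to `baseline`."""
    symptoms = []
    if current.hop_count != baseline.hop_count:
        symptoms.append("route_change")
    if current.rtt_ms > baseline.rtt_ms * rtt_factor:
        symptoms.append("high_rtt")
    if current.packet_loss_pct >= loss_threshold_pct:
        symptoms.append("packet_loss")
    return symptoms


if __name__ == "__main__":
    # Illustrative baseline; not taken from the incident data.
    baseline = PathSample(hop_count=12, rtt_ms=30.0, packet_loss_pct=0.0)
    samples = {
        "Jeena": PathSample(hop_count=15, rtt_ms=32.0, packet_loss_pct=0.0),
        "Greg": PathSample(hop_count=12, rtt_ms=180.0, packet_loss_pct=0.0),
        "Amy": PathSample(hop_count=12, rtt_ms=35.0, packet_loss_pct=65.0),
    }
    for name, sample in samples.items():
        print(name, classify(baseline, sample))
```

Keeping the three checks independent means one employee can show several symptoms at once, which matches the varied experiences described above.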
Another employee reported his Zoom call being interrupted during the outage. Imagine the impact if he had been in the middle of presenting to a customer or something equally important!
Want to look inside the black box?
Catchpoint has a unique advantage as an observability solution. Compared to other endpoint monitoring vendors, which solely offer device metrics, Catchpoint is far better placed to detect the type of issue work from home users experienced during the Comcast outage.
Point solutions like Device Performance Monitoring, Network Performance Monitoring, Application Performance Monitoring, and IT Infrastructure Monitoring would not help identify the issue. Device Performance Monitoring might be able to show a drop in the usage of certain applications, but the cause would be totally invisible. What typically happens when IT is alerted to a business-impacting issue but cannot isolate the root cause because it is outside of their infrastructure? Costly war rooms are created involving multiple IT groups, and finger-pointing begins.
Catchpoint, however, provides insight into the black box of the modern digital enterprise. This allows us to pinpoint exactly where an outage such as this one is happening (our backbone observers first detected the problem) and to alert the IT team in near real time. That means there is no need for a P1: you can proactively alert the impacted users and work with your Xfinity relationship manager to stay updated, fix the issue, and ask for service credits.
For further information on major incidents in 2021, please check out our new report. You’ll find detailed analysis, as well as a checklist of best practices to prevent, prepare for, and respond to an outage.
Download “Ensuring Cloud Reliability in 2022: What the outages of 2021 taught us”.