On December 8, 2023, Adobe's extensive customer base was impacted by a series of outages in the Adobe Experience Cloud that lasted approximately 18 hours, starting at 8:00 AM EST and continuing until 1:45 AM EST on December 9.
We haven't seen a third-party outage of this magnitude since the DoubleClick outage of 2018.
According to Adobe, Data Collection (Segment Publishing), Data Processing (Cross-Device Analytics, Analytics Data Processing), and Reporting Applications (Analysis Workspace, Legacy Report Builder, Data Connectors, Data Feeds, Data Warehouse, Web Services API) were all affected by the outage.
Adobe Analytics reports that within the Experience Cloud, multiple services were down for several hours. The outages in various services started and ended at different times, with varying outage durations. Note that these times do not reflect when Adobe updated its status page or informed its customers about the outage.
The cost of such a service being down for 18 hours adds up quickly – for Adobe and its customers. Both are impacted by lost revenue due to service disruptions and damaged brand reputation. On top of that, Adobe risks incurring SLA violations for millions of customers.
Catchpoint's Internet Sonar was the first and only tool to detect this outage, significantly outperforming others like ThousandEyes and Downdetector. This incident not only validates our claims about Internet Sonar but also underscores the importance of Catchpoint’s Internet Performance Monitoring (IPM) Platform in navigating the growing complexity and fragility of the Internet.
Now that we understand the 'what' and the 'how much,' let's dive into the incident review.
Catchpoint’s IPM Platform spotted the outage in at least three different ways:
Here are our observations for each of the three areas above.
Internet Sonar monitors Adobe Tag Manager, a service in the Adobe Experience Cloud. On Friday, December 8, at 8:03 AM EST, Internet Sonar detected timeout errors from a large number of cities globally. Internet Sonar alerts notified customers at 8:20 AM EST, once the failures were confirmed as a significant incident rather than short-term outliers.
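To illustrate the difference between a short-term outlier and a confirmed incident, here is a minimal sketch. It is not Internet Sonar's actual detection logic, which is not described in this post. The `Interval` type, the `confirmed_incident` function, and both thresholds are hypothetical. The idea is to confirm an incident only when a large share of vantage points fail for several consecutive polling intervals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    """Results from one polling interval across all vantage points (hypothetical)."""
    total: int   # number of checks that ran
    failed: int  # number of checks that failed (e.g. timed out)


def confirmed_incident(
    intervals: List[Interval],
    min_failure_ratio: float = 0.5,  # assumed threshold, not a Catchpoint setting
    min_consecutive: int = 3,        # assumed threshold, not a Catchpoint setting
) -> Optional[int]:
    """Return the index of the interval at which an incident is confirmed, or None."""
    streak = 0
    for i, iv in enumerate(intervals):
        ratio = iv.failed / iv.total if iv.total else 0.0
        if ratio >= min_failure_ratio:
            streak += 1
            if streak >= min_consecutive:
                return i
        else:
            # A healthy interval resets the streak, so isolated blips never alert.
            streak = 0
    return None


# A single bad interval is treated as an outlier; a sustained run is confirmed.
blip = [Interval(20, 1), Interval(20, 15), Interval(20, 0), Interval(20, 2)]
outage = [Interval(20, 1), Interval(20, 14), Interval(20, 18), Interval(20, 20)]
print(confirmed_incident(blip))
print(confirmed_incident(outage))
```

Requiring consecutive bad intervals trades a few minutes of alert latency for fewer false alarms, which is one way to account for a gap between first detection and customer notification.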
Internet Sonar quickly detected outages to Adobe Tag Manager as shown above. Sites with this tag were painfully slow, with some taking 100-200 seconds to load.
Internet Sonar also performed intelligent correlation for failing Synthetic tests run by customers. The screenshot below shows the records page of a Catchpoint Synthetic test for a customer's service using Adobe Tag Manager, where Internet Sonar correlated the test failure with the Adobe Tag Manager outage.
Internet Sonar enables users to answer the question, "Is my service experiencing issues due to a problem in my application or infrastructure, or is it one of the 3rd party services in the Internet Stack I rely upon to deliver my service?"
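A simplified way to think about that question in code is to sort the failed requests from a test run by whether their host belongs to you or to a third party. This is a sketch only. `classify_failures` and the `example.com` first-party domain are hypothetical, and it does not represent how Internet Sonar performs its correlation.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def classify_failures(
    failed_urls: Iterable[str], first_party_domains: Iterable[str]
) -> Dict[str, List[str]]:
    """Group the hosts of failed requests into first-party and third-party buckets."""
    domains = [d.lower() for d in first_party_domains]
    buckets: Dict[str, List[str]] = {"first_party": [], "third_party": []}
    for url in failed_urls:
        host = urlparse(url).hostname or ""
        # A host is first-party if it equals, or is a subdomain of, an owned domain.
        is_first = any(host == d or host.endswith("." + d) for d in domains)
        buckets["first_party" if is_first else "third_party"].append(host)
    return buckets


failed = [
    "https://assets.adobedtm.com/a7d65461e54e/6e9802a06173/launch-43baf8381f4b.min.js",
    "https://auth.services.adobe.com/signin",
]
result = classify_failures(failed, first_party_domains=["example.com"])
print(result)
```

If every failing host lands in the third-party bucket, as in this example, the problem is more likely in a dependency than in your own application or infrastructure.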
Many of Catchpoint's e-commerce customers who rely on Adobe also began experiencing multiple failures in the synthetic tests they run on the Catchpoint platform.
Root cause of the Journey Optimizer failures: the request to https://auth.services.adobe.com/signin returned an HTTP 404, so the login did not go through.
An e-commerce customer dependent on Adobe extensively monitors the Adobe Experience platform. Their synthetic test for "Adobe Journey Optimizer," part of Experience Cloud, showed significant impact:
HTTP Response: {"errorCode":"invalid_resource_id","errorMessage":"Could not find resource for id v:2,s,f,bg:eclogin,..."}
TCP Timeout errors for Launch.js JavaScript Request; URL initiated by DTM.js
HTTP Request: https://assets.adobedtm.com/a7d65461e54e/6e9802a06173/launch-43baf8381f4b.min.js
Incident Start time: Dec 08, 2023 - 05:04:37 PT
Status: Ongoing
Regions Impacted: Global
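The incident start time above is given in Pacific Time, while the rest of this post uses EST. As a quick sanity check, the sketch below converts it using fixed standard-time offsets (an assumption that holds in December, when daylight saving time is not in effect) and compares it with the 8:03 AM EST Internet Sonar detection time mentioned earlier. The function and variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for standard time; valid here because the incident was in December.
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")


def to_est(dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to EST."""
    return dt.astimezone(EST)


# Incident start time from the customer's synthetic test alert.
incident_start_pt = datetime(2023, 12, 8, 5, 4, 37, tzinfo=PST)
incident_start_est = to_est(incident_start_pt)

# First Internet Sonar detection as reported in this post.
sonar_first_detection = datetime(2023, 12, 8, 8, 3, tzinfo=EST)

gap = incident_start_est - sonar_first_detection
print(incident_start_est.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2023-12-08 08:04:37 EST
print(gap)  # 0:01:37
```

The two timestamps fall within about a minute and a half of each other, so the customer's synthetic test and Internet Sonar picked up the problem at essentially the same moment.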
Catchpoint Professional Services, which monitors and analyzes websites for major retail and e-commerce customers, noticed several failures attributed to the Adobe outage.
We observed failures across multiple tests: the tests timed out because of high connect times for assets.adobedtm.com.
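Connect time here means how long it takes to establish the TCP connection before any request is sent. As a rough sketch of how it can be measured (this is not the Catchpoint test implementation, and `connect_time` with its default port and timeout is a hypothetical helper), Python's standard `socket` module is enough:

```python
import socket
import time
from typing import Optional


def connect_time(host: str, port: int = 443, timeout: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connection
    fails or does not complete within the timeout."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return None
```

Calling `connect_time("assets.adobedtm.com")` from several locations at regular intervals would give a crude view of the same signal: values approaching the timeout, or `None`, indicate that requests to the host are stalling before a response can even begin. Note that the measured time includes DNS resolution as well as the handshake.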
We also observed no response from servers:
We noticed the host chart showing an increase in wait time for requests from "assets.adobedtm.com"
Waterfall data also showed 503 – Service Unavailable error for a specific request from Adobe:
We also noticed test failures and performance degradation due to Adobe request failures.
We also reviewed WebPageTest (WPT) results. Note that the page content was displayed to users only after the requests for the Adobe assets had timed out.
Real user monitoring (RUM) data revealed the impact on end users:
Imagine waiting to find out your service or site is down through negative posts on social media. Now, you don't have to.
When it comes to outages like these, it is extremely important to have a tool that helps you answer the question, "Is it me or something else?" You need a tool capable of pinpointing the source of Internet disruptions at a glance: no finger-pointing, no war rooms, just intelligent, trustworthy Internet health information to accelerate incident detection. That's the core concept behind Internet Sonar.
As Steve McGhee, Reliability Advocate, SRE, Google Cloud, highlighted in his Conclusion for Catchpoint's 2023 SRE Report, there is a reason why experts never depend on a single solution, tool, or platform to accomplish their tasks in the best possible manner. "When it comes to skilled labor, or 'operations' perhaps," writes Steve, "you want teams to be able to reach for the right tool at the right time, not to be impeded by earlier decisions about what they think they might need in the future."
Watch this on-demand product demo video to learn more about Internet Sonar and look out for an upcoming blog post discussing best practices for monitoring the Adobe Product suite with Catchpoint.