The Hidden Danger of Websites Relying on Third Party Software: A Case Study with 5 Key Takeaways

Let's dive into the details of a recent incident related to utilizing third-party software, how advanced synthetic monitoring allowed us to detect the issue, and five key takeaways to consider.

Using third-party software on websites comes with both risk and reward. eCommerce sites and platforms typically rely on a significant number of third-party apps and tools to augment functionality and features, from extracting customer data for personalization to enabling live chat to analyzing how site changes affect user experience. While third parties are often invaluable for these kinds of interactive purposes, they can also disrupt user experience. Last week, one of our customers, a leading global sports apparel and equipment company, experienced just such an issue when a commonly deployed A/B testing tool impacted performance and availability on several key domains for their West Coast users for nearly eight hours.

In this blog, we’ll dive into the details of exactly what happened and how advanced synthetic monitoring (what we call Internet Synthetic Monitoring) allowed us to detect the issue. We’ll conclude with five key takeaways to consider when using third-party software on your websites.

Let’s get started.  

A/B testing causes failures across multiple sites  

“We wanted to highlight the root cause of an issue that occurred yesterday, impacting several critical sites due to issues with A/B testing experiments.” So began my (Alexa’s) email to the Platform Engineering and SRE team responsible for maintaining global eCommerce functionality. As the company’s Enterprise Customer Success Manager, I wanted to ensure they had caught the issue and understood the third party’s impact on their production sites. I also wanted to work together to set up additional synthetic tests to ensure they had far-reaching visibility over similar issues going forward.

A/B testing and experimentation tools like Optimizely, Google Optimize and Adobe Target are rightly popular as part of a robust eCommerce optimization and personalization strategy. They let teams compare the user experience across multiple versions of the same variable, with the goal of improving user engagement with features and products. Nonetheless, it’s important to be aware of the risks such third-party tools can introduce to user experience while in use, specifically their potential to become a single point of failure (SPOF).

Tests indicating 8 failed requests (Catchpoint)
Waterfall showing request timeouts for child requests served from the main company domain (Catchpoint)  

Catchpoint SPOF failure testing allowed us to pinpoint the root cause and determine its impact. As you can see above, use of the third-party optimization tool was creating multiple failures for the company’s domains, specifically for child requests. The basic structure of the page was working, but the next level of requests needed to complete the page, such as the images and JavaScript that make it dynamic and interactive, were failing.
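To make the distinction between "the page responded" and "the page actually worked" concrete, here is a minimal sketch of a check that looks past the root document to its child requests. It is not how Catchpoint implements its tests; the `Request` record, the `page_health` function and the example URLs are all hypothetical, and the sample data is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    url: str
    status: Optional[int]  # None means the request timed out with no response
    is_root: bool = False  # True for the main HTML document


def failed(req: Request) -> bool:
    """A request counts as failed if it timed out or returned a 4xx/5xx."""
    return req.status is None or req.status >= 400


def page_health(requests: List[Request]) -> str:
    """Classify a page load as 'down', 'degraded' or 'healthy'."""
    root = next(r for r in requests if r.is_root)
    if failed(root):
        return "down"
    # The root loaded, so an uptime-only check would stop here and report OK.
    child_failures = [r for r in requests if not r.is_root and failed(r)]
    return "degraded" if child_failures else "healthy"


# Illustrative data: the root returns 200 while two child requests time out.
sample = [
    Request("https://shop.example.com/", 200, is_root=True),
    Request("https://shop.example.com/static/app.js", None),
    Request("https://shop.example.com/img/product.jpg", None),
    Request("https://shop.example.com/static/site.css", 200),
]
print(page_health(sample))  # degraded
```

An uptime-only check would report this sample as up because the root document returned a 200. Evaluating the child requests is what surfaces the degraded state.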

Catchpoint tests started to fail at 01:39am PST, indicating severe performance challenges that created availability issues for West Coast users of the enterprise’s websites until 09:04am PST. Key functionality was impacted, including important features such as Search and the transaction cart. Commerce product walls were also affected. The tests failed because of increased wait, connect and response times for requests served from the company’s main domain.

Waterfall showing the render blocking/critical rendering path request failure (Catchpoint)  

As the dev and engineering teams tried out frontend and core backend changes using the site optimization tooling, a logic change was pushed out that created the render-blocking failure, severely slowing page loads until critical resources were simply unavailable.

How did this impact actual end users?

The content on company pages, for instance product pictures, loaded extremely slowly until eventually the page timed out entirely. In the filmstrip below, you can see the webpage response time took over 30 seconds during the incident. Typically, the content on this page would load within 3.5 seconds.  

Filmstrip showing partial page load (Catchpoint)
Request data for the site optimization feature shows the change in file size impacts page availability (Catchpoint)
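As a rough illustration of how far outside normal the page was, the sketch below compares an observed load time against a baseline, using the two figures quoted above (over 30 seconds against a typical 3.5 seconds). The function names and the 2x alert threshold are assumptions chosen for the example, not Catchpoint settings.

```python
def slowdown_factor(observed_s: float, baseline_s: float) -> float:
    """Return how many times slower the observed load is than the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return observed_s / baseline_s


def should_alert(observed_s: float, baseline_s: float, max_factor: float = 2.0) -> bool:
    """Alert when the load time exceeds the baseline by more than max_factor.

    The default of 2.0 is an assumed threshold for illustration only.
    """
    return slowdown_factor(observed_s, baseline_s) > max_factor


# Figures from the incident: 30 s observed (a lower bound) vs. 3.5 s typical.
factor = slowdown_factor(30.0, 3.5)
print(f"{factor:.1f}x baseline, alert={should_alert(30.0, 3.5)}")
# 8.6x baseline, alert=True
```

Because the page took "over" 30 seconds, 8.6x is a lower bound on the real slowdown.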

One of the product pages impacted was a popular feature for customizing footwear. Without access to images to see the impact of customization, the feature is essentially useless. Some might say this applies to all product images on an eCommerce site. If you can’t see the products, you’re highly unlikely to buy them.

Consider what would have happened if this incident had occurred during a peak period, such as the launch of a popular new product line or a sale. It would have had a substantial impact on revenue.

The impact of Internet disruptions on revenue

A recently commissioned study from Forrester Consulting showed that 39% of eCommerce companies are losing between $500-999,999 monthly due to these kinds of Internet disruptions. Furthermore, Forrester found that the inability to identify Internet disruptions is threatening business. “Too often, companies rely on a process of elimination to locate disruptions”, the report reads. “Quickly identifying the source is critical, even if it is outside the company’s control. It allows the company to mitigate the disruption’s damage by communicating with customers or reaching out to third-party providers immediately. E-commerce companies must overcome these challenges, or they’ll continue losing millions of dollars per year.”  

Key takeaways for eCommerce sites when integrating with third party software

Compiled by Nilabh Mishra, Senior Manager, Value Engineering, Professional Services  

  1. When third-party software is involved in making your site run (which in eCommerce is nearly always), you need to move away from just uptime synthetic monitoring. Lots of our competitors simply monitor whether your page is up or down, but that’s not the full story. As we can see in the case study above, for this multinational athletic footwear and apparel company, the page was up but the images weren’t viewable. The page appeared to be up: it was returning a 200, but in fact page load had gone into a loop since the product request it was trying to pull wasn’t available. A performance issue was creating an availability issue that basic synthetic monitoring would not have caught. You need to factor in the entire load process, and only advanced synthetic monitoring will do this, fully revealing the serious performance and availability issues at stake here.
  2. Application Performance Monitoring (APM) and Internet Performance Monitoring (IPM) solutions need to work hand in hand. Just as basic uptime monitoring wouldn’t have caught the issue, you can’t rely on APM alone to monitor your eCommerce and retail sites. Again, in this instance, you would have seen a false all-clear and not the bigger picture.
  3. Utilize Internet Performance Monitoring (IPM) when relying on third-party tooling. Why? IPM is essential for monitoring third parties and minimizing their impact. With APM only, it’s significantly harder to see how a third party is affecting metrics like page load times. With an IPM solution like Catchpoint’s, it’s a lot simpler. Even with just a single page test, we can pinpoint these kinds of issues. IPM can help you understand exactly what is failing and the impact it has. Then when your SREs go to your APM solution, it will be simpler to troubleshoot. They will have the right information in hand, providing a clear understanding of precisely what’s broken and allowing for faster MTTR.
  4. Partner with your Professional Services team to ensure the best monitoring strategy is in place. We are currently working with one of our largest retail customers to monitor their website build IDs from both our synthetic and RUM products. Third parties will have those IDs as part of their request structure. We can capture those IDs for you, examine and correlate them, so that even if your page isn’t down, you’ll be able to understand exactly what kind of an impact they’re having.
  5. Do a regular audit and clean-up of all third parties your website is using. We have seen customers keep pushing third parties into pages to gather more data or for other purposes, but not actively clean them up, so third-party tooling is still being run months or even years after it’s no longer needed. Make sure you regularly check what is active and remove any integrations or plug-ins that aren’t, to help prevent avoidable issues from disrupting customer experience.
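One possible starting point for the audit in the last takeaway is to list every third-party host a page actually calls. The sketch below assumes you already have the request URLs from a page load (for example, exported from a waterfall). The `third_party_hosts` function, the first-party domain list and all URLs are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, Set
from urllib.parse import urlsplit


def third_party_hosts(request_urls: Iterable[str], first_party_domains: Set[str]) -> Counter:
    """Count requests per host for hosts outside the first-party domains."""
    counts: Counter = Counter()
    for url in request_urls:
        host = urlsplit(url).hostname
        if not host:
            continue  # skip relative or malformed URLs
        is_first_party = any(
            host == domain or host.endswith("." + domain)
            for domain in first_party_domains
        )
        if not is_first_party:
            counts[host] += 1
    return counts


# Illustrative URLs only; example.com stands in for the site's own domain.
urls = [
    "https://shop.example.com/",
    "https://cdn.example.com/img/product.jpg",
    "https://abtest.example.net/experiments.js",
    "https://abtest.example.net/datafile.json",
    "https://chat.example.org/widget.js",
]
for host, count in third_party_hosts(urls, {"example.com"}).most_common():
    print(host, count)
# abtest.example.net 2
# chat.example.org 1
```

Comparing this list against the integrations your teams still actively use is one way to spot tooling that can be removed.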
