
The Internet Resilience Report 2025

Introduction

If the Internet was already teetering on a fragile edge when we launched our inaugural Internet Resilience Report in June 2024, it’s teetering even more now. Just consider the CrowdStrike incident: almost a year later, some organizations have yet to fully recover.

But it’s not just outages causing chaos. The industry is waking up to a new reality: slow is the new down. Sluggish websites and applications don’t just frustrate users; they drain revenue and damage reputations.

This escalating urgency is why Catchpoint has researched and authored the second edition of The Internet Resilience Report. Building on last year’s findings, it dives deeper into the critical dimensions of Internet Resilience. From AI’s growing role in mitigating disruptions to the undeniable importance of fast-performing websites and applications, the report offers a roadmap for navigating today’s digital minefield.

In an era where downtime costs millions monthly and slow performance can sink even the most established brands, resilience is no longer optional. It’s a must-have for survival in our interconnected world. And at the center of it all is the Internet Stack—the foundation upon which every digital interaction depends.

Mehdi Daoudi
CEO, Catchpoint

What's the Internet Stack, and why does it matter?

The Internet Stack is the collection of technologies, systems, and services that enable and shape every digital user experience – from foundational protocols like DNS and BGP to third-party APIs and CDN providers.

To ensure resilience across the Internet Stack, organizations must actively safeguard and maintain four core dimensions:

  1. Reachability - Can users access it from where they are?
  2. Availability - Is it functioning as expected?
  3. Reliability - Will it work consistently, every time? 
  4. Performance - Is it fast enough?

All four are critical, but the emphasis on performance in this year’s report reflects a seismic shift in how organizations perceive resilience. It’s no longer sufficient for websites and applications to merely be "up" – they must also deliver fast, seamless experiences. 42% said if their websites or apps are slow, they might as well be down. A recent Forrester study of online retailers reached a similar conclusion, highlighting how widespread the “slow is the new down” mindset is.

Key findings

Internet Resilience: The capacity to ensure availability, performance, reachability, and reliability of the Internet Stack despite adverse conditions

Slow apps are dead apps

73% declaring fast, high-performing websites are critical to business success – with 42% claiming if apps are slow, then they might as well be down.


Best-of-breed or bust

73% using Internet Performance Monitoring (IPM) to “ensure excellent customer and employee digital experiences” – preferring best-of-breed solutions over broad, non-best-of-breed alternatives.


AI doesn’t fail quietly

57% realizing immediately when the AI supporting their critical tier 1 apps goes down or becomes slow – with the purchase or use of third-party AI capabilities being the predominant approach.


Financial fallout of non-resilience

51% realizing $1M or more in negative economic impact from monthly incidents – up from 43% in 2024.


Why? Why now? Why Internet Resilience?

The findings show resilience is no longer just about uptime. It’s about protecting people, revenue, and performance.

Anchor resilience in people

Operational efficiency keeps the lights on – customer experience keeps the business growing

What are the biggest drivers behind organizations' need to make digital experiences resilient?

  • Operational efficiency 73%
  • Customer experience 69%
  • Workforce productivity 66%
  • Revenue growth/preservation 63%
  • Risk mitigation 62%
  • Governance/regulatory compliance 51%

Resilience goals without purpose are no resilience at all. The drive for efficient business operations (73%) serves as a horizontal foundation for improving customer experience and workforce productivity and for growing revenue; like a rising tide that lifts all boats, it improves the likelihood of achieving every other goal.

The real story here is that customer experience and workforce productivity showcase how vital it is to deliver frictionless service to people. Sure, resilient Ops matter; they keep systems humming. But when you anchor resilience in people, you’re not only preventing downtime—you’re also making sure every interaction is fast, seamless, and engaging. That’s what actually cements loyalty and drives long-term growth.

The real cost of inaction

A million-dollar argument for investing in resilience

How much total economic loss did internet outages or disruptions over the last 30 days – including those on the Internet Stack – cost your business?

  • $1M or less 49%
  • $1M or more 51%

Of the 51% who lost $1M or more:

  • $1M–$5M 24%
  • $5M–$10M 14%
  • More than $10M 13%

The negative economic impact of incidents is too significant to ignore. 51% of organizations reported a negative economic impact of over $1M, up from 43% a year ago. One way to justify the cost of investing in resilience is to acknowledge the hard truth that this money is already being ‘spent’ whenever incidents occur. Organizations should therefore ensure their Internet Stack is resilient to mitigate the realization of this impact.

It's not a matter of if incidents will occur, but when. These figures highlight the urgent need for robust resilience measures. By prioritizing a resilient Internet Stack, businesses can mitigate risks, minimize downtime, and protect their financial health, ensuring continuity and stability in an increasingly digital world.

Performance: The new rule of digital resilience

Uptime alone doesn’t cut it anymore

When it comes to your organization, which of the following are true?

  • Fast websites and apps are critical to business success 73%
  • Performance should be monitored just as availability/uptime is monitored 69%
  • Fast websites and apps are a competitive differentiator for our organization 57%
  • If websites or apps are slow, then they might as well be down 42%

73%

of businesses say fast websites and apps are critical to success.

42%

say that if they’re slow, they might as well be down.

These findings underscore the dire need for organizations to prioritize website performance as part of their Internet Resilience strategy. At a minimum, an organization’s digital properties should be as fast as, or faster than, its competitors’. Slow-loading websites lead to frustrated users, lost sales, and a tarnished reputation.

To stay ahead, businesses must ensure their websites are optimized for speed and reliability across an ever-expanding edge. This involves regular monitoring where the user experience occurs, the establishment of experience level objectives (XLOs), and a clear understanding that fast digital web performance does not stop at the source where your sites and apps are hosted. Making website performance a cornerstone of resilience strategy ultimately safeguards operations, enhances user satisfaction, and drives growth.

Mapping the Internet Resilience journey

Confidence is uneven. Priorities are shifting. The journey to Internet Resilience is far from complete.

Benchmark your resilience stack

Not all parts of your digital stack are built to bounce back

Which of the following are highly resilient?

  • Your workforce's productivity stack (e.g., Microsoft 365) 63%
  • Your cloud infrastructure stack(s) 59%
  • Your organization's Internet Stack(s) 57%
  • Your business-critical APIs 53%
  • Your application stack(s) 52%
  • Your customer's Internet Stack(s) 51%
  • Your on-premise infrastructure stack(s) 47%
  • Your third-party applications 43%

To chart a path to resilience, you must first know where you are. Organizations should use this data to benchmark their own levels of Internet Resilience to identify strengths and improvement areas.

By comparing their resilience against industry standards, they can ensure their digital infrastructure is robust and capable of handling disruptions. Benchmarking helps prioritize investments in critical areas like productivity stacks, cloud infrastructure, and APIs, enhancing overall operational efficiency and customer satisfaction. Additionally, it provides insights into how well they are prepared compared to competitors, enabling strategic planning and risk mitigation. Ultimately, leveraging this data fosters a proactive approach to maintaining a resilient and secure digital environment.

What to prioritize over the next 18 months

AI gets the buzz, but XLOs are the backbone

What should your organization prioritize over the next 18 months to ensure your critical applications are reliable and resilient?

  • Invest in Artificial Intelligence (AI) capabilities or toolchains 63%
  • Establish experience level objectives (XLOs) and service level objectives (SLOs) 52%
  • Establish a culture of reliability and resilience 51%
  • Upskill or train our workforce 42%
  • Regularly test our resilience preparedness through, e.g., simulated disruptions 34%
  • Create or optimize business processes 33%
  • Invest in other (non-AI) technology capabilities or toolchains 29%
  • Create or move budget to secure necessary funds 20%

With 63% of businesses prioritizing AI investment, the hype around AI is reinforced (and for good reason). Precisely because of that hype, though, establishing experience level objectives (XLOs) should be considered the top business priority over the next 18 months – after all, who wouldn’t say ‘invest in AI’?

While investing in AI capabilities is crucial, XLOs are what truly matter for ensuring business resilience and success. XLOs provide a clear framework for measuring and improving user experiences, which directly impact customer satisfaction and loyalty. By augmenting recovery time objectives (RTOs) and recovery point objectives (RPOs) with XLOs, businesses can better manage disruptions and maintain high service standards. In a world where performance is the new availability, XLOs are how resilience is measured from the customer’s perspective. Don't let the AI hype overshadow the importance of experience level objectives—make them your number one priority.

Who owns resilience?

Everyone agrees resilience matters. Fewer agree who owns it.

Who should be ultimately responsible for digital or internet resilience in your organization?

  • Chief Information or Technology Officer 63%
  • IT Ops or Site Reliability Engineers (SREs) 44%
  • Chief Digital Officer 28%
  • Chief Resilience Officer 27%
  • Chief Security Officer 26%
  • Respective line of business owners 17%

The desire for a resilient Internet Stack that delivers seamless digital experiences can catalyze IT-to-business conversations. Such dialogue is crucial for aligning IT and business on common goals, ensuring seamless operations and minimizing disruptions.

Without alignment, resilience efforts are likely to fail. Most organizations believe ultimate responsibility for Internet Resilience should sit within the technology charter – with 44% saying IT Ops or SREs and 63% saying the Chief Information or Technology Officer – but the diversity of responses highlights the need for a unified approach. When IT and business leaders collaborate, they can create a robust strategy that supports both technological stability and business continuity, driving overall success.

The visibility gap: How organizations are reshaping resilience

Blind spots create the biggest risks. The results show a strong move toward targeted, best-of-breed monitoring—especially for third parties and the systems users actually experience.

Use Internet Performance Monitoring for a resilient Internet Stack

Square pegs can’t fit in round holes.

What monitoring tools does your organization use to monitor the Internet Stack?

  • Internet Performance Monitoring (IPM) 73%
  • Application Performance Monitoring (APM) 64%
  • Network Performance Monitoring (NPM) 63%
  • Digital Experience Monitoring (DEM) 55%
  • Other types 32%

A resilient Internet Stack relies on a robust observability framework, which is essential for detecting and resolving issues before users notice a disruption. Using purpose-built, best-of-breed Internet Performance Monitoring (IPM) tools is crucial; otherwise, it's like trying to fit a square peg into a round hole.

Non-purpose-built tools can leave gaps in monitoring and miss critical alerts. 73% of organizations use IPM tools, highlighting their importance. Other tools – Digital Experience Monitoring (55%), Network Performance Monitoring (63%), and Application Performance Monitoring (64%) – also play vital roles, each providing insight into different components of the stack. As the next set of data shows, best-of-breed solutions are preferred over broad, non-best-of-breed capabilities by more than two to one, so use IPM for visibility into the Internet Stack.

Practical advice for achieving resilience and balancing cost

Targeting best-of-breed for critical apps and digital resilience

How should organizations balance the need for Internet Stack resilience versus the pressure to reduce IT costs?

  • Fully leverage cloud benefits (e.g., scalability, elasticity, or flexibility capabilities) 65%
  • Prioritize ensuring excellent customer and employee digital experiences 61%
  • Prioritize resilience for your critical Tier 1 apps 55%
  • Use best of breed solutions purpose built to prevent costly outages in the first place 47%
  • Adopt a phased approach to resilience improvements 44%
  • Use an industry-recognized solution with broad capabilities even if not best of breed in all areas 23%

Fully leveraging cloud benefits (65%) in an internet-centric fabric is table stakes. Ensuring excellent digital experiences (61%) and prioritizing the resilience of your critical tier 1 applications (55%) is the game. There is even a case to be made for the resilience of your non-critical applications.

For example, regularly testing resilience preparedness on non-critical applications can provide safer learning opportunities that help ensure critical applications always adhere to recovery and resilience objectives – tabletop exercises that prepare teams for live, production, critical systems. Ensuring excellent digital experiences can also reduce costs by helping you calibrate internal monitoring. By focusing on what impacts user experience, you can streamline Internet monitoring efforts and cut unnecessary expenses. This targeted approach ensures efficient resource allocation, optimizing performance while minimizing the costs associated with broad, non-specific monitoring.

Why monitoring third parties is crucial

Resilience breaks where visibility ends.

How critical are third-party platform technology providers to your digital or internet resilience success?

  • Extreme criticality 30%
  • High criticality 43%
  • Moderate criticality 25%
  • Small criticality 1%
  • No criticality 1%

No study of Internet Resilience is complete without considering third-party dependencies, which are critical to digital success. These dependencies must be monitored for service level adherence.

Agent-based Application Performance Monitoring (APM) cannot monitor third parties, but IPM can. Even though third parties monitor their own services, organizations need to monitor those services themselves to proactively ensure reliability and performance, since providers may not be forthright about service level-related incidents. 73% of organizations consider third-party providers highly or extremely critical to their resilience success. This underscores the importance of comprehensive monitoring to maintain service quality and prevent disruptions caused by external dependencies.

AI and the future of Internet Resilience

Organizations are leaning hard on AI. The question is whether their resilience strategies can keep up.

AI outages don’t – and can’t – go undetected

AI can’t fail quietly—and yet, in many organizations, it still does

How soon is the impact recognized when the AI that supports your critical Tier 1 apps becomes unavailable or slower?

  • Immediately 57%
  • When monitoring sends an alert 36%
  • When a user or employee complains 27%
  • Not until when the AI is next accessed 16%

We now live and work in an “AI or die” environment: AI is essential for business success, ensuring the smooth operation of critical applications. When these applications experience downtime or slow performance, business operations can be disrupted and damaged, leading to financial losses and reputational harm.

A proper Internet Performance Monitoring strategy is crucial for AI implementations – 36% of organizations cite monitoring alerts as the mechanism for knowing when their AI is down or slow – as it helps detect and resolve issues promptly. Meanwhile, 57% of respondents recognize the impact immediately when AI supporting Tier 1 apps becomes unavailable or slower. This highlights the importance of robust monitoring to maintain AI efficiency and prevent operational disruptions, ensuring continuous business success.

The AI arms race is on

Failing to invest in AI is no longer a neutral decision—it’s a risk.

How will your organization's AI investment for reliability or resilience change in the next 18 months?

  • AI investments will substantially increase 41%
  • AI investments will moderately increase 44%
  • AI investments will remain about the same 10%
  • AI investments will moderately decrease 3%
  • AI investments will substantially decrease 1%

Organizations should prioritize AI investments over the next 18 months to avoid the negative opportunity cost of inaction. Failing to invest in AI can lead to inefficiencies, competitive disadvantages, and missed growth opportunities.

85%

of organizations will increase their AI investments
over the next 18 months

...with only 4% expecting to decrease their AI spending. This trend underscores the critical role AI plays in enhancing reliability and resilience. By investing in AI, businesses can ensure robust performance, mitigate risks, and drive innovation, securing their position in an increasingly AI-driven market.

AI: The backbone of critical application resilience

Proof that organizations should not go it alone

Which statement(s) describe your organization's current approach to AI for ensuring critical Tier 1 application resilience?

  • We have purchased third-party AI capabilities to ensure Tier 1 application resilience 60%
  • We are evaluating whether to add third-party AI capabilities to ensure Tier 1 application resilience 59%
  • We have built our own set of AI capabilities to ensure Tier 1 application resilience 54%
  • We are evaluating whether to add our own AI capabilities to ensure Tier 1 application resilience 49%

Further to the critical dependencies on third parties, most organizations rely on third-party AI capabilities – especially third-party AI capabilities for ensuring critical Tier 1 application resilience. This approach is favored over building in-house solutions due to the expertise, scalability, and cost-effectiveness offered by third-party providers.

59%

of organizations are evaluating third-party AI capabilities, while 60% have already purchased them.

In contrast, 49% are considering adding their own AI, and 54% have built their own. The reliance on third-party AI allows businesses to leverage advanced technologies without the significant investment and time required for developing proprietary solutions, ensuring faster and more reliable resilience measures.

Conclusion

Let’s call it what it is: resilience is no longer a background process. It’s the main event. The findings of this report speak for themselves. Websites that merely “stay up” don’t cut it anymore.

  • 73% of businesses say fast, high-performing websites are critical to business success.
  • 42% claim that if apps are slow, then they might as well be down.
  • More than half of the surveyed companies are bleeding over $1 million a month when things go wrong.

That’s not downtime. That’s damage.

The lesson? If you’re not actively investing in purpose-built Internet Performance Monitoring, you’re flying blind. If you’re still relying on broad, one-size-fits-all tools, you're solving a modern problem with yesterday’s kit. And if AI isn't part of your resilience story yet, it will be—whether by choice or by consequence.

Building a resilient Internet Stack is not just about preventing downtime; it's about ensuring every digital interaction is seamless and engaging, driving long-term growth and cementing customer loyalty. That means best-of-breed IPM tools, smart use of AI, and a robust Internet Stack that’s monitored from the outside in.

The businesses that invest in Internet Resilience now won't just stay online—they'll stay ahead.

Demographics

Company Size
1% Less than 500
29% 500–1,000
48% 1,000–10,000
14% 10,000–100,000
8% More than 100,000

Managerial Responsibility
34% Manager
41% Director
25% Vice President/Executive Leader

Respondents
N = 475
FEB–MAR, 2025
Industries
66% Tech platform "as a service" provider
14% Manufacturing
13% Financial services
11% Retail/Ecommerce
9% Healthcare
6% News/Media
5% Other

Location
65% North America
26% Europe
7% South America
1% Asia
1% Oceania
What is ECN?

Explicit Congestion Notification (ECN) is a longstanding mechanism in the IP stack that lets the network help endpoints "foresee" congestion between them. The concept is straightforward: if a nearly congested piece of network equipment, such as a router along the path, could tell the endpoints, "Hey, I'm almost congested! Can you two slow down your data transmission? Otherwise, I’m worried I will start to lose packets...", then the endpoints can react in time to avoid the packet loss, paying only the price of a minor slowdown.

What is ECN bleaching?

ECN bleaching occurs when a network device at any point between the source and the endpoint clears or “bleaches” the ECN flags. Since traffic must reach your content via a transit provider or peering, it’s important to know whether bleaching is occurring and to remove any instances.

With Catchpoint’s Pietrasanta Traceroute, we can send probes with IP-ECN values different from zero to check hop by hop what the IP-ECN value of the probe was when it expired. We may be able to tell you, for instance, that a domain is capable of supporting ECN, but an ISP in between the client and server is bleaching the ECN signal.
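The hop-by-hop check can be illustrated with a short sketch. The probing itself requires raw sockets and a tool such as Pietrasanta Traceroute; the code below only models the analysis step, with hypothetical hop data and helper names of our own, to show how the bleaching point is localized:

```python
# ECN codepoints carried in the low two bits of the IP ToS/Traffic Class field
NOT_ECT = 0  # not ECN-capable
ECT_1   = 1  # ECN-capable transport, codepoint 1
ECT_0   = 2  # ECN-capable transport, codepoint 0
CE      = 3  # Congestion Experienced

def find_bleaching_hop(sent_ecn, hops):
    """Given the ECN value the probe was sent with and a list of
    (hop_address, observed_ecn) pairs reported hop by hop, return the
    first hop at which the marking was cleared, or None if it survived.

    A CE (3) marking is legitimate congestion signaling, not bleaching.
    """
    for address, observed in hops:
        if observed != sent_ecn and observed != CE:
            return address
    return None

# Hypothetical observations: ECT(0) survives two hops, then a router
# in the middle zeroes the bits for the rest of the path.
path = [("10.0.0.1", ECT_0), ("10.1.0.1", ECT_0),
        ("203.0.113.7", NOT_ECT), ("198.51.100.2", NOT_ECT)]
print(find_bleaching_hop(ECT_0, path))  # -> 203.0.113.7
```

The same comparison, run across many vantage points, is what lets a measurement campaign attribute bleaching to a specific network rather than just observing that it happened somewhere.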

Why is ECN important to L4S?

ECN is an essential requirement for L4S since L4S uses an ECN mechanism to provide early warning of congestion at the bottleneck link by marking a Congestion Experienced (CE) codepoint in the IP header of packets. After receipt of the packets, the receiver echoes the congestion information to the sender via acknowledgement (ACK) packets of the transport protocol. The sender can use the congestion feedback provided by the ECN mechanism to reduce its sending rate and avoid delay at the detected bottleneck.

ECN and L4S need to be supported by the client and server but also by every device within the network path. It only takes one instance of bleaching to remove the benefit of ECN since if any network device between the source and endpoint clears the ECN bits, the sender and receiver won’t find out about the impending congestion. Our measurements examine how often ECN bleaching occurs and where in the network it happens.

Why is ECN and L4S in the news all of a sudden?

ECN has been around for a while, but with the increase in data volumes and the demand for high-quality user experiences – particularly for streaming – ECN is vital for L4S to succeed, and major technology companies worldwide are making significant investments.

L4S aims to reduce packet loss – and hence the latency caused by retransmissions – and to make services as responsive as possible. In addition, we have seen significant momentum from major companies lately, which always helps push a new protocol toward deployment.

What is the impact of ECN bleaching?

If ECN bleaching is found, any methodology built on top of ECN to detect congestion will not work.

Thus, you cannot rely on the network to achieve what you want: avoiding congestion before it occurs. Potential congestion is marked with the Congestion Experienced (CE = 3) codepoint when detected, and bleaching wipes out that information.

What are the causes behind ECN bleaching?

The causes behind ECN bleaching are multiple and hard to identify, from network equipment bugs to debatable traffic engineering choices and packet manipulations to human error.

For example, bleaching could occur from mistakes such as overwriting the whole ToS field when dealing with DSCP instead of changing only DSCP (remember that DSCP and ECN together compose the ToS field in the IP header).
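The remarking mistake described above is easy to see in bit arithmetic. A minimal sketch (the field layout – DSCP in the upper 6 bits, ECN in the lower 2 – follows RFC 2474 and RFC 3168; the helper names are our own):

```python
def build_tos(dscp, ecn):
    """Assemble the IP ToS/Traffic Class octet: DSCP occupies the
    upper 6 bits, ECN the lower 2."""
    return ((dscp & 0x3F) << 2) | (ecn & 0x03)

def split_tos(tos):
    """Decompose the octet back into (dscp, ecn)."""
    return tos >> 2, tos & 0x03

ECT_0 = 2  # ECN-capable transport codepoint

tos = build_tos(dscp=46, ecn=ECT_0)   # EF-marked, ECN-capable packet
print(hex(tos))                        # -> 0xba

# Correct remarking: rewrite only the DSCP bits, preserving ECN.
_, ecn = split_tos(tos)
good = build_tos(0, ecn)
print(split_tos(good))                 # -> (0, 2): ECN survives

# Buggy remarking: overwriting the whole octet zeroes the ECN bits too.
bad = 0x00
print(split_tos(bad))                  # -> (0, 0): ECN bleached
```

A device that writes the whole octet instead of masking in the new DSCP value silently turns every ECN-capable packet into a Not-ECT one – exactly the bleaching pattern the measurements look for.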

How can you debug ECN bleaching?

Nowadays, network operators have a good number of tools to debug ECN bleaching from their end (such as those listed here) – including Catchpoint’s Pietrasanta Traceroute. The large-scale measurement campaign presented here is an example of a worldwide campaign to validate ECN readiness. Individual network operators can run similar measurement campaigns across networks that are important to them (for example, customer or peering networks).

What is the testing methodology?

The findings presented here are based on tests run with Catchpoint’s enhanced traceroute, Pietrasanta Traceroute, through the Catchpoint IPM portal, collecting data from over 500 nodes in more than 80 countries. By running traceroutes on Catchpoint’s global node network, we are able to determine which ISPs, countries, and/or specific cities have issues passing ECN-marked traffic. The results show the view of ECN bleaching globally from Catchpoint’s unique, partial perspective. To our knowledge, this is one of the first measurement campaigns of its kind.

Beyond the scope of this campaign, Pietrasanta Traceroute can also be used to determine if there is incipient congestion and/or any other kind of alteration and the level of support for more accurate ECN feedback, including if the destination transport layer (either TCP or QUIC) supports more accurate ECN feedback.

The content of this page is Copyright 2024 by Catchpoint. Redistribution of this data must retain the above notice (i.e. Catchpoint copyrighted or similar language), and the following disclaimer.

THE DATA ABOVE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS OR INTELLECTUAL PROPERTY RIGHT OWNERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THIS DATA OR THE USE OR OTHER DEALINGS IN CONNECTION WITH THIS DATA.

We are happy to discuss or explain the results if more information is required. Further details per region can be released upon request.