The financial services industry is undergoing rapid digital transformation. Traditional on-premises self-contained systems have given way to cloud-based, distributed architectures built on APIs, SaaS platforms, and interconnected ecosystems. This shift has unlocked new agility and innovation—but it’s also introduced complexity, fragile dependency chains, external risks, and visibility gaps that traditional monitoring tools were never designed to handle.
Application Performance Monitoring (APM) remains essential for observing custom applications. But on its own, APM is no longer sufficient to ensure the resilience of complex distributed systems, third-party APIs, cloud services, and internet pathways that now define how services are delivered and experienced.
This report explores how leading financial institutions are expanding their monitoring strategies with Internet Performance Monitoring (IPM)—a complementary, outside-in approach that provides real-time insight into what users, systems, and regulators truly care about: experience, reachability, reliability, and performance.
By combining IPM with traditional APM, IT leaders can close critical visibility gaps, anticipate outages before they escalate, and ensure their digital services consistently meet rising customer and regulatory expectations. In today’s high-stakes digital environment, this dual approach isn’t just smart—it’s strategic.
Over the past 15 years, financial institutions have shifted much of their infrastructure to the cloud, adopted more FinTech solutions, and integrated with other cloud-based systems—creating complex, interdependent environments spanning multiple locations. This cloud migration provides greater flexibility, scalability, and cost-efficiency, but it also introduces new complexities and dependencies that must be carefully managed.
Cloud services enable firms to leverage advanced technologies (artificial intelligence, machine learning, big data analytics, etc.) to enhance operations and customer experiences. However, these services also create additional layers of complexity and potential points of failure that traditional monitoring approaches struggle to observe. The distributed nature of cloud services—often spanning multiple providers and regions—further complicates the monitoring landscape and requires a more sophisticated approach to ensure digital resilience.
To ensure continuous service delivery and optimal user experience, financial institutions must monitor more than just uptime. True digital resilience hinges on four critical dimensions: experience, reachability, reliability, and performance.
This resilience formula provides a practical framework for evaluating the health of digital services from the perspective of the users consuming an application, not the infrastructure or the systems themselves.
Ensuring digital user experience, or Internet Resilience, requires more than focusing on system health or application code efficiency. The opportunity is to understand these four factors of resilience from the user perspective.
Banking infrastructure has undergone a dramatic transformation over the past two decades, evolving from centralized, somewhat monolithic systems to complex, distributed architectures that span multiple environments, clouds, and services, and integrate with legacy components. Traditionally, banks operated within self-contained environments where applications ran on centralized mainframes and critical functions were housed within a single, controlled infrastructure.
These legacy systems, while stable, often created significant barriers to innovation, requiring lengthy development cycles and limiting banks' ability to respond quickly to market demands. The inflexibility of these monolithic architectures became increasingly problematic as customer expectations shifted toward digital-first experiences, forcing banks to reimagine their technological foundations.
This evolution has not been merely incremental but represents a fundamental restructuring of how banking systems operate and deliver services.
Today's banking landscape is characterized by highly distributed components that operate across on-premises data centers, private clouds, public clouds, and edge locations.
This shift has been driven by the need for greater agility, scalability, and the ability to deliver innovative digital experiences to increasingly demanding customers. Banking systems have become remarkably complex, with significant reliance on Software-as-a-Service (SaaS) applications and partner APIs for various critical functions, including loan origination, transfers, trading, core banking, mobile applications, security, and fraud prevention.
The transition to distributed systems has enabled banks to break down silos, implement more agile development practices, and deliver new features and services at a pace that would have been impossible under the previous model. However, this evolution has also introduced new challenges in terms of monitoring, managing, and ensuring the resilience of these complex environments.
The adoption of APIs and cloud services represents a pivotal aspect of the financial services infrastructure evolution. APIs have emerged as the essential connective tissue of modern banking, enabling different systems, applications, and services to communicate and share data seamlessly. They allow banks to integrate with external providers, leverage third-party services, and create a more interconnected ecosystem that extends far beyond the boundaries of the institution's own infrastructure.
This API-centric approach has revolutionized everything from payment processing to account opening, enabling banks to compose sophisticated customer journeys from a combination of internal and external services. However, it also introduces new layers of complexity. Each customer interaction may depend on dozens of interconnected services—many outside the bank’s direct control—creating operational blind spots and new challenges for monitoring and resilience.
The culmination of these changes is that banking systems have become complex, distributed, cloud-centric, service-oriented, and dependent on dozens or hundreds of external connections that must all function properly for the bank to operate effectively.
This complexity is further intensified by the increasing reliance on mobile banking, mobile payments, and third-party payment applications (such as Stripe, Zelle, and Venmo) that must deliver reliable and fast experiences to meet growing customer expectations.
The stakes could not be higher: 51% of banking customers switch providers due to poor digital experiences, and 83% would consider switching after just one negative interaction.
However, traditional monitoring approaches, which were designed for simpler, more centralized architectures, often struggle to provide complete visibility into these complex, distributed environments. This complexity presents significant challenges for IT operations teams responsible for ensuring the reliability, performance, and security of banking systems. As banking infrastructure continues to evolve and become more complex, so too must the approaches to monitoring and ensuring digital resilience.
As a direct result of this API-driven evolution, banks now face the challenge of managing a vast and fragile web of external dependencies. Each customer transaction might traverse dozens of third-party APIs, payment processors, SaaS platforms, and even AI-powered services like ChatGPT or Copilot. Every external integration is a potential point of failure—one that is often outside the bank’s direct control but still impacts the customer experience.
This interconnectedness means that a disruption in a single provider can cascade across multiple banking services, amplifying risk. For example, a latency spike in a partner API can delay loan approvals or payment processing, while an outage in a cloud provider can take down critical customer-facing applications. Traditional monitoring tools, designed for simpler environments, often detect only the symptoms—not the root cause—of these failures, leading to longer outages and higher operational costs.
These dependencies can lead to increased mean time to resolution (MTTR), extended service disruptions, and ultimately, a diminished customer experience that may drive customers to seek banking services elsewhere.
The complexity of these dependency chains also creates challenges for change management and risk assessment. When making changes to banking systems or introducing new features, banks must account for every internal and external service the change touches. Without a clear understanding of these relationships, even routine changes can have unintended consequences for service reliability and the customer experience.
As banking systems continue to become more interconnected and dependent on external services, managing and monitoring these complex dependency chains will be increasingly crucial for maintaining service reliability and customer satisfaction.
While APM remains a foundational component of IT operations, it was designed for an era of centralized infrastructure and internal control. These tools continue to deliver value when it comes to monitoring internal performance metrics, application health, and system diagnostics. However, in modern banking environments—where services rely on third-party APIs, cloud platforms, and internet-based delivery mechanisms—APM alone can no longer provide the full picture.
This imbalance between deep visibility into internal systems and limited visibility into external dependencies creates critical gaps that leave banks vulnerable to service degradation, customer complaints, and compliance risks.
Despite their value in tracking internal systems, traditional APM tools fall short in one critical area: monitoring the actual digital experience of real users. These tools typically gather data from a limited number of cloud-based locations, which are often far removed from the environments where customers, employees, partners, or integrated systems actually access services.
This creates a fundamental disconnect between where performance is measured and where it is actually experienced.
As banks continue to serve increasingly diverse and distributed user bases, bridging this gap in visibility is no longer optional—it’s essential for maintaining service quality and customer trust.
This broader approach enables IT and ops teams to see what’s happening across the entire delivery chain—not just where their code lives, but where real users experience it.
Banks that fail to deliver seamless digital experiences risk losing customers to more agile, tech-savvy competitors. Research shows that digital experience is now a primary driver of customer satisfaction and loyalty. When digital services are slow, unreliable, or difficult to use, customers are quick to look elsewhere. The relationship between digital experience and customer retention highlights the business impact of effective monitoring and the potential costs of missing issues until it’s too late.
Customer expectations continue to rise: today’s users want mobile and web applications to respond in 2–3 seconds and payments to process instantly. Banks that can’t meet these standards risk frustrating their customers and damaging their reputation. Outages and slowdowns not only inconvenience users but can also have a direct impact on people’s lives and financial wellbeing.
Poor digital experience monitoring also leads to higher operational costs. Without comprehensive visibility, banks often resort to “war room” tactics—pulling together large teams to troubleshoot problems reactively. This approach is time-consuming, resource-intensive, and expensive compared to proactive monitoring that identifies and resolves issues before they affect customers.
As banking becomes increasingly digital-first, the operational and business costs of poor digital experience monitoring will only continue to grow, making comprehensive monitoring not just a technical necessity but a business imperative for forward-thinking financial institutions.
Internet Performance Monitoring (IPM) represents a specialized monitoring discipline focused on measuring the performance, reliability, and overall quality of internet connections, as well as the performance of web applications and services delivered over the internet.
"IPM typically includes monitoring the speed, reliability, and overall quality of internet connections, as well as the performance of web applications and services offered over the internet." Source: Gartner Digital Experience Monitoring Critical Capabilities
Unlike traditional APM tools that primarily monitor internal infrastructure and applications, IPM is focused on the Internet Stack, the collection of technologies, protocols, and systems that make today’s digital services possible. It provides an outside-in perspective, offering real-time insight into the actual experience delivered to users.
IPM represents a necessary evolution in monitoring approaches, addressing the blind spots that have emerged as banking infrastructure has become more distributed, cloud-centric, and dependent on internet connectivity. Gartner research has recognized the importance of this discipline, with Catchpoint noted as "the highest rated vendor overall for internet performance monitoring," reflecting the growing recognition of IPM as a critical component of comprehensive monitoring strategies.
As banks continue to expand their digital offerings and serve increasingly diverse and geographically distributed customer bases, the insights provided by IPM become increasingly valuable for maintaining service quality and competitive advantage.
Here is a summary of the most impactful use cases for IPM in banking, highlighting the specific business challenges, the unique value it provides, and the tangible benefits for financial institutions.
These use cases demonstrate that Internet Performance Monitoring is not just a technical upgrade—it’s a strategic necessity for banks seeking to deliver reliable digital experiences, minimize risk, and maintain a competitive edge in an increasingly complex environment.
A major advantage of IPM is its ability to transform monitoring from a reactive, firefighting exercise into a proactive, preventative discipline. Traditional approaches often rely on customer complaints or support tickets to signal problems, meaning issues are only addressed after they have already impacted users and business outcomes. This reactive model leads to extended resolution times, unnecessary strain on IT resources, and reputational damage as banks scramble to identify and fix issues after the fact.
With IPM, banks gain the ability to detect emerging issues at their earliest stages—often before customers or employees are even aware of them. By continuously monitoring the digital experience from the outside in, operations teams can spot warning signs, investigate root causes, and resolve problems before they escalate. This proactive stance not only reduces the duration and reach of incidents but also allows for more efficient resource allocation and minimizes the business impact of disruptions.
The difference between reactive and proactive digital experience monitoring is illustrated in Figure 6. As shown, reactive monitoring typically detects issues only after customers are impacted, leading to escalated complaints and loyalty loss. In contrast, proactive monitoring enables early detection and resolution, dramatically reducing both the severity and the overall business impact of incidents.
Traditional monitoring often focuses on technical Service Level Agreements (SLAs) that do not always reflect the actual user experience. Metrics like server CPU utilization or network latency, while important, may not correlate with how customers or employees perceive digital banking services. IPM enables banks to shift focus to Experience Level Objectives (XLOs)—metrics that directly measure the quality of digital experience from the user’s perspective.
XLOs bridge the gap between IT and business, aligning performance monitoring with outcomes that matter most: how quickly a trader can complete a transaction, how long a teller waits for a process to finish, or how fast a customer’s mobile app responds. These experience-level metrics are directly tied to customer satisfaction, employee productivity, and business success. For example, a bank may set an XLO that the login page must load in under three seconds, or that a payment transfer via API must complete in less than 500 milliseconds and be available 99.99% of the time.
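To make this concrete, here is a minimal sketch of how such objectives might be expressed and checked in code. The XLO names, thresholds, and measurement values are hypothetical, drawn from the examples above rather than from any specific monitoring product.

```python
from dataclasses import dataclass

@dataclass
class XLO:
    """A hypothetical Experience Level Objective."""
    name: str
    max_latency_ms: float        # slowest acceptable response time
    min_availability_pct: float  # required availability over the window

# Illustrative XLOs based on the thresholds discussed above.
xlos = {
    "login_page_load": XLO("login_page_load", 3000, 99.9),
    "payment_transfer_api": XLO("payment_transfer_api", 500, 99.99),
}

def meets_xlo(xlo: XLO, latencies_ms: list[float], successes: list[bool]) -> bool:
    """Check observed measurements (e.g. from synthetic tests) against the XLO."""
    availability = 100.0 * sum(successes) / len(successes)
    return max(latencies_ms) <= xlo.max_latency_ms and availability >= xlo.min_availability_pct

# Hypothetical measurements from a set of monitoring vantage points.
print(meets_xlo(xlos["payment_transfer_api"],
                latencies_ms=[220, 310, 480],
                successes=[True] * 1999 + [False]))  # 99.95% available -> False
```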
This focus ensures that monitoring efforts are aligned with business goals and customer expectations, providing a shared language for IT and business leaders to drive performance improvements and competitive advantage.
Implementing IPM delivers significant cost savings by reducing the frequency and impact of service disruptions. Proactive detection and resolution of issues mean fewer large-scale incidents, less time spent in “war room” troubleshooting, and lower support costs as fewer customers experience problems. This operational efficiency translates directly into financial savings and improved profitability.
The business case is further strengthened by the impact on customer retention. Since acquiring new customers is far more expensive than retaining existing ones, delivering a consistently high-quality digital experience produces a substantial return on investment.
IPM also helps banks meet the monitoring and resilience expectations set by regulatory frameworks such as DORA and the UK’s Operational Resilience Regulation, as discussed earlier in this report. By providing comprehensive, real-time visibility across all digital services—including third-party dependencies—IPM enables banks to identify vulnerabilities, address them proactively, and demonstrate compliance to regulators, reducing the risk of fines, legal costs, and reputational damage.
Mix-and-match monitoring tools create fragmented visibility, complexity, and higher costs. Consolidating APM and IPM into a unified observability platform delivers improved visibility, faster incident response, simplified management, and reduced operational costs.
APM and IPM are complementary, not competing, approaches. When combined, they provide both deep internal diagnostics and an outside-in view of user experience—ensuring banks can identify and resolve issues wherever they occur.
Table 3 illustrates how APM and IPM complement each other, and why integrating both is essential for achieving complete visibility and control over digital banking operations.
As financial services accelerate toward digital-first delivery, the challenge is no longer just uptime—it’s experience, resilience, and trust. Legacy monitoring tools, built for static infrastructure, can’t keep pace with the dynamic, distributed, and internet-dependent ecosystems that define modern banking.
This report has shown how Internet Performance Monitoring addresses the critical blind spots left by traditional Application Performance Monitoring (APM). While APM provides deep insight into internal systems and code-level performance, only IPM can reveal how services perform for real users, across real networks, in real time.
Together, these capabilities form the foundation of a comprehensive monitoring strategy—one that aligns technical performance with business outcomes, accelerates incident response, strengthens regulatory readiness, and protects both customer experience and operational continuity.
Banks that integrate APM and IPM don’t just detect problems faster—they become proactive, predictive, and experience-led.
As regulators demand greater operational resilience, and as customers demand seamless, instant experiences across every touchpoint, the message is clear: visibility isn’t a technical upgrade—it’s a strategic necessity.
Explicit Congestion Notification (ECN) is a longstanding mechanism in the IP stack that lets the network help endpoints "foresee" congestion between them. The concept is straightforward: if a nearly congested piece of network equipment, such as a router in the middle of the path, could tell the endpoints, "Hey, I'm almost congested! Can you two slow down your data transmission? Otherwise, I'm worried I will start to lose packets...", then the two endpoints can react in time to avoid the packet loss, paying only the price of a minor slowdown.
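For readers who want to see what this looks like at the packet level, the short sketch below decodes and sets the two ECN bits that RFC 3168 defines in the low end of the IP ToS byte; the CE = 3 value it marks is the same Congestion Experienced codepoint discussed later in this section. It is an illustration of the mechanism, not production code.

```python
# The two low-order bits of the IP ToS byte carry the ECN codepoint (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT (sender does not support ECN)",
    0b01: "ECT(1)  (ECN-capable transport)",
    0b10: "ECT(0)  (ECN-capable transport)",
    0b11: "CE      (Congestion Experienced, set by a congested router)",
}

def ecn_of(tos: int) -> str:
    """Extract and describe the ECN codepoint from an IP ToS byte."""
    return ECN_CODEPOINTS[tos & 0b11]

def mark_if_congested(tos: int, congested: bool) -> int:
    """A nearly congested router re-marks ECN-capable packets to CE instead of dropping them."""
    if congested and (tos & 0b11) in (0b01, 0b10):  # only ECN-capable packets
        return tos | 0b11                           # set CE = 3
    return tos

print(ecn_of(mark_if_congested(0b000000_10, congested=True)))  # -> CE
```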
ECN bleaching occurs when a network device at any point between the source and the endpoint clears, or "bleaches", the ECN flags. Since traffic must traverse transit providers or peering links to reach your content, it is important to know whether bleaching is occurring, and where, so that it can be eliminated.
With Catchpoint’s Pietrasanta Traceroute, we can send probes with IP-ECN values different from zero to check hop by hop what the IP-ECN value of the probe was when it expired. We may be able to tell you, for instance, that a domain is capable of supporting ECN, but an ISP in between the client and server is bleaching the ECN signal.
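As a rough illustration of the same hop-by-hop idea (and emphatically not Pietrasanta Traceroute itself), the sketch below uses Scapy to send TTL-limited probes with ECT(0) set and reads the ECN bits from the probe header quoted back in each ICMP reply. The target host is a placeholder, and the script needs raw-socket (root) privileges.

```python
from scapy.all import IP, UDP, ICMP, IPerror, sr1  # pip install scapy

TARGET = "www.example.com"   # placeholder destination
ECT0 = 0b10                  # ECN-capable transport, ECT(0)

for ttl in range(1, 21):
    probe = IP(dst=TARGET, ttl=ttl, tos=ECT0) / UDP(dport=33434)
    reply = sr1(probe, timeout=2, verbose=0)
    if reply is None:
        print(f"{ttl:2d}  *")
        continue
    hop = reply.src
    if reply.haslayer(IPerror):
        # The ICMP error quotes our expired probe's IP header, so we can see
        # what the ECN bits looked like when the probe was dropped at this hop.
        ecn = reply[IPerror].tos & 0b11
        note = "ECN preserved" if ecn == ECT0 else f"ECN bleached (value {ecn})"
    else:
        note = "no quoted header"
    print(f"{ttl:2d}  {hop:15s}  {note}")
    if reply.haslayer(ICMP) and reply[ICMP].type == 3:  # port unreachable -> destination reached
        break
```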
ECN is an essential requirement for L4S since L4S uses an ECN mechanism to provide early warning of congestion at the bottleneck link by marking a Congestion Experienced (CE) codepoint in the IP header of packets. After receipt of the packets, the receiver echoes the congestion information to the sender via acknowledgement (ACK) packets of the transport protocol. The sender can use the congestion feedback provided by the ECN mechanism to reduce its sending rate and avoid delay at the detected bottleneck.
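To show how a sender can act on that feedback, here is a toy, DCTCP-style rate reaction in which the congestion window is reduced in proportion to the fraction of CE-marked packets the receiver echoes back. It is only meant to illustrate the control loop, not the exact scalable congestion control (e.g. TCP Prague) that L4S specifies.

```python
def update_cwnd(cwnd: float, alpha: float, ce_fraction: float, g: float = 1 / 16) -> tuple[float, float]:
    """
    cwnd        : current congestion window (packets)
    alpha       : smoothed estimate of the fraction of CE-marked packets
    ce_fraction : fraction of packets marked CE in the last round trip,
                  as echoed back to the sender in ACKs
    g           : smoothing gain
    """
    alpha = (1 - g) * alpha + g * ce_fraction    # update the congestion estimate
    if ce_fraction > 0:
        cwnd = max(1.0, cwnd * (1 - alpha / 2))  # back off in proportion to marking
    else:
        cwnd += 1.0                              # otherwise keep probing for bandwidth
    return cwnd, alpha

cwnd, alpha = 100.0, 0.0
for ce in [0.0, 0.0, 0.3, 0.5, 0.1, 0.0]:        # hypothetical per-RTT CE fractions
    cwnd, alpha = update_cwnd(cwnd, alpha, ce)
    print(f"CE fraction {ce:.1f} -> cwnd {cwnd:6.1f}, alpha {alpha:.3f}")
```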
ECN and L4S need to be supported not only by the client and server but also by every device within the network path. It takes only one instance of bleaching to remove the benefit of ECN: if any network device between the source and the endpoint clears the ECN bits, the sender and receiver never learn about the impending congestion. Our measurements examine how often ECN bleaching occurs and where in the network it happens.
ECN has been around for a while, but with growing data volumes and rising expectations for user experience, particularly for streaming, it is now vital to the success of L4S, and major technology companies worldwide are investing heavily in it.
L4S aims to reduce packet loss, and with it the latency caused by retransmissions, and to make services as responsive as possible. We have also seen significant momentum from major companies lately, which always helps push a new protocol toward deployment.
If ECN bleaching is found, any methodology built on top of ECN to detect congestion will not work. You cannot rely on the network to do what you need it to do, namely avoid congestion before it occurs: incipient congestion is signaled by marking the Congestion Experienced (CE = 3) codepoint when it is detected, and bleaching wipes that information out.
The causes of ECN bleaching are many and hard to identify, ranging from network equipment bugs to debatable traffic-engineering choices, packet manipulations, and plain human error.
For example, bleaching can result from mistakes such as overwriting the whole ToS byte when setting DSCP instead of changing only the DSCP bits (remember that DSCP and ECN together make up the ToS byte in the IP header).
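The masking mistake is easy to see in code. In the sketch below (illustrative values only), rewriting the full ToS byte when only DSCP should change wipes the two ECN bits to zero, which is exactly the bleaching described above.

```python
ECN_MASK  = 0b0000_0011   # low two bits of the ToS byte
DSCP_MASK = 0b1111_1100   # high six bits

def set_dscp_correct(tos: int, dscp: int) -> int:
    """Rewrite only the DSCP bits, preserving the ECN codepoint."""
    return (dscp << 2) | (tos & ECN_MASK)

def set_dscp_buggy(tos: int, dscp: int) -> int:
    """Overwrite the whole ToS byte -- this bleaches ECN."""
    return dscp << 2

tos = 0b101110_10                       # DSCP EF (46) with ECT(0) set
print(bin(set_dscp_correct(tos, 10)))   # 0b101010 -> ECN bits survive
print(bin(set_dscp_buggy(tos, 10)))     # 0b101000 -> ECN bits wiped to 00
```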
Nowadays, network operators have a good number of tools to debug ECN bleaching from their end (such as those listed here) – including Catchpoint’s Pietrasanta Traceroute. The large-scale measurement campaign presented here is an example of a worldwide campaign to validate ECN readiness. Individual network operators can run similar measurement campaigns across networks that are important to them (for example, customer or peering networks).
The findings presented here are based on tests run with Catchpoint’s enhanced traceroute, Pietrasanta Traceroute, through the Catchpoint IPM portal, collecting data from over 500 nodes located in more than 80 countries around the world. By running traceroutes on Catchpoint’s global node network, we are able to determine which ISPs, countries, and/or specific cities have issues passing ECN-marked traffic. The results present a view of ECN bleaching globally from Catchpoint’s unique, if partial, perspective. To our knowledge, this is one of the first measurement campaigns of its kind.
Beyond the scope of this campaign, Pietrasanta Traceroute can also be used to detect incipient congestion and/or other kinds of packet alteration, and to determine the level of support for more accurate ECN feedback, including whether the destination transport layer (either TCP or QUIC) supports it.