
Value-Based Observability

Executive summary

For over three decades, enterprises have invested in APM, infrastructure monitoring, and more recently, observability and AIOps, yet major outages still dominate headlines. Even organizations with sophisticated telemetry pipelines often find themselves blindsided by digital performance issues that directly impact customers or employees.

Why? Because while most organizations have improved visibility, few have matured their observability practices into a true business capability.

Organizations must evolve by:

  • Moving from reactive monitoring to proactive management and resilience
  • Shifting from technology-focused to value-focused practices
  • Transforming from IT systems innocence to business value accountability

This white paper provides a roadmap for this transformation, highlighting a maturity model designed to help enterprises evolve from tactical, siloed monitoring to value-based observability, where performance data drives business resilience, customer experience, and strategic decisions. Leaders who embrace this approach can anticipate risks, optimize user experiences, and measure IT success by its impact on revenue and customer satisfaction.

The opportunity to evolve observability

Most IT operations teams today are stuck in what could be called “data-rich, insight-poor observability.” They’ve invested heavily in APM, logs, traces, and metrics — and are collecting terabytes of data about internal systems every day. But that data mostly tells them how their systems are behaving, not how their users or the business are performing.

For the observability leader, this shift is more than technical; rather, it’s career-defining. Leading the journey to value-based observability elevates their role from operations management to business leadership, positioning them as the person who not only keeps systems running but also drives innovation, customer satisfaction, and competitive advantage.

The value-based observability maturity model

This model draws on real-world insights from leading observability teams and the transformations they have undergone over the past few years.

STAGES TO VALUE-BASED OBSERVABILITY

The four pillars of resilience

After the COVID-19 pandemic, the IT world started focusing on resilience, which has since slid from a top business objective into a buzzword. To restore its meaning, let’s define it:

Resilience is the ability of digital services to withstand, adapt to, and recover quickly from disruptions—whether they originate inside your systems or anywhere across the Internet.

51% of banking customers cite it as their primary reason.

Source: Salesforce Financial Services Research Report

Operationally, resilience means:

  • Withstand: Your systems and dependencies (DNS, CDN, APIs, ISPs, SaaS, etc.) can withstand failures without users noticing.
  • Adapt: When a disruption occurs, such as a regional CDN outage or BGP leak, the system automatically reroutes or adjusts to maintain acceptable performance.
  • Recover: When failures can’t be fully avoided, teams detect them fast, isolate the impact, and restore normal operations with minimal downtime or user impact.

The four pillars:

Resilience is about much more than uptime. To achieve it, it helps to think in terms of four pillars that quantify digital resilience:

  • Reachability – Can users and services access your apps from everywhere?
  • Availability – Are digital services up and functional across networks and geographies?
  • Performance – Are they fast and responsive under real-world conditions?
  • Reliability – Are they consistent and trustworthy over time?

Together, these four pillars form the foundation of digital resilience. In the following sections, we’ll explore how organizations can mature through progressive stages from a basic technology focus to value-based observability built upon these principles.
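
To make the pillars concrete, here is a minimal sketch of how they might be quantified from synthetic test results. The probe fields, the HTTP 200 success check, and the 500 ms latency SLO are illustrative assumptions, not a prescribed scoring method.

```python
# A sketch of scoring the four pillars from synthetic probe results.
# ProbeResult fields, the HTTP 200 check, and the latency SLO are illustrative.
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class ProbeResult:
    region: str          # vantage point location
    dns_resolved: bool   # could the probe reach the service at all?
    http_status: int     # 200 means the transaction completed
    latency_ms: float    # end-to-end response time

def pillar_scores(results: list[ProbeResult], latency_slo_ms: float = 500.0) -> dict[str, float]:
    reachability = mean(1.0 if r.dns_resolved else 0.0 for r in results)
    reached = [r for r in results if r.dns_resolved]
    availability = mean(1.0 if r.http_status == 200 else 0.0 for r in reached) if reached else 0.0
    performance = mean(1.0 if r.latency_ms <= latency_slo_ms else 0.0 for r in reached) if reached else 0.0
    latencies = [r.latency_ms for r in reached]
    # Reliability: consistency over time and geography -> penalize latency variance.
    reliability = 1.0 - min(1.0, pstdev(latencies) / latency_slo_ms) if len(latencies) > 1 else 1.0
    return {"reachability": reachability, "availability": availability,
            "performance": performance, "reliability": reliability}

results = [ProbeResult("us-east", True, 200, 310.0),
           ProbeResult("eu-west", True, 200, 880.0),
           ProbeResult("ap-south", False, 0, 0.0)]
print(pillar_scores(results))
```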

Stage 1: Technology Focused — Building Minimum Viable Observability

Every organization begins with a technology focus. At this stage, the culture is informal, teams are siloed, and monitoring is mostly reactive. The techniques are ad hoc: manual checks, low situational awareness, and “war-room” firefighting dominate the operational mindset.

  • Structure: Informal, siloed operations.
  • Techniques: Manual troubleshooting, reactive incident response.
  • Outcomes: Infrastructure-level KPI tracking, detect-and-restore capabilities.

This is where most organizations establish their “minimum viable observability” — enough monitoring to detect problems and restore service. But these environments often rely heavily on tool proliferation rather than integration. Data lives in separate silos (APM, network monitoring, synthetic testing), and there’s no unifying view of end-to-end performance.

It is surprising that, in 2025, many organizations still operate with a technology-focused mindset. They may invest in AIOps and impressive dashboards; however, if their main focus is proving that the infrastructure they are responsible for is not at fault (that is, if the goal is innocence), then they have not escaped this stage.

Stage 2: System Focused — From Reactive to Proactive

The next stage introduces structure and learning. Teams begin integrating systems and standardizing data collection. The focus shifts from firefighting to preemptive resolution, reducing mean time to identify (MTTI) and mean time to repair (MTTR).

  • Structure: Learning culture, integrated operations.
  • Techniques: Preemptive resolution, good situational awareness, proactive monitoring.
  • Outcomes: System-level objectives, measurable improvements, repeatable processes.

At this level, observability becomes systemic rather than opportunistic. Organizations begin correlating telemetry across applications, infrastructure, and networks. Synthetic and real user monitoring are used together to test critical transactions and proactively detect degradation before customers notice.
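
As an illustration of that proactive posture, the sketch below flags a synthetic transaction that is trending slower than its own history before a hard, user-facing threshold is breached. The window sizes, slowdown factor, and timing values are illustrative assumptions, not a specific product feature.

```python
# A sketch of proactive degradation detection on synthetic transaction timings:
# compare a short recent window against a longer rolling baseline.
from collections import deque
from statistics import median

class DegradationDetector:
    def __init__(self, baseline_size=200, window_size=10, factor=1.5):
        self.baseline = deque(maxlen=baseline_size)  # long-term history
        self.window = deque(maxlen=window_size)      # most recent checks
        self.factor = factor                         # tolerated slowdown

    def observe(self, duration_ms: float) -> bool:
        """Record one synthetic check; return True if degradation is suspected."""
        self.window.append(duration_ms)
        degraded = (
            len(self.baseline) == self.baseline.maxlen
            and len(self.window) == self.window.maxlen
            and median(self.window) > self.factor * median(self.baseline)
        )
        self.baseline.append(duration_ms)
        return degraded

# Usage: feed each synthetic run's duration; alert as soon as the recent
# window is markedly slower than the learned baseline.
detector = DegradationDetector(baseline_size=20, window_size=3, factor=1.5)
for run_ms in [320 + i for i in range(20)] + [900, 950, 980]:   # illustrative timings
    if detector.observe(run_ms):
        print(f"checkout transaction trending slower than baseline ({run_ms} ms)")
```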

Internet Performance Monitoring (IPM) becomes essential here because it connects these applications to the broader Internet layer: DNS, SSL, routing, authentication APIs, the WAN, third-party APIs, cloud services, and more. A financial services IT organization described this as “the turning point where we stopped troubleshooting and started predicting.”

Stage 3: Experience Focused — Unifying Observability Around the User

As organizations mature, observability expands beyond systems into the experience layer. The key question shifts from “Is the system up?” to “Is the user experience great?”

  • Structure: Open, collaborative culture; outcome-based operations.
  • Techniques: Predictive analytics, quantitative management, high situational awareness.
  • Outcomes: Unified operations mission, resilience across domains, full-stack insights.

Experience-focused observability unites SRE, network, and digital teams around a shared mission: delivering frictionless experiences for users, whether they’re customers on your website or employees on internal SaaS platforms.

Here, observability becomes predictive. Machine learning, APM, NPM, and IPM data are combined to forecast degradation, understand the performance of each component under load, and quantify experience in terms of business impact.

The maturity framework describes this stage as one where broader performance monitoring is embedded into digital operations, integrating visibility from backbone to browser, from API to end user. The data informs not just IT response, but also CX, marketing, and business continuity decisions.

In this stage, IT evolves from a technology function into a critical business partner, one that enables the business and is responsible for all digital interactions across the organization.

Stage 4: Value Focused — From Insight to Business Differentiation

The final stage, value-based observability, represents a cultural and operational shift. Observability becomes a driver of business value, innovation, and continuous improvement.

  • Structure: Continuous improvement and innovation-led culture.
  • Techniques: Digital experience-driven business decisions, proactive optimization.
  • Outcomes: Optimized user experiences, brand protection, productivity gains, resilience.

At this level, observability is tightly coupled with business outcomes. Organizations use predictive insights not only to detect or prevent issues but also to improve customer retention, employee productivity, and brand reputation.

GigaOm Research defines this stage as one in which observability aligns directly with strategic business goals using IPM and AI/ML analytics to drive continuous optimization across digital experiences.

For example, one global retailer used IPM insights to identify intermittent latency in a third-party API during high-traffic periods. By addressing it proactively, they avoided checkout failures during Black Friday, preserving millions in revenue and reinforcing customer trust.

Charting the path from monitoring to mastery

Taken together, the four stages form a clear maturity continuum from reactive, technology-driven monitoring to proactive, value-driven observability. Each stage builds on the last, strengthening organizational resilience and aligning performance data with business success. The next section delves into how these distinctions reshape traditional approaches and establish a new standard for observability excellence.

Transitioning beyond the maturity stages

The journey through the four stages lays a foundation for a transformed approach to operations, but what does that transformation look like in practice? As organizations progress, they begin to redefine not just their tools and workflows, but their very mindset toward digital performance. This next section explores how value-based observability diverges from traditional practices, highlighting the principles that distinguish modern, business-aligned observability.

How value-based observability fundamentally differs from the status quo

1. From system-centric to outcome-centric: Traditional APM-driven observability focuses on system health, including CPU utilization, query latency, and error rates. These are useful, but they stop short of showing whether the business is healthy. Value-based observability reorients telemetry around business outcomes and user experiences, whether customers can complete a checkout, employees can access SaaS apps, or APIs are delivering value in real time. Instead of dashboards full of green lights, the question becomes: Is the experience great, and is it driving value?

2. From internal visibility to end-to-end digital awareness: APM tells you what’s happening inside your code. But today’s user experience depends on the broader Internet, which is now your production environment and your application fabric: networks, APIs, microservices, and internal and external cloud services. Value-based observability uses IPM to close that visibility gap. It measures from the real-world user experience, tracking reachability, availability, and performance across the full digital supply chain, not just your app servers.

3. From noise to narrative: Average teams drown in telemetry because every tool produces its own flood of data. Without context, all that data becomes white noise, more heat than light. Value-based observability uses correlation, context, and business metrics to create a coherent narrative: what’s happening, why it matters, and who’s affected. As one observability leader put it, “We went from reacting to graphs to explaining stories, the ‘why’ behind the metrics.”

4. From firefighting to foresight: Most operations teams still operate in a reactive posture: detect, diagnose, and repair. They’re judged on MTTR rather than issue avoidance. And they spend too much time in very disruptive war rooms. Value-based observability uses proactive monitoring and real-time data to get ahead of outages, minimize their severity and duration, and prevent customer impact altogether. The shift is from “monitoring for problems” to engineering for resilience.

5. From IT cost center to strategic partner: In traditional observability, success is measured in uptime and alert resolution. In value-based observability, success is measured in customer satisfaction, revenue protection, and brand trust.

When observability connects directly to digital business KPIs, IT operations leaders are no longer back-office operators. They become business enablers.

And that’s where the personal upside emerges: The observability leader who drives this evolution earns a strategic voice at the executive table. They can demonstrate, with concrete data, how observability investments improve customer conversion, retention, and productivity. That’s a career shift from keeping systems running to keeping the business winning.

Why IPM Is the Enabler of Value-Based Observability

The transition from “minimum viable observability” to “value-based observability” hinges on a simple but powerful realization: you can’t manage what you can’t see across the internet.

In today’s digital-first world, every click, API call, and DNS lookup is part of a complex, interdependent supply chain that sits mostly outside your direct control. Cloud, CDN, SaaS, and third-party APIs make up the backbone of your business. The failure of any link can ripple across digital experiences instantly.

Traditional observability focuses on infrastructure and code, including the metrics, logs, and traces they produce. But as GigaOm notes, the digital experience now depends just as much on dependencies beyond application code, including APIs, cloud services, DNS and SSL protocols, SASE and Auth, ISPs, edge services, and many other layers we can refer to as the Internet Stack.

IPM closes this visibility gap, providing an external perspective that complements internal observability.

By measuring experience from the user’s perspective (via internet synthetics and RUM) and the internet’s perspective (via global vantage points that reflect real-world user experience), IPM turns observability into a strategic differentiator.

In this context it is critical to keep in mind that monitoring from a cloud datacenter is not an effective way to measure digital experience. Monitoring exclusively from hyperscaler environments misses essential real-world variables like last-mile congestion, ISP routing, CGNAT, local DNS resolution, and edge/cache viability. These are not theoretical gaps; they materially affect how users experience applications every day.

At the highest maturity levels, IPM data is not consumed only by IT; it also informs marketing (campaign readiness), product (feature performance), and business leadership (risk and brand protection).

Measuring and advancing maturity

The original IPM maturity model was developed by Constellation Research in partnership with Catchpoint. We’ve since evolved and refined it over the years based on practical experience and customer outcomes. More recently, it has been further developed by GigaOm, with a greater focus on elevating IT operations.

According to GigaOm, assessing your maturity level begins with four key steps:

  • Identify key domains: Network, application, website, workforce, and customer experiences.
  • Evaluate coverage: Assess observability across performance, availability, experience, and resilience.
  • Determine current vs. desired maturity: Benchmark existing practices against the model’s four stages.
  • Develop a roadmap: Define actionable steps to evolve, ensuring at least minimum viable observability in all domains while progressing toward value-based observability.

Catchpoint’s own IPM Maturity Assessment builds on this framework, using quantitative scoring across solutions and resilience pillars. The results produce a maturity index, a data-driven baseline for continuous improvement.
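
For illustration only, the following sketch shows one way a questionnaire-style assessment could be rolled up into a single maturity index. The domains come from the steps above; the pillar-by-domain scoring and the normalization are hypothetical, not Catchpoint’s actual scoring model.

```python
# A hypothetical roll-up of self-assessed maturity scores into a single index.
DOMAINS = ["network", "application", "website", "workforce", "customer"]
PILLARS = ["reachability", "availability", "performance", "reliability"]

def maturity_index(scores: dict[str, dict[str, int]]) -> float:
    """scores[domain][pillar] is a self-assessed stage from 1 (technology-focused)
    to 4 (value-focused); the index is the average, normalized to 0-100."""
    flat = [scores[d][p] for d in DOMAINS for p in PILLARS]
    return round(100 * (sum(flat) / len(flat) - 1) / 3, 1)

example = {d: {p: 2 for p in PILLARS} for d in DOMAINS}   # uniformly stage 2
example["customer"]["performance"] = 3
print(maturity_index(example))   # a single baseline number to track over time
```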

The payoff: resilience, experience, and business alignment

Organizations that evolve their observability through IPM maturity gain measurable advantages:

  • Faster detection and resolution: Shifting from reactive firefighting to predictive prevention.
  • Unified visibility: A single operational mission across network, app, and experience domains.
  • Improved customer experience: Reduced friction and higher satisfaction across digital touchpoints.
  • Enhanced resilience: Proactive detection of internet-level risks before they impact business.
  • Business alignment: Observability data that speaks the language of outcomes — revenue, brand trust, and workforce productivity.

GigaOm summarizes it well: at the value-based stage, observability is not an IT function — it’s a business function. It delivers strategic value by ensuring reliability, speed, and trust across the digital supply chain.

Conclusion: the journey to value-based observability

Achieving value-based observability isn’t about deploying more tools. It’s about connecting the dots between performance, experience, and business impact.

The Catchpoint and GigaOm maturity model gives technology leaders a framework for that journey:

  • From silos to systems.
  • From systems to experiences.
  • From experiences to business value.

Internet Performance Monitoring is the bridge across those stages. It turns observability from a tactical necessity into a strategic advantage. It’s the key to operational excellence and digital resilience in an internet-dependent world.

What is ECN?

Explicit Congestion Notification (ECN) is a longstanding mechanism in the IP stack that allows the network to help endpoints "foresee" congestion between them. The concept is straightforward: if a nearly congested piece of network equipment, such as a router along the path, could tell its destination, "Hey, I'm almost congested! Can you two slow down your data transmission? Otherwise, I’m worried I will start to lose packets...", then the two endpoints can react in time to avoid the packet loss, paying only the price of a minor slowdown.
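
For readers who want to see where this signal lives, the ECN codepoints occupy the two least-significant bits of the IP ToS/Traffic Class byte (RFC 3168). The small sketch below decodes them; the sample byte value is illustrative.

```python
# Decode the ECN field: the low 2 bits of the IP ToS/Traffic Class byte (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT (endpoint does not support ECN)",
    0b01: "ECT(1)  (ECN-capable transport; used by L4S)",
    0b10: "ECT(0)  (ECN-capable transport)",
    0b11: "CE      (Congestion Experienced, set by a congested router)",
}

def ecn_bits(tos_byte: int) -> int:
    return tos_byte & 0b11          # low 2 bits are ECN; upper 6 bits are DSCP

tos = 0b00000101                     # illustrative byte: DSCP 1, ECN 0b01 -> ECT(1)
print(f"DSCP={tos >> 2}, ECN={ECN_CODEPOINTS[ecn_bits(tos)]}")
```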

What is ECN bleaching?

ECN bleaching occurs when a network device at any point between the source and the endpoint clears, or “bleaches,” the ECN flags. Since your traffic must reach your content via transit providers or peering, it’s important to know whether bleaching is occurring and to remove any instances of it.

With Catchpoint’s Pietrasanta Traceroute, we can send probes with IP-ECN values different from zero to check hop by hop what the IP-ECN value of the probe was when it expired. We may be able to tell you, for instance, that a domain is capable of supporting ECN, but an ISP in between the client and server is bleaching the ECN signal.
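
To show the general technique (not Pietrasanta Traceroute itself), here is a sketch using the open-source Scapy library: it sends UDP probes with ECT(1) set and inspects the ECN bits of the quoted IP header in each ICMP Time Exceeded reply, which reveals the hop at which the marking was cleared. The target, port range, and hop limit are illustrative, and it requires root privileges.

```python
# A Scapy-based sketch of hop-by-hop ECN bleaching detection (illustrative only).
from scapy.all import IP, UDP, ICMP, IPerror, sr1   # pip install scapy

TARGET, ECT1 = "example.com", 0b01

for ttl in range(1, 21):
    probe = IP(dst=TARGET, ttl=ttl, tos=ECT1) / UDP(dport=33434 + ttl)
    reply = sr1(probe, timeout=2, verbose=0)
    if reply is None:
        print(f"{ttl:2d}  *")
        continue
    if reply.haslayer(IPerror):                       # ICMP reply quotes our probe's IP header
        seen = reply[IPerror].tos & 0b11              # ECN bits when the probe's TTL expired
        if seen == 0b00:
            status = "BLEACHED (ECN bits cleared)"
        elif seen == 0b11:
            status = "CE (congestion experienced)"
        else:
            status = "ok"
        print(f"{ttl:2d}  {reply.src:15s}  {status}")
    if reply.haslayer(ICMP) and reply[ICMP].type == 3:  # port unreachable: reached destination
        break
```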

Why is ECN important to L4S?

ECN is an essential requirement for L4S since L4S uses an ECN mechanism to provide early warning of congestion at the bottleneck link by marking a Congestion Experienced (CE) codepoint in the IP header of packets. After receipt of the packets, the receiver echoes the congestion information to the sender via acknowledgement (ACK) packets of the transport protocol. The sender can use the congestion feedback provided by the ECN mechanism to reduce its sending rate and avoid delay at the detected bottleneck.

ECN and L4S need to be supported by the client and server but also by every device within the network path. It only takes one instance of bleaching to remove the benefit of ECN since if any network device between the source and endpoint clears the ECN bits, the sender and receiver won’t find out about the impending congestion. Our measurements examine how often ECN bleaching occurs and where in the network it happens.
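
The toy loop below illustrates why that feedback matters: a sender that scales its rate with the fraction of CE-marked packets (in the spirit of DCTCP/L4S scalable congestion control) slows down early, whereas bleached feedback keeps the CE fraction at zero and the sender reacts only once packets are actually lost. The numbers and the halving rule are illustrative, not a real congestion controller.

```python
# Toy illustration of an ECN-driven rate response (not a real congestion controller).
def adjust_rate(rate_mbps: float, ce_fraction: float) -> float:
    """Scalable-style response: reduce the rate by half the observed CE fraction."""
    return max(1.0, rate_mbps * (1.0 - ce_fraction / 2.0))

rate = 100.0
for ce_fraction in (0.0, 0.0, 0.2, 0.5, 0.1):     # per-RTT feedback echoed in ACKs
    rate = adjust_rate(rate, ce_fraction)
    print(f"CE fraction {ce_fraction:.0%} -> send at {rate:.1f} Mbps")
```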

Why are ECN and L4S suddenly in the news?

ECN has been around for a while, but with the growth in data volumes and rising expectations for user experience, particularly for streaming, ECN is vital for L4S to succeed, and major technology companies worldwide are making significant investments in it.

L4S aims to reduce packet loss, and hence the latency caused by retransmissions, and to make services as responsive as possible. In addition, we have seen significant momentum from major companies lately, which always helps push a new protocol toward deployment.

What is the impact of ECN bleaching?

If ECN bleaching is found, this means that any methodology built on top of ECN to detect congestion will not work.

Thus, you cannot rely on the network to achieve what you want to achieve, i.e., avoid congestion before it occurs: potential congestion is marked with the Congestion Experienced (CE = 3) codepoint when it is detected, and bleaching wipes that information out.

What are the causes behind ECN bleaching?

The causes behind ECN bleaching are multiple and hard to identify, ranging from network equipment bugs to debatable traffic engineering choices, packet manipulations, and human error.

For example, bleaching could occur from mistakes such as overwriting the whole ToS field when dealing with DSCP instead of changing only DSCP (remember that DSCP and ECN together compose the ToS field in the IP header).
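
A small sketch of that mistake in code: DSCP occupies the upper six bits of the ToS byte and ECN the lower two, so rewriting the whole byte bleaches ECN, while a masked update preserves it. The DSCP values used are illustrative.

```python
# Illustrative ToS rewrite: overwriting the whole byte bleaches ECN; masking preserves it.
DSCP_MASK, ECN_MASK = 0b11111100, 0b00000011

def set_dscp_buggy(tos: int, dscp: int) -> int:
    return dscp << 2                       # overwrites the ECN bits (bleaching)

def set_dscp_correct(tos: int, dscp: int) -> int:
    return (dscp << 2) | (tos & ECN_MASK)  # keep ECN, replace only DSCP

tos = 0b10111001                           # DSCP 46 (EF), ECN 0b01 (ECT(1))
print(bin(set_dscp_buggy(tos, 10)))        # 0b101000 -> ECN lost
print(bin(set_dscp_correct(tos, 10)))      # 0b101001 -> ECN preserved
```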

How can you debug ECN bleaching?

Nowadays, network operators have a good number of tools to debug ECN bleaching from their end (such as those listed here) – including Catchpoint’s Pietrasanta Traceroute. The large-scale measurement campaign presented here is an example of a worldwide campaign to validate ECN readiness. Individual network operators can run similar measurement campaigns across networks that are important to them (for example, customer or peering networks).

What is the testing methodology?

The findings presented here are based on running tests using Catchpoint’s enhanced traceroute, Pietrasanta Traceroute, through the Catchpoint IPM portal to collect data from over 500 nodes located in more than 80 countries all over the world. By running traceroutes on Catchpoint’s global node network, we are able to determine which ISPs, countries and/or specific cities are having issues when passing ECN marked traffic. The results demonstrate the view of ECN bleaching globally from Catchpoint’s unique, partial perspective. To our knowledge, this is one of the first measurement campaigns of its kind.
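
As a sketch of the kind of aggregation behind such a report, the snippet below groups per-traceroute observations by country and ISP and computes the share of runs in which ECN marks were cleared. The record fields and sample values are hypothetical, not the actual campaign data.

```python
# Hypothetical aggregation of per-traceroute bleaching observations by country and ISP.
from collections import defaultdict

def bleaching_rates(records: list[dict]) -> dict[tuple[str, str], float]:
    """records: [{'country': 'US', 'isp': 'ExampleNet', 'bleached': True}, ...]"""
    totals, bleached = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["country"], r["isp"])
        totals[key] += 1
        bleached[key] += int(r["bleached"])
    return {k: bleached[k] / totals[k] for k in totals}

sample = [
    {"country": "US", "isp": "ExampleNet", "bleached": True},
    {"country": "US", "isp": "ExampleNet", "bleached": False},
    {"country": "DE", "isp": "DemoTelecom", "bleached": False},
]
print(bleaching_rates(sample))   # {('US', 'ExampleNet'): 0.5, ('DE', 'DemoTelecom'): 0.0}
```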

Beyond the scope of this campaign, Pietrasanta Traceroute can also be used to determine if there is incipient congestion and/or any other kind of alteration and the level of support for more accurate ECN feedback, including if the destination transport layer (either TCP or QUIC) supports more accurate ECN feedback.

The content of this page is Copyright 2024 by Catchpoint. Redistribution of this data must retain the above notice (i.e. Catchpoint copyrighted or similar language), and the following disclaimer.

THE DATA ABOVE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS OR INTELLECTUAL PROPERTY RIGHT OWNERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THIS DATA OR THE USE OR OTHER DEALINGS IN CONNECTION WITH THIS DATA.

We are happy to discuss or explain the results if more information is required. Further details per region can be released upon request.