Speaker 0
00:04 - 01:42
Hello, everyone, and welcome to today's webinar, From Reactive to Proactive Observability: A User Centric Digital Strategy for Banks. We're excited you could join us.
My name is Megan, and I'll be moderating today's event. Today, we'll be doing a deep dive into how leading banks are using Internet performance monitoring to ensure digital resilience and improve customer experience in today's complex technologically distributed banking environment.
We are thrilled to be joined by two speakers, Gerardo Datta, field CTO at CatchPoint, and Leon Adatto, principal technology advocate at CatchPoint. Before we get started, I have just a few housekeeping items.
Please note the slides will advance automatically throughout the presentation. To enlarge the slides, click the enlarge slides button located in the top right corner of your presentation window.
If you need technical assistance, click on the help widget located in the bottom left corner of your console. We encourage you to submit questions at any time throughout the presentation using the Q&A widget at the bottom of your console.
We will try to answer these during the webcast, but if a fuller answer is needed or we run out of time, it will be answered later via email. Please know we do capture all questions.
So to kick off, I'd like to introduce our speakers, Gerardo and Leon. Gerardo is an experienced technologist with over twenty years of experience in digital strategies and web technologies who has been at the center of the web, mobile, social, and cloud revolutions.
Leon is a principal technology advocate and has held multiple industry certifications over his thirty six years in IT, including Cisco, Microsoft, A+, and more. So Gerardo and Leon, would you both like to quickly introduce yourselves?
Speaker 1
01:42 - 01:48
Hello. To the audience, thank you for the opportunity.
Happy to be here. Thanks for the introduction, Megan.
Speaker 2
01:48 - 02:08
Hello. It is great to be here.
It's fun to talk to Gerardo. Just as a side note, Gerardo and I have known each other for over ten years now.
We've worked at two different companies together. So if you detect a note of, like, teasing and snarking, it's not because, you know, he ran over my dog or anything like that.
We just like to tease each other. So we're happy.
Speaker 1
02:08 - 02:08
to be on.
Speaker 2
02:08 - 02:10
we're looking forward to this.
Speaker 0
02:10 - 02:30
I am as well. Thank you so much for joining us today.
So with that, let's get right into it. I'd love to set the background for this topic by discussing the evolution of banking infrastructure.
How has the technological landscape of banking services changed in recent years, and why are traditional application performance monitoring approaches no longer sufficient?
Speaker 1
02:30 - 04:54
Well, I would say that there are three main things that have changed. The first one, which is evident in the graphic, is the increased complexity of banking systems.
Right? That complexity is made even worse because banks now have to rely on third party systems, cloud services, third party services like card issuing, but also connectivity to remote employees, remote cloud, authentication systems, third party security tools, etcetera. So that's the first one.
Right? Like, increased complexity by just the sheer number of things that a bank needs to work on that are no longer all of them in their mainframe or under their control. The second change is that everything now is distributed.
So, like, you know, twenty years ago, everything was probably in one building, where you had a very large room with a very large mainframe. And nowadays, all the systems are distributed across different clouds, different data centers, DR sites.
Employees are distributed naturally between different offices, but now also working from home. And then users are also increasingly distributed.
Right? You know, twenty years ago, nobody was using mobile applications. Very few people trusted online banking.
Now it's a complete opposite. And that gets me to the third one, which is users are getting more and more impatient.
Right? So many years ago, people would wait seven seconds for a page to load. That was probably the standard.
Nowadays, everybody wants faster Internet at home, and they expect a page to respond within a second. At three seconds, when a page is not responding, users start to get frustrated.
Right? You go to a website, you type in the URL, maybe you hit submit with your login credentials. And at three seconds, you start to think something's wrong.
By second five, you're really frustrated. By second seven, you've probably closed the window.
So we're not really very patient. And the consequence is that a bad online experience or a bad digital experience, whether it's on a mobile device or online, is now the main cause of people leaving their bank.
Right? So it's really important for IT leaders in the banking industry to be aware of all these changes, how technology has changed, and how this leads, just like any other evolution in technology, to new requirements. They need to be aware of the blind spots in this new world of distributed, hybrid, complex, Internet centric IT.
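The patience thresholds just described (a response expected within a second, frustration by three, abandonment around seven) can be sketched as a simple classifier. Only the one-, three-, and seven-second cutoffs come from the discussion; the band names and the function itself are hypothetical.

```python
# Illustrative sketch: classify a measured page response time against the
# user-patience thresholds from the discussion. Band names are invented.

def experience_band(response_seconds: float) -> str:
    """Map a page response time to a rough user-experience band."""
    if response_seconds < 1.0:
        return "good"        # users expect a response within a second
    if response_seconds < 3.0:
        return "acceptable"  # tolerable, but slower than expected
    if response_seconds < 7.0:
        return "frustrated"  # by second three, users suspect something is wrong
    return "abandoned"       # by second seven, the window is likely closed

print(experience_band(0.8))  # good
print(experience_band(5.0))  # frustrated
```

A real XLO or alert rule would hang off these bands rather than off raw infrastructure metrics.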
Speaker 2
04:54 - 08:16
And so I I I wanna reflect on one thing, like, when we talk about what banking is like today. Ten years ago, I was at a DevOps days.
It happened to be DevOpsDays Columbus, and there was a bank, KeyBank, talking about their decision to move production workloads to the cloud, and people were losing their ever loving minds. Like, you're doing what? They were really having a hard time wrapping their minds around it.
And here, ten years later, I don't think anybody would, you know, blink an eye, let alone lose a minute of sleep over that kind of decision. So it just tells you how far we've come and how far a relatively risk averse environment like banking has come.
I also as we talk about monitoring and observability strategies, I wanna be really clear that, I've been working with monitoring and observability for twenty seven years. I've worked with lots of different tools at lots of different companies.
We are not here, Gerardo and I are not here, to call anybody's baby ugly.
We are not here to say that you're doing it wrong. If it works for you, then it works for you.
Yay. You know, that's that's good.
We don't want to tool shame. What we wanna do is point out that tools do improve, both over time and within particular industries, and keeping up with new capabilities is sometimes hard if you're stuck in the weeds.
And so we wanna point out where there are opportunities that you can capitalize on. The last part, you were asking about, you know, why the, you know, APM approach might not work or might not be as effective.
So first, I wanna just quickly go over some terms. You know, so called traditional old monitoring, which is largely based on, you know, ping, SNMP, WMI, is typically built for on premises only or primarily on premises environments.
APM extended that to include newer techniques like synthetic transactions and aggregating logging output, monitoring database specific metrics, you know, like locking and blocking and waits, and even things that were considered part of, you know, observability, like tracing. But all of that was still largely focused or contextualized around a single or a small set of environments and an unknown level of users.
We didn't know where the user was, and we didn't particularly care. We were focused on where the gear was, where the server was, where the applications were hosted, and how those were doing.
You also have a newer approach called network observability, which really hyperfocuses on the network. So it's looking at CDNs and BGP and ASNs and a bunch of other TLAs and things like that, and almost ignoring the application observability that we've started to become comfortable with.
And what all of that leaves out, to a greater or lesser extent, is the user, the external user, and that's where IPM comes in. This is the thing CatchPoint is calling Internet performance monitoring, which is APM plus some other techniques, but with the user's experience as its primary focus.
So I just wanna contextualize that because a lot of what we're talking about as far as where APM might not be working anymore has a lot to do with that. What next?
Speaker 1
08:16 - 10:34
I think our host said the computer has frozen. So why don't we talk a little bit about what the costs of those blind spots are.
Right? So we know that there are limitations as technology has evolved, but APM has stayed very developer centric. Right? So if you think about applications: infrastructure metrics, by nature, are talking specifically about infrastructure.
Code tracing is telling you about how the code is. It's very useful for application developers.
And logs are looking at your systems. And then on the other side, SNMP and WMI are very device centric.
So if you build a map of your bank's infrastructure similar to what we're showing on the screen right now, and look at which of those technologies you have a really good handle on, not only their health but the health of the connectivity to those devices, you'll find a lot of blind spots, things where you don't have visibility. And the impact of that, we already talked about: trust and user frustration, and how that results in customers leaving the bank.
Not only individual customers, but also corporate clients. Right? So if your finance application doesn't work, if you cannot run payroll one day because your system is down. I mean, we've seen instances where wire transfers are down for banks for days, etcetera.
But then there's also regulation. I mean, in Europe, there is DORA, the Digital Operational Resilience Act.
It's only, I think, a matter of time until we see that here. There was a parliamentary hearing in the UK for a bank, asking, how are you charging customers for service when your users cannot access your bank's website? Right? And they're taking it very seriously over there.
It's only a matter of time until that spreads. And by the way, DORA applies to any US bank doing business in Europe.
And then third, your IT organization: whenever there's a massive outage like we've seen on the news, seemingly every week, you call a war room. And if you cannot find the problem quickly, then you get all your best resources in a war room trying to figure out what's broken. And if you don't have the visibility, that could take hours, and that's the worst situation to be in.
What are your thoughts, Leon, about other impacts and consequences of having these blind spots?
Speaker 2
10:34 - 13:59
So, a little bit more background on me. I actually worked for about five years at the sixth largest Midwestern bank.
So I'm pretty aware of that. And you mentioned DORA.
America has had a similar, or analogous, system for ATMs. The FDIC requires a separate network to run alongside the bank's network on ATMs, so that if the banking network goes down, the separate FDIC network takes over.
But the thing is that the FDIC doesn't have access to the user's account, so they just allow a blanket amount of money. And so the user could be taking out way more money than they actually have, and the bank is on the hook for it.
So they really don't want that to happen. So, yeah, the, you know, the network is definitely one of those places that gets a lot of focus.
But as far as the blind spots, or as far as the key metrics that you should be looking at: obviously, uptime for everything. I think uptime is a default, whether you're talking about application availability or whatever.
Now, how you get that is the real conversation. I also think you've got, you know, user workflows.
The blind spot may not be whether the app is available or not, but where are people going in it? Can they find the functions they're looking for? What are the drop offs? You know, when they're in an application, how do they go to a particular place and then just go away inexplicably? And if so, why? Because it may be that they couldn't find the function, or they couldn't use the function, or it didn't make sense, or it wasn't labeled correctly, or any of those things. And your monitoring and observability can tell you that, if you let it, if you build it.
The other thing is, and you hinted at this, but I wanna hit it kinda hard: it's not just that the application is up from my site. It's not that, you know, it works on my laptop.
It's: is it available from all locations? You know, if the application is available from the data center in Cleveland, but all your users in Austin can't get there, it doesn't matter. Or as one of our leadership says, having a tsunami detector placed in the middle of Kansas doesn't help the people on the coast, because there really are usually no tsunamis in Kansas.
So where are you monitoring from matters a lot, and not from singular, but from multiples. Monitoring from many, many, many sites is desperately important these days and something that a lot of tools don't accomplish particularly well.
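The point about monitoring from many vantage points can be sketched as a tiny comparison function; the probe locations and pass/fail results below are invented for illustration.

```python
# Hypothetical sketch of "monitor from many places": run the same check from
# several vantage points and flag where it fails, instead of trusting one
# probe in one data center. All names and results here are made up.

def find_blind_spots(results_by_location: dict) -> list:
    """Return locations where the check failed, sorted for stable output."""
    return sorted(loc for loc, ok in results_by_location.items() if not ok)

# A single probe in "cleveland-dc" would report everything green,
# while users behind two Austin ISPs actually can't reach the app.
probe_results = {
    "cleveland-dc": True,
    "austin-isp-a": False,
    "austin-isp-b": False,
    "frankfurt-cloud": True,
}
print(find_blind_spots(probe_results))  # ['austin-isp-a', 'austin-isp-b']
```

The tsunami-detector-in-Kansas problem is exactly the single-probe case: one green vantage point tells you nothing about the coasts.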
Okay. We've got, you know, we've got a monitor in every cloud.
Yay. But we've seen cloud providers go down, and then that doesn't help. Okay.
We have a couple of extra colos. Alright.
And it helps for those colo locations. But really, you almost wanna say that you need to know the status of the environment from every place your user is, because that's what matters now.
You know? That's really the big part of it. And when we talk about a blind spot, I think that's the big one: the blind spot isn't a key metric, unless you consider the user's experience to be the key metric, which I think it is. Gerardo, I'm unmuted.
Speaker 1
13:59 - 14:02
Yes. Sorry.
There's another aspect of this.
Speaker 2
14:02 - 14:05
It's 2025.
Speaker 1
14:05 - 15:17
Which is how you think about this as an IT organization. We think about it as Internet resilience, or digital resilience.
Right? And, as we've been doing this for a long time, we think about it in terms of four pillars of resilience. One is reachability.
Can you get to the resource? Like, in a physical bank, can you drive to the bank? Right? Second is availability. Do all the capabilities of the system work as intended? Right? Because you can get into the bank, but if the ATM is not working, or all the tellers are sick, it's useless.
Then it's performance, because slow is the new down, as we've talked about. Right? And then the last one is reliability.
Can you trust the system to do this over time, always, when it counts? Which also includes, you know, being proactive, because you wanna make sure that when you open the doors to the bank, all the systems operate at 09:00 when you open the doors. Right? So if something happens at three in the morning, you wanna catch it at three in the morning, not at 09:05 when people are starting to freak out and there are lines of upset customers, etcetera.
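The catch-it-at-three-in-the-morning point reduces to a simple lead-time calculation; a minimal sketch, where only the 09:00 opening time comes from the discussion and everything else is hypothetical.

```python
# Sketch: given when a synthetic check first detects a failure and when the
# doors open, how much repair lead time does the operations team have?
# Hours on a 24h clock, same day; the scenario values are invented.

def repair_lead_time(failure_hour: float, doors_open_hour: float = 9.0) -> float:
    """Hours between first detection and opening time (0 if detected late)."""
    return max(0.0, doors_open_hour - failure_hour)

print(repair_lead_time(3.0))   # 6.0 hours to fix before the first customer
print(repair_lead_time(9.5))   # 0.0 -- detected after opening, war-room territory
```

The whole value of off-hours synthetic monitoring is keeping that number comfortably above zero.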
I think, to talk more about these blind spots, it will be useful to show the next slide, where we talk about the Internet stack. You wanted to add something, Leon?
Speaker 2
15:17 - 15:54
I just wanted to say that if you can roll a truck to fix something at 02:00 in the morning, before the doors open, that is so much more effective. And again, when we talk about blind spots, that's a blind spot.
Not, you know, well, it's three in the morning, nobody's there, nobody's on. A, the world is global and your application could be used at any time. But, b, even if that's absolutely true, so what? Wouldn't you rather know when you have time to get people moving and fix it before the, quote, unquote, real users are on? That's all.
I was just emphasizing that point.
Speaker 1
15:54 - 17:09
That's great. Yeah.
And as we're thinking about the blind spots, I think a visual that could be useful is what we call the Internet stack. Right? So we've had application stacks, and we have the several layers of OSI for networking, and we've had multi tier architectures for web applications for a long time.
I think it's time for us to start thinking also about the Internet stack, which is a collection of all the technologies, protocols, and services that the previous slide depends on. Right? And so that means that your monitoring system needs to be not only aware of all these technologies, but needs to have specific tools to be able to monitor each of them.
So if you have an Internet of Things deployment, you need to have MQTT protocol support. If you're a bank of any size, you probably need to start monitoring BGP.
Otherwise, you can be in a situation like what Meta faced three years ago, when all their services went down for an entire day. Right? If you care about your ATMs, you need to monitor the MPLS network, or whatever private network you have connecting to them, and so on.
So this kind of highlights the new set of technologies that IT organizations need to be thinking about to avoid having those blind spots.
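One way to picture the Internet stack idea is a checklist mapping each layer to the kind of check it needs; no single monitoring technique covers them all. The layer names follow the discussion (MQTT, BGP, DNS, MPLS), while the check descriptions and the helper function are hypothetical placeholders.

```python
# Illustrative mapping from Internet-stack layers to the kind of check each
# needs. The descriptions are invented placeholders, not product features.

STACK_CHECKS = {
    "dns": "resolve the bank's hostnames from multiple resolvers",
    "bgp": "watch route announcements for the bank's prefixes",
    "mqtt": "publish/subscribe round trip for IoT deployments",
    "mpls": "path and latency tests across the private WAN to ATMs",
    "application": "synthetic multi-step transactions against the web app",
}

def uncovered_layers(monitored: set) -> list:
    """Layers in the stack with no check configured -- the blind spots."""
    return sorted(set(STACK_CHECKS) - monitored)

print(uncovered_layers({"application", "dns"}))  # ['bgp', 'mpls', 'mqtt']
```

Walking a map like this against what you actually monitor is one concrete way to enumerate the blind spots the speakers keep returning to.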
Speaker 2
17:09 - 18:47
Yeah. Again, that customer centric, customer focused approach. Charity Majors, who is a friend of mine and the CEO of another observability company, very famously said nines don't matter.
She actually said something a bit longer, but: nines don't matter if the customer isn't happy. You know, this five nines approach is, first of all, five nines from where, and second of all, so what? If the customer isn't happy with the service or with the responsiveness or whatever. Like I said, the user's experience is paramount, to the point where I don't think any other alert matters.
I don't care if a disk is full. I don't care if a, you know, circuit went down.
If the user's experience is okay, then it's okay. I'm not saying you don't fix it.
But don't you know, you don't have to fix it at 02:00 in the morning. On the other hand, I don't care if everything is up in green.
If the user's experience is bad, you need to fix it. Whatever the it is in that sense.
So, I think that combine you know, I like this slide because it first of all, it helps identify and categorize where you're looking. We were talking about blind spots, so now you can look at the stack and say, well, you know, I have good visibility at this layer, but not that layer.
And it almost takes the technology out of it, or demystifies the technology. It separates the protocols from the applications from the, you know, cloud services.
And it says, look, you know, these are this is a an area to look, whatever the technique is. So I really like that.
It's great.
Speaker 0
18:47 - 19:04
Yeah. That's really interesting.
So IPM then provides this outside in perspective of the entire delivery chain. Gerardo, you had earlier mentioned proactive monitoring, as opposed to reactive, as another key aspect of IPM.
Why is being proactive rather than reactive so important for banks?
Speaker 1
19:04 - 21:52
Well, it's important for two reasons. Right? So the first one is that you wanna know about the incident before it impacts the business.
Too often, when you think about customer experience in banks, it's owned by somebody in marketing who is doing NPS surveys, which you get maybe a couple of days after you have some interaction with the bank. Right? So I go to the bank in person and try to deposit a check, and then you go, probably, I don't know, service your car, grab some coffee, etcetera.
The next day, you get a survey, and you've pretty much forgotten most of the experience except the bad parts. And you complain, yeah, you know, the teller was very busy.
You know, the teller was, very busy. And it it seems like every time you go into customer service, there's like, I'm sorry.
My computer is very slow. Right, which I don't think it's acceptable nowadays anymore.
And so what are you gonna do with the NPS score? Right? It's a rearview look into the past. Wouldn't it be better if you were testing your transactions, simulating a teller in your bank for your main location? You can do it from every region, from every city, or from every branch if you want, and monitor the performance of each terminal, and you do it continuously, twenty four hours a day, maybe every five minutes, maybe every ten minutes. So if there's a problem, whether it's DNS resolution or the application or back end code, whatever it is, you are aware of it and you have a chance to fix it, so that when the first customer shows up at nine in the morning, you have the confidence, you have the certainty, that they're gonna have a good experience.
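The continuous synthetic teller transaction described here could be sketched roughly as a scripted multi-step flow that times each step and reports which one broke. The step names, the simulated failure, and all helpers are invented for illustration.

```python
# Hedged sketch of a synthetic transaction runner: execute scripted steps in
# order, time each one, and stop at the first failure so the broken step
# (DNS, login, deposit...) is identified. Everything here is hypothetical.
import time

def run_synthetic_transaction(steps):
    """Run each (name, fn) step, timing it; stop at the first failure."""
    timings = {}
    for name, fn in steps:
        start = time.perf_counter()
        try:
            fn()
        except Exception as exc:
            return {"ok": False, "failed_step": name,
                    "error": str(exc), "timings": timings}
        timings[name] = time.perf_counter() - start
    return {"ok": True, "failed_step": None, "timings": timings}

def fail_deposit():
    # Simulated back-end outage for the deposit step.
    raise RuntimeError("backend timeout")

steps = [
    ("dns_resolution", lambda: None),  # stand-in for a real DNS lookup
    ("login", lambda: None),           # stand-in for a real login flow
    ("deposit_check", fail_deposit),   # the step that breaks in this scenario
]
result = run_synthetic_transaction(steps)
print(result["ok"], result["failed_step"])  # False deposit_check
```

Run on a five- or ten-minute schedule from each region, a result like this surfaces the broken step hours before the first customer arrives.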
Right? That is the opportunity you have when being proactive. And the other part of being proactive is that if you have full visibility into your entire Internet stack, from the end user all the way to the back end systems and your code and your databases and your infrastructure, then instead of calling a war room and getting everybody going like crazy, waking people up, pages going off, and doing the typical triage.
We're like, hey, how is the network doing? How is our server doing? How is the database doing? Whereas the system, if it's intelligent enough, can pinpoint: look.
There's a problem, and the root cause is here, and this is likely the solution. So you have the opportunity to fix it even before your customers show up at the door, or at least before it becomes a very big problem.
Right? So that is the opportunity of being proactive and the value of doing that. And you do it by combining not only a real time view of your entire system, including all your dependency maps into the Internet stack, but also a combination of synthetic monitoring, which is by definition proactive, RUM monitoring, and other tools that give you a full picture.
Leon?
Speaker 2
21:52 - 25:05
Yeah. I wanna elaborate a little bit more.
There is the reactive, you know, when you talk about this and the customer being the central focus. Reactive from an IT practitioner standpoint, reactive for us, is when the server fails ping, now we know to go look at it. Right? Like, you know, when the thing is down. And that's reactive, as opposed to proactive, which is a different pattern that I'll explain in a minute.
In banking, in any sort of service industry, the reactive version is: the customer called and said the experience was horrible, which is a lot less reliable than ping. Most customers just don't call.
They just leave. You know, they just go someplace else.
So that's a whole other thing. But there's also, you know, the reactive of waiting for it to blow up, versus proactive.
For example, a pattern that we see a lot, and it's simplistic. I admit it's simplistic, but it's also easy to explain and understand.
You're looking at a system and you see that there's a fan failure. Just, you know, a fan failed.
That's it. No big deal.
Right? And then a little bit later, you see the temperature inside the case go up. And, okay, it's getting a little warm.
Fine. Whatever.
And then all of a sudden, you see a ping failure. And then immediately after that, a few minutes later, you get what's called a cold start trap, which is where the system says, hey.
I started back up again. I don't know why I shut down in the first place.
Right? You see that pattern. If the thing that you're getting is the ping failure, or worse, the cold start, as your first notification that something happened, obviously your observability systems are not particularly sophisticated.
I know I said I wasn't going to, like, tool shame, but here I am. Right? On the other hand, you had an opportunity to know about that at the fan failure.
And if the fan failure wasn't warning enough, then you've got the temperature rise. Now, what I mean by that in terms of banking and service is that you're looking at, you know, application slowdowns, looking at the system even without real users on it. And, obviously, real users are the gold standard.
But even without them, looking at it at times when usage is low and asking, well, what would the response have been? What would this path through my application look like, you know, right now, this hour, this whatever. And then also looking at page abandonment.
You know, again, why are customers going into my application and just stopping and disappearing? Let's look at that.
You know, you can investigate that before the three day later bad NPS score. You know, and that's, I think, a mindset.
And even if this proactive versus reactive concept is aspirational in your organization rather than something you can implement, you know, lockstep every step of the way, you still have a chance to reduce the frequency and the duration and the impact of your outages. So, you know, you have the opportunity to reduce your mean time to detect, identify, escalate, and resolve, and then even validate.
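The mean-time metrics listed here (detect, escalate, resolve, and then validate) fall out of a few incident milestone timestamps; a minimal sketch, with all times in minutes and entirely hypothetical values.

```python
# Sketch: derive the per-incident durations behind MTTD/MTTR-style metrics
# from milestone timestamps. Milestone names and values are invented.

def incident_durations(t: dict) -> dict:
    """Durations between successive incident milestones, in minutes."""
    return {
        "time_to_detect": t["detected"] - t["occurred"],
        "time_to_escalate": t["escalated"] - t["detected"],
        "time_to_resolve": t["resolved"] - t["escalated"],
        "time_to_validate": t["validated"] - t["resolved"],
    }

incident = {"occurred": 0, "detected": 12, "escalated": 20,
            "resolved": 65, "validated": 80}
print(incident_durations(incident))
```

Averaging these across incidents gives the mean-time figures, and the validate step is what catches the "we fixed it... no, we didn't" case half an hour later.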
That's the other thing. Like, oh, we fixed it, and then you find out half an hour later, no, we didn't.
Like, validating that the thing you did actually fixed that problem, really cool, really important. So, you know, that that's another way to look at it.
Speaker 1
25:05 - 26:25
When you're talking about those metrics and how to be proactive, I think a good analogy is this: if you're an engineer and you've worked on combustion engines your whole life and you're aboard a ship, you're gonna be in the engine room. Right? Because you're gonna be looking at your metrics, which are: what is the gasoline to air ratio, what is the RPM, how much gasoline do we have, etcetera.
And so it's natural to continue looking at the tools that you've used your entire life, and that's very reactive. Right? I guess you can do a little bit of forecasting, etcetera.
But but you're looking at the engine itself. Now if you're proactive and you're starting to think about the broader context, then you're looking at sonar.
You're looking at your route. You're looking at where the ship is going.
If you're talking about the Titanic, well, that's a great example of why being proactive is very important. Right? You wanna know about the challenges ahead of time so you can plan ahead and steer the ship in the right direction.
It's not like you don't need the the engine room. Of course, you need that.
So we're not advocating for don't use APM. It's absolutely necessary.
The most evolved banks usually run a combination of APM and IPM. You need the combination of both, precisely. That's the main message.
Speaker 0
26:25 - 26:28
Very well said. And taking
Speaker 2
26:28 - 27:18
Sorry, just one more, because, you know, the boat analogy.
You know? Okay. Ignoring icebergs and massive failures, just being able to tell the engine room, we've got some choppy water up ahead.
Can you give me, you know, 25 more knots of speed? I'm actually gonna, you know, go out a little bit and try to veer around it. So the the engine still matters, and the fuel oil mix and all of that still matters.
But you're also looking at it, like you said, from a higher level, from a holistic level, and saying, this is why I need this much engine power to move around this area, so that my users, the guests on the ship, are having a better experience. So they're not all leaning over the edge losing their lunch, you know, so I can move around the choppy water and continue to give a smooth experience.
There we go. Okay.
We have now beaten the analogy into the ground or into the water. Go ahead.
Speaker 0
27:18 - 27:31
It's an incredible analogy, though. I'm glad that you added to that as well.
So you both made some very compelling points about the value of IPM. So then, how should banks measure the success of their IPM implementation? What KPIs should they be tracking?
Speaker 1
27:31 - 30:54
I think it goes back to this mindset. Right? So when you have an infrastructure or systems focused mindset, your KPIs tend to be very infrastructure centric.
Right? So your CPU utilization, your code efficiency, what is your SLA for your network, and stuff like that. I think the opportunity with IPM is to put the users at the center of your world.
Right? So if you think about a bank, imagine, I don't know, I'm in Austin.
So imagine Frost Bank here: all the branches in Austin connected to a certain ISP are having incidents, and nobody can deposit checks or get money or anything like that. Right? People are screaming about it, upset.
They go all the way to the CEO. Your answer cannot be, yes, but our servers in the cloud are running at 50% capacity, or we saved 20% on our databases.
Like, that's completely illogical. Right? Because the goal of IT is not to keep a server healthy.
The goal of IT is to serve the business, to serve the end user. And by the way, the end user could be a customer, could be, by proxy, the teller at the bank, could be a trader that needs to make some investments, could be an API that's allowing somebody to deposit money via Venmo or Zelle or anything like that, or could be any other employee.
You know, it could be an HR employee of the bank. Those are also important, and all of those are users.
So the opportunity is to focus not on systems, but on the users, and on where those users are located. Right? Because, again, a lot of monitoring today, using the analogy, you monitor from where you're comfortable. You monitor from the cloud.
Guess what? Your customers and your branches are not in the cloud. It's logical that your branches don't have the same connectivity or capacity or data center resilience that an Amazon Web Services data center has, right, or even any tier one colocation data center.
So you need to understand the actual real world connectivity of somebody on a mobile device trying to connect through cellular, via a browser or a native application, or the bank around the corner here, or everybody in your office on Wall Street trying to make trades. Right? So it is really important to put the end user at the center, to focus on the location, and then to focus on what we call, instead of SLAs, you see the word here on screen, XLOs.
An XLO is an experience level objective. Right? So instead of saying we're gonna make sure that our network uptime is 99% or four nines, you say, well, we wanna make sure that anybody visiting the website can complete a transaction in three seconds.
Right? And, obviously, to meet that XLO, you require all the systems behind it to work. But, also, alignment on XLOs is very powerful for aligning the IT organization and the operations teams with the business.
Right? So you agree on the value of IT: do you want your users to wait, or do you want to make an investment so you can actually turn it into a competitive advantage, where your digital services are more reliable, more trusted, more resilient than those of the competition?
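An XLO like the one described here ("complete a transaction in three seconds") could be measured as the fraction of observed transactions that met the target, rather than as device uptime. A minimal sketch; the sample timings are invented.

```python
# Sketch of an XLO (experience level objective) check: what fraction of
# observed transaction times met the 3-second target? Samples are made up.

def xlo_compliance(durations_s, target_s=3.0):
    """Fraction of transactions that met the experience target."""
    if not durations_s:
        return 1.0  # no observations: nothing violated the objective
    met = sum(1 for d in durations_s if d <= target_s)
    return met / len(durations_s)

samples = [1.2, 0.9, 2.8, 3.4, 1.1, 6.0, 2.2, 1.8]
print(f"{xlo_compliance(samples):.2%}")  # 75.00%
```

The same transactions could come from real-user measurements or synthetic runs; either way, the objective is stated in the user's terms, not the server's.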
Speaker 2
30:54 - 33:31
And I wanna add to this idea of an XLO, which is that, obviously, certain users will not be able to complete the transaction in three seconds or less, because they can't click that fast, or they're really bad at math.
That would be me. Like, whatever it is.
So that emphasizes the need for autonomous systems, for synthetic transactions that verify it: basically, can you do a speed run through your system as fast as possible? And if that's possible, your user will have no complaints. Right? I just wanted to clarify how such a thing is possible.
Back to the question, Megan: what should you be tracking? I think there are two distinct phases. One is the beginning, when you're implementing an Internet performance monitoring system or implementing new observability, and then there's the ongoing phase.
So in the beginning, I think the metrics you should look at are: how many systems have you added to your IPM solution, and how many specific discrete tests or monitors, alerts, and dashboards have you created? Now, you get what you measure. So if you're measuring how many reports you created, you're gonna have 9,227 reports.
That's an oddly specific number that might stick in my head for a particular reason, because of a past mistake I made. But that's not the point.
When you're looking at adoption, you have to know: is it being adopted, or do you get, like, three tests and then nothing? So it's really more about the delta than the number. Are we, at the beginning stages, continually adding more monitors, alerts, dashboards? Are they being used? Are they being visited? Are they triggering? Those kinds of things.
And immediately, the question folks have when I'm talking to them is: well, where am I gonna find that out? Like, I only know what I know. I don't know what I don't know.
Go to your service desk, go to your customer representatives, and ask: what are the biggest questions? What are the biggest issues people call in with? Those are the things you wanna look at fixing and see if you can resolve. Ongoing, you actually don't want more and more dashboards and alerts, because it becomes unmanageable.
At that point, what you wanna focus on is your mean time to detect, repair, respond: how are those changing? How are those going down? That's gonna be your key metric in the middle-to-ongoing maintenance phases of your IPM journey, and that'll be the indication of success.
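Leon's ongoing metric, the downward trend in mean time to repair, is just an average over incident durations compared across periods. A minimal sketch; the incident timestamps below are invented for illustration:

```python
from datetime import datetime

# Invented incident log: (detected_at, resolved_at) per incident.
month_1 = [
    (datetime(2024, 1, 3, 2, 10),  datetime(2024, 1, 3, 3, 45)),
    (datetime(2024, 1, 17, 14, 0), datetime(2024, 1, 17, 16, 20)),
]
month_2 = [
    (datetime(2024, 2, 5, 9, 30),  datetime(2024, 2, 5, 10, 10)),
    (datetime(2024, 2, 21, 22, 5), datetime(2024, 2, 21, 23, 0)),
]

def mttr_minutes(incidents):
    """Mean time to repair, in minutes, for one reporting period."""
    durations = [(resolved - detected).total_seconds() / 60
                 for detected, resolved in incidents]
    return sum(durations) / len(durations)

# The signal to watch is the delta between periods, not the raw number.
print(f"MTTR month 1: {mttr_minutes(month_1):.0f} min")
print(f"MTTR month 2: {mttr_minutes(month_2):.0f} min")
```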
Speaker 0
33:31 - 33:40
Excellent. Thank you.
And what about the overall ROI of implementing IPM? How can banks measure the business impact and justify the investment?
Speaker 1
33:40 - 38:07
Well, that's an interesting question. There are many ways to answer it.
I think the first one is: it's an insurance policy. Right? You don't wanna wait until you're in the news.
Like, I was talking about Meta, and they basically didn't think that monitoring BGP would be an important thing for them. But BGP is like their ZIP code.
Right? So when BGP was failing, somebody pulled an ASN, and as it propagated, every system, including the systems used to authenticate IT people walking into the data centers, was failing, because it could not communicate with any authentication system. Right? So you don't wanna wait until something like that happens for you to say: okay, now we need to pay attention to this.
Right? Because by the time that happens, the impact is usually measured in millions of dollars, careers lost, people fired, etcetera.
So that is kind of obvious, but it's scary. And I'm not trying to scare anybody into doing things, but these are real-world consequences.
I think one angle that is more interesting is the micro-outages. Right? We hear this from customers all the time. We had a customer saying: okay, I wanna monitor not only our websites, but our CDNs and our overall web infrastructure. And when they deployed their IPM system, they learned that there was a particular region with a routing challenge, and all their customers in that region were having terrible experiences.
Right? And what happens with those micro-outages is that, like Leon said, a few customers complain, and IT goes: it all looks great to me. Must be user error.
It's probably their computer, or the user doesn't know how to configure their router, or whatever. You usually blame it on the user.
And it happens four or five times, and it tends to get lost in the sea of everything that's happening in IT. Right? But normally those five users are very upset.
There are probably another 500 users that are frustrated and leaving. Right? But when they installed the system to monitor their CDN, they became aware of this, and they fixed the problem.
And it took them five minutes, because once you can see a problem, you can fix it much more easily. Right? So they fixed the problem, and suddenly their satisfaction improved in that region.
Those calls stopped coming in. And, you know, it was a small region, but there could be multiple of those micro-outages happening all over the place. Right? And it's frustrating not only for users, but also for employees: not only tech support and IT, but the bank employees as well.
And the other part of the ROI is the efficiency of those war rooms. Right? Calling a war room, where you bring in all your most talented people, interrupt their day, wake them up, etcetera, is very, I would say, talent-consuming. Versus somebody can see, again at three in the morning, which is usually when things tend to go wrong:
Look, our ISP in this location went bad.
We just switched to our alternative. Done, problem solved. Virtually any tactical network engineer can make the change, and you don't have to wake up everybody. Right? Or, hey,
our database is having some issues. You can restart the database, or fix the queries, or add more RAM to the server, etcetera. At the end of the day, once you start using this, you realize it's almost like you've been driving without a dashboard.
Right? I don't know what would happen for many people if Google Maps stopped working while they were driving. Right? People could not get to work or to the coffee shop.
Right? Because you become dependent; you can now appreciate the value of those systems. It's the same here.
Once you start seeing everything you can see and everything you can optimize, it's not only about avoiding incidents. You can also optimize costs.
You can identify challenges that you might have had before; you realize what you've been missing. And around optimizing cost: imagine you're using a new ZTNA or a new SASE, or you wanna change your SASE vendor.
Well, with IPM, you can do an actual assessment of which SASE vendor is delivering the best experience for your real-world users, in their locations, with the actual applications they're using. Not theoretical, not what the vendor tells you: you can actually make the assessment, make smarter decisions about those technologies, and then hold the vendors accountable.
Leon, what do you what do you have to add?
Speaker 2
38:07 - 42:38
Okay. So as far as ROI, first of all, I'm gonna call it like it is.
Insurance policies are distinctly and enormously unexciting. Like, they're necessary.
I am not saying don't have insurance, but as far as a compelling reason to do something, "it's a good insurance policy" just doesn't move anybody. Of course, I also was doing some home repairs and fell off a ladder yesterday.
So, you know, you know what kind of personality I am. Here are some numbers that you can start to collect that will help you go back to your organization and explain, not justify, but explain, the value that your monitoring and observability systems have.
The first one is the value of every alert. I find that lots of people don't build the value concept into their alerting, and it's a relatively easy spreadsheet to build.
You take your list of alerts from whatever system has them, and first you list what it would cost for you to deal with that problem if you didn't have monitoring and alerting in place. How long would it take to detect? How much business would you have lost? How many people have to get involved to fix it? How long does it take to fix before it's all back together? And immediately, when I'm talking to folks about this, they say: well, how am I supposed to know that? So if you don't have a good idea, there's a magical way to get that number.
Give a wrong number first. People will correct you.
Trust me. They will fall over themselves.
If you say: oh, yeah, every time this alert hits, we lose $6,000,000.
And they're like: what? We lose $6,000,000? And immediately the owner of the application or the system is like: that's not true, we always have the blah blah blah.
And now you get your numbers. It's great.
It's being wrong and weaponizing being wrong. So now you have the number for what it costs to fix the problem without monitoring, alerting, observability.
Then you figure out what the cost is, again, time to detect and all that stuff, with your IPM system in place.
And then the only other number you need is how many times that alert has triggered, because now you just do the difference. Right? The cost without, minus the cost with, gives you the savings per incident.
Then multiply by how many times the alert triggered. I have gone back and justified an entire observability system just by saying: this alert triggered 15 times this month, for a total gain of $45,000 that we didn't have to spend. That was this month, just by having this information in place.
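The per-alert arithmetic Leon describes reduces to one formula: cost without monitoring, minus cost with monitoring, times the number of triggers. A sketch of that spreadsheet in a few lines of Python; the alert names, dollar figures, and trigger counts are invented placeholders, not numbers from the webinar:

```python
# Each row: (alert name, cost per incident without monitoring,
#            cost per incident with monitoring, triggers this month).
# All figures are invented placeholders for illustration.
alerts = [
    ("payment-api-down", 6000, 1000, 3),
    ("login-latency",    1500,  200, 9),
    ("bgp-route-leak",  20000, 2500, 1),
]

def monthly_savings(rows):
    """Sum of (cost_without - cost_with) * trigger_count across alerts."""
    return sum((without - with_) * count for _, without, with_, count in rows)

print(f"Avoided cost this month: ${monthly_savings(alerts):,}")
```

Deliberately giving a wrong cost, as Leon suggests, is one way to get the stakeholders to fill in the real inputs for each row.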
But that's a piece that people don't always calculate, or aren't able to express or articulate particularly well to the business leaders. The other piece that I wanna point out that people don't track is the value of the staff not being distracted by, or not ignoring, your system.
I was working with a particular group that was getting 80 alerts a week on their systems, most of which were unnecessary or meaningless or whatever. And we worked together to really focus on the things that really mattered to them, and we boiled it down to eight.
And we saw two things. First of all, they weren't dealing with 80 alerts anymore.
Fabulous. But what we also saw is that the alerts they were getting were dealt with so much faster, because the entire team knew that if that pager went off, okay,
I just dated myself, if that ticket went into their queue, it was real, it was urgent, and they had to deal with it immediately.
And that changed the entire nature of what they were getting. So those are two things that you can easily track to determine the ROI of a given system or a given effort.
A little bit harder to measure, and we've been dancing around it:
customer satisfaction and retention. Obviously, banks do track that. And if you notice the churn is going down, you can credit observability, because why not? And then the other thing is, I like to say that CatchPoint helps make your weekend longer. We don't actually add real hours, but what we do is make it so that you know that if you're not being interrupted, there's nothing to worry about.
Don't go check your phone. And if you are being interrupted, you know it's real and you know you have to deal with it; but until then, you don't.
And that also makes a huge difference in how your staff responds to things, and how confident your staff feels when they're not responding to things.
Speaker 0
42:38 - 42:47
Excellent. Thank you.
So for banks that are convinced of the value of IPM, what are the immediate steps they should take to get started and improve their digital resilience?
Speaker 2
42:47 - 42:59
I'm gonna interrupt you, and I'm just gonna say: buy CatchPoint now, buy all of it. That's it.
Okay. Fine.
Fine. There,
I did it. Okay.
Gerardo, what's the real answer?
Speaker 1
42:59 - 46:04
Well, I would take maybe a more prescriptive approach. I think most banks have the maturity to have tiered their applications. Right? Tier one, tier two, tier three, etcetera.
So I would encourage you to start with tier one applications. Define what are those experience level objectives that you want to deliver to the customer.
Start measuring those and see how you're doing. And then build that map.
We have a tool called StackMap that creates a dependency map of everything that is impacting an application: front-end servers, third-party APIs, connectivity, network, back-end systems, application code, databases. You wanna be able to see everything, so you can see what is impacting that end-user experience.
But beyond that, I think an important step in implementation is to change the mindset. Right? It's a culture around customer experience.
For years, we've had NOCs. Right? When I was working at Rackspace, we had a beautiful network operations center.
Then we built SOCs. Right? Security operations centers. Still very network-centric, very infrastructure-centric.
I think it's time for leading organizations to start thinking about a DOC. If you agree that every interaction and every process in the bank is either inherently digital or supported by digital processes, that basically the digital experiences are the foundation of your business, the lifeblood of a bank, then I think it behooves us to start thinking about a digital operations center.
One where, instead of looking at the systems, you're actually looking at how that digital workflow is working, from the user's perspective, through the infrastructure, to deliver the outcome that you expect. And that has the potential of also changing the way that IT teams see themselves, and how the business sees them: instead of being the people you complain to when something is not working, they become an operational intelligence team.
Right? Because when you pay attention to those digital interactions, you become a business partner. You're adding value beyond just making sure that the services work: you're actually observing, and have intelligence about, the digital interactions.
Right? We see that in a lot of technology and financial services companies, but even in retail. IKEA is a beautiful example, because the team is monitoring the time it takes for somebody not only to order something, but to get to the store, and how the overall operation works: when somebody says "I'm here," it goes to the warehouse, the warehouse person requests the furniture you need to assemble, brings it to the front, the customer signs, puts it in the car.
That entire process of picking up furniture is a foundational digital process.
And at IKEA, it's not the IT team. It's the operational intelligence team that's monitoring that and giving them that intelligence.
Right? So for a bank, there are many other examples where you can translate that same idea, and think about the IT operations team in a different way, with the experience and the digital workflow at the center. Leon, what do you think?
Speaker 2
46:04 - 49:16
So I think, overall: take it slow. Don't try to do everything all at once.
Also recognize that you can't buy a box of IPM and just sprinkle it over everything. Right? It is a process.
It's a discipline. It's a set of techniques and technologies, and that's why you should take it slow.
As I said before, talk to your support organizations to find out where the biggest bang, the biggest opportunities for improvement are, and then use those tools. Use your IPM tools to build the map.
So you monitor the application. Maybe you just start with the website, and you see that it's slow, but everything within the website, all the code and all of the web pages, is fine.
Oh, wait. Maybe it's the database behind it.
Let's start to monitor that. And, oh, wait:
this is making seven API calls, and I'm monitoring the page where the calls happen, but I'm not monitoring the calls themselves. Let me add that on there.
Oh, I'm not monitoring the route, the actual ASN, the BGP environment within that. Maybe I should add that to it.
So you can build that map as you realize there are pieces you're not seeing. And that also helps, because it's an educational process. You don't know how important an aspect is until you have identified all the other parts and see that that one blind spot, or two blind spots, need to be covered.
Also, assess your existing tool set. There's a large desire for "brick in, brick out":
we already have too many tools; we can't afford another one.
Assess what you have. Please separate monitoring and observability tools from management tools.
The tools your teams use to manage their systems are critical and important, and they might also provide monitoring- or observability-like insight, but it's gonna be for that environment only. Let's say you have a tool for your Linux environment, and you're deploying your Linux systems with it.
That's great, and you want that. But "well, why do we need an observability tool, then?" Because you also have Windows systems.
And you probably also have some Unix systems, and you probably have a VAX or two hanging around. And that Linux deployment and monitoring system isn't gonna help you with those.
So the tools you use to manage are important. Don't try to take those away.
But also, don't use them as justification for not getting an observability tool that will span the entire area, again, those layers of the Internet stack, regardless of the technology. So I think those are the things.
And finally, last one, honest: make sure your tools play nicely together. I know that Gerardo is probably gonna look daggers at me right now, but
there is no one tool that will do everything for you. No,
not even CatchPoint. You're gonna have heterogeneous environments.
Make sure your tools play nicely together. Make sure that you can get data out of one and into a central repository for visualization or whatever, so that you're not trying to get one tool to do it all, because it's not gonna work, and you're gonna be angry.
Speaker 0
49:16 - 49:27
Wise words. Thank you so much.
We are getting close to the end of our time together. But as we wrap up this main section, what final thoughts would you like to share with our audience about the future of digital banking resilience?
Speaker 1
49:27 - 51:44
Well, I have three things to say. One is that we need to think about this as a natural evolution.
Right? Think about thirty years ago: things were very, very different. Applications, data centers, banks themselves, networks, etcetera.
Nobody was monitoring Wi-Fi back then, and nobody was monitoring the Internet, because most people didn't use it. All those things are changing.
So, definitely, you need new tools and new mindsets. Secondly, it's about a journey.
Right? It's not about buying a tool and you're done. No.
It's changing your mindset. There's a maturity model for how to do this.
There are probably upgrades you need to do to some of the things you might be using. SiteScope has been around for, I don't know, thirty years or more.
And then last, I think all of this, like you were saying, is not about buying one tool. I agree with you, Leon.
We've been talking about a single pane of glass for a long, long time in monitoring, and it never becomes a reality; no matter how much you consolidate and combine tools, more things to add become apparent. And that's why OpenTelemetry has become so prevalent nowadays: to make it easier to have flexibility and to connect multiple tools.
But the key here is that if you think about monitoring, you're trying to do three things. One: know that the problem exists. That's the first step in monitoring.
Right? The sooner you know that there is a problem, without having to wait until somebody at the branch contacts you, or customers complain, or you make the news, the better it is. The second is finding the root cause. That's where, when you have full visibility, not only do you know the problem exists, because you've been testing proactively, but you know where the root cause is and who needs to go fix it.
And the third part is fixing, which becomes easy once you have one and two done. Right? So moving from knowing that you have a problem to just getting it fixed dramatically reduces the mean time to resolution, mean time to innocence, and mean time to validation in banks' IT systems.
Leon, what are your last thoughts?
Speaker 2
51:44 - 53:45
I think my final thoughts on this really just drive it back to the customer experience. The user experience is everything.
And back in the day, and I'm not going back that far, we couldn't know what the user experience was. We had to infer the user experience from a variety of much lower-level metrics.
And we wished we knew what the user was experiencing, but we just couldn't. Tools have advanced, and I don't think anyone's denying that or ignoring that.
The tools advanced, but the mindset, the way that we frame it, continues to be component-centric. Even though we're able to understand the traces or the real user monitoring or whatever, our dashboards and alerting tend to focus on metrics and data to a point that ignores the user's experience of those systems. I think we should realign our mindset and realign our visualizations to what is happening right now in the application, with those users, and make that our priority.
Again, as I said much earlier in our conversation: obviously, if a circuit goes down, if a disk fills or breaks or whatever, you need to fix it. But you don't need to get somebody out of bed at 2:00 in the morning to do that unless it is impacting the user's experience of the system at that moment.
That's when those failures become relevant. So when you're considering observability systems, whether it's expanding the ones you have, or adding more, or switching to something else, keep this question central in your mind: how is this gonna help me deliver a better experience to the people using the system, and how can I build that part better? That's the one thing I wanna drive home, because all the rest of it is not as important anymore, because it doesn't need to be.
Speaker 0
53:45 - 53:56
Excellent. Thank you.
We've got a little bit of time left, so probably we can get through a couple of questions here. We'll start with this one.
What are some examples of leading banks that have implemented IPM successfully?
Speaker 1
53:56 - 56:05
I want to give two examples. One is, I don't know if I should name the name, but I visited them in North Carolina.
They have a green logo, and they are pretty much at the end of a consolidation from multiple tools to a combination of APM and IPM. So for every system in the bank, every website, every application, internal, external, etcetera, they monitor the code and the infrastructure with a single APM tool, and they monitor the user experience, synthetic monitoring, and the Internet with CatchPoint.
So that consolidation has dramatically reduced not only the complexity, improving the ability of the IT organization to focus on higher-value activities, but also the number of incidents they have, and the number of dramatic incidents they have needed to respond to, or even answer to management about: why did this happen, and how can we prevent it in the future? The second example is the three largest banks in Latin America.
They're all being monitored with IPM from the end-user perspective. That means the tellers at the branches, the people in the offices, every employee. Think about digital workforce productivity.
Every employee is monitored via a small agent in their browser and on their computer, making sure the team can be proactive.
So before a user even has to complain or anything like that, they know for sure, they have certainty, about the connectivity in an office, the connectivity of each individual person, the health of the router of the person working from home, or the health of the ISP that is providing connectivity to the branch down the street. They have all that visibility.
They can react quickly. They can solve problems much more efficiently.
Right? So that's been a significant advantage for those banks in the kind of hybrid, distributed, Internet-centric world of systems we're all working in. Leon, anything to add?
Speaker 2
56:05 - 56:09
I actually don't, and that was perfect. I'm not gonna try to top that.
Speaker 0
56:09 - 56:18
Well, then in that case, we do have time to squeeze in one more question. How does the CatchPoint solution integrate with existing APM tools that banks may already have in place?
Speaker 2
56:18 - 57:33
So I'll start with that one, and Gerardo can back me up, or tell the audience that I'm completely wrong. One or the other.
First of all, CatchPoint specifically has a bidirectional API interface. So you can use CatchPoint's portal to set up your tests and alerts and things like that, but you can also use the API endpoints to pull data out and use it in other visualizations. And I think that's the most important thing.
Again, that goes back to the plays-well-with-others concept. That's one thing.
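The "plays well with others" point, pulling data out of one tool and into a central repository, usually amounts to flattening one tool's payload into rows a shared store can ingest. A hedged sketch; the JSON field names below are purely illustrative assumptions, not CatchPoint's actual schema:

```python
import json

# Illustrative payload, as a monitoring tool's pull API might return it.
# The field names here are assumptions, not CatchPoint's real schema.
raw = json.loads("""
{
  "test": "checkout-flow",
  "samples": [
    {"location": "new-york",  "response_ms": 1840, "status": "ok"},
    {"location": "sao-paulo", "response_ms": 4120, "status": "ok"},
    {"location": "frankfurt", "response_ms": 0,    "status": "error"}
  ]
}
""")

def to_central_rows(payload):
    """Flatten a tool-specific payload into flat rows a shared repository can ingest."""
    return [{"test": payload["test"], **sample} for sample in payload["samples"]]

rows = to_central_rows(raw)
print(f"forwarding {len(rows)} rows to the central store")
```

The same flattening step works for any tool pair, which is why the "get data out of one and into a central repository" habit matters more than any single vendor's export format.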
The other thing is that, as we sort of emphasized, APM is focused at, I'll say, a different level, not lower in terms of less important or less critical, but a different level of the stack. So your APM systems are focused over here, and your IPM systems overlap but don't necessarily completely replace them, depending on what it is you're using APM for. So in one sense, you can use both. And, again, IPM and CatchPoint are focused on the user experience and all of those things.
And then if you need to drill in, you go to a secondary tool to look at some of the other stuff. Okay, Gerardo.
Tell me if I completely messed up.
Speaker 1
57:33 - 58:47
No. That's great.
I think the best organizations we see are using the combination of APM and IPM, where they have both systems. And, again, you're never gonna have a single pane of truth that serves for everything.
Typically, you have a system that alerts you on how things are working and how users are experiencing them. And when there's a problem, the actual tech who's fixing the code or the database might go into their own dashboard if they need to.
Other companies are basically feeding that into uber monitoring systems or alerting systems, Grafana or whatever the case might be. So, yeah, I think that combination of APM plus IPM is super powerful, something we see across the board: not only financial services, but software companies, banks, retailers, etcetera. That's where we see the most successful teams.
SAP in Germany, the largest publicly traded technology company in Europe, is using CatchPoint and one of those APMs to reduce their incidents from a hundred hours a month to basically zero. Right? And that's been a tremendous success for the organization.
Speaker 0
58:47 - 59:11
Excellent. Thank you so much.
Well, before we close, we do have just a very quick poll for you, and that is: what additional information would you like about CatchPoint? We'll give you just a moment to respond. There are a few options there:
access to on-demand demos or a product tour, a personal demo with a solution expert, and more banking-related resources: reports, ebooks, white papers, or briefs.
Speaker 1
59:11 - 59:15
Brief.
Speaker 0
59:15 - 59:31
Excellent. This has been a really interesting and insightful conversation.
I'd like to thank our speakers, Gerardo and Leon, for a great discussion today. We appreciate you sharing your thoughts and expertise with us.
And, of course, I'd also like to thank our audience for joining us today. We hope to see you again soon.
Have a great day.
Speaker 1
59:31 - 59:33
Thank you. Have a great day.
Appreciate the time.