I’ve been an extremely good IT Operations Manager this year (which is saying something considering the state of the world at the moment) and I have a few items on my wish list:
No More Outages!
OK, I know that’s a tall order, since outages happen all the time for a myriad of reasons. Everyone is now using the Internet as their enterprise network, and with the ever-increasing complexity of interconnected cloud services, CDNs, SaaS applications and systems like BGP and DNS, it’s a minor miracle any traffic gets to its destination. But could I maybe have fewer outages? And ones that have only minor impacts on end users? Or better yet, ones that get caught before they impact end users?
I know the millions of other ITOps/NetOps personnel and SREs around the world have been working really hard to keep things working (so please give them some nifty presents too!), but serious outages keep happening and they keep getting more and more expensive. I’m not sure who your Internet provider is at the North Pole, but as a Canadian I’m sure you noticed when Rogers Communications went down for more than three days this summer. Maybe give some of their network architects some coal in their stockings if you get a chance. And while I’m thinking of it…
A Big Reduction in Help Desk Tickets!
Santa, I know I said I’ve been good, but I’ll tell you honestly there are tickets that I’m never going to get around to closing. The easy ones like password resets and software updates come in like a blizzard and it takes all day to dig out from under them. I could really use a way to get my users (the clued ones and the ID10T ones) to self-service most of their problems. That’d reduce my ticket count by at least half. The rest are the intermittent errors like connectivity or routing that have me pulling out what’s left of my hair! If I had one simple way of tracking a user’s journey from their laptop to the application and back, then I could close these within hours. Instead, my team is struggling to pin down issues using almost a dozen different tools that don’t talk to each other. Worse, different people in my organization have ownership over half the impacted systems and herding cats would be easier than getting IT professionals to agree on anything. If I get dragged into one more multi-day war room full of executives looking to blame my team for something that’s not our fault, I’m going to have to really start hitting the spiked eggnog. And speaking of which…
More Comprehensive RUM & Web Experience Analytics!
Real User Monitoring (RUM) is my most powerful tool for assessing the quality of my network and understanding why users can’t reach my website. It would be great to have more of it in the same tool as my website performance and experience monitoring. Ideally, I’d be able to use RUM to alert me to web issues before the actual users get impacted – that’s the Holy Grail of IT (if you don’t mind me mixing holiday metaphors). I’d love to have a dashboard with a single ‘website experience’ gauge that lets me know that performance is degrading before we start losing visitors (and money). In fact, now that I think about measuring things…
The Ghosts of Granular Data Past and Present!
If you happen to know any of the Dickensian Christmas ghosts, please send them my way (well, Past and Present anyway – I understand that Future isn’t that communicative). I always need more detailed and granular data about the performance of my network, my applications, my websites and the experience of my users (internal and external). The devil is in the details, and it’s often really hard to diagnose network issues with data points that are coming in with long intervals between them. Also, I can see what just happened and I can often see what happened last week, but that’s usually about it. I’d love to be able to pull up reference data from last year or even two years ago to put my current measurements in context. Not only would that make it easier for me to find and fix problems fast, it would also let me show my bosses how much our Internet performance has improved over the last few weeks/months/years. And speaking of the ghost of data present…
Real-Time BGP Data!
If you can visit every house in the world in one night, can you make my BGP data real-time? BGP errors are common and a real pain in my neck. The technology was never designed for security, so every time there’s a BGP hijack, I’m panicking about where my (potentially sensitive) data is going! I end up trying to figure that out while dealing with BGP monitoring data that’s 5 minutes old at best and 15 minutes old at worst. That means I can easily spend an hour or more just trying to diagnose the problem, let alone fix it. A way to see a lot of different BGP sources in real time would let me fix BGP leaks, hijacks and route errors so much faster. Which brings me to my next request…
More Observability Vantage Points!
I’m sure you have a great view from your sleigh, but my vantage points are much more limited. I have customers and co-workers all over the world and I am seldom able to monitor the network from where they are. Sure, network problems are easy to solve if the employee is in the office or the user is in a major Western city. But I have users accessing my network from places where I simply have no observability. Is that Account Manager’s connectivity problem due to his laptop, his Wi-Fi, the Wi-Fi network of the Ankara airport he’s sitting in or the local Turkish ISP? I could figure that out a lot faster if I had a node in Turkey I could test from! I’ve got access to a couple hundred public and enterprise nodes, but I really need a couple of thousand to do my job properly and keep my users happy. And speaking of keeping people happy…
Some Helper Elves!
I don’t suppose you could spare a few of your shop workers to help me out for a while? Like all IT teams, we’re understaffed and vendor support is pretty hit or miss (more ‘miss’ from most vendors, frankly). If your elves can make PlayStations and iPads, couldn’t they also provide some tech support for some of these complex observability and monitoring tools I use every day? I really need answers to questions and some expert advice on setting up dashboards and tests, but I have a lot of trouble getting that help sometimes. Responsive, knowledgeable support would be worth its weight in gold – to say nothing of frankincense and myrrh!
Santa, I guess what I’m really asking for this year is a reliable, stable and resilient Internet. That’s what I try to provide to my users every single day, but I could use some presents from you to make it all a lot easier.
Or, I suppose, you could just put Catchpoint under the tree for me. That would be awesome!
Every ITOps, NetOps, and SRE Worker in the World