The ever-growing presence and complexity of third-party systems within IT architecture is arguably the biggest challenge that any DevOps or SRE professional faces on a daily basis. With so many components of IT infrastructure lying outside of your direct control, it’s an uphill battle to even get a clear picture of their performance and impact on your customer experience, let alone to pinpoint and troubleshoot the exact nature of problems when they arise.
Within this realm of third-party components, perhaps the most daunting challenge lies with network optimization. It is here that the complexity usually reaches its zenith due to firewalls, the ever-changing nature of the public internet, and varying levels of bandwidth and infrastructure capability in different locations around the world.
One of the first steps to ensuring optimal network performance is selecting the best possible providers of crucial services such as DNS. That means researching the vendors that serve your most important geographies around the world: they must not only deliver consistently excellent performance and availability (which you can verify by benchmarking them against their competitors), but also be willing to work with their customers to resolve issues quickly.
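As a rough illustration of that benchmarking step, the sketch below compares providers on availability and latency once lookup timings have been collected. It assumes the timings already exist as lists of milliseconds, with `None` marking a failed lookup; the function names, the provider names, and the sample numbers are all hypothetical, and ranking by availability first and median latency second is only one reasonable choice.

```python
import math
import statistics
from typing import Optional


def summarize(samples: list[Optional[float]]) -> dict:
    """Summarize lookup timings in milliseconds; None marks a failed lookup."""
    ok = sorted(s for s in samples if s is not None)
    availability = len(ok) / len(samples) if samples else 0.0
    if not ok:
        return {"availability": availability, "median_ms": None, "p95_ms": None}
    # Nearest-rank 95th percentile over the successful lookups.
    rank = max(1, math.ceil(0.95 * len(ok)))
    return {
        "availability": availability,
        "median_ms": statistics.median(ok),
        "p95_ms": ok[rank - 1],
    }


def rank_providers(results: dict[str, list[Optional[float]]]) -> list[str]:
    """Order providers by availability (highest first), then median latency."""
    summaries = {name: summarize(samples) for name, samples in results.items()}

    def sort_key(name: str):
        median = summaries[name]["median_ms"]
        return (-summaries[name]["availability"],
                median if median is not None else float("inf"))

    return sorted(summaries, key=sort_key)


# Hypothetical sample data, for illustration only.
example_results = {
    "provider_a": [20.0, 22.0, 25.0, None],
    "provider_b": [30.0, 31.0, 29.0, 35.0],
}
for provider in rank_providers(example_results):
    print(provider, summarize(example_results[provider]))
```

In practice the samples would be gathered separately for each geography you care about, since a provider that leads in one region may lag in another.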
The other important preventative measure is understanding the impact of different regional networks on the end users’ digital experience, and planning accordingly. In China, for example, the existing infrastructure is often inadequate for serving the demands of the growing user base, so it may be necessary to utilize first- or third-party servers or a CDN to reach users at the edge. As detailed by SAP Hybris in their recent OpsCast webinar, evaluating different regional networks as they relate to the location of your data centers is a crucial step in improving end-to-end performance for global users.
Of course, selecting the best providers is only the first step in the battle. Once in place, those providers must be monitored continuously, on both sides of the firewall, to catch performance issues in real time. Doing that requires visibility into the performance of each of the networks that your services rely on, be they part of the public internet, private networks like AWS, or internal networks such as those used by a specific branch office.
This can be achieved through Catchpoint’s OnPrem nodes, which can be plugged directly into any network in order to get the best possible view of the performance and limitations of that network. SAP Hybris, a cloud provider that utilizes both public and private internet servers, uses OnPrem nodes on those networks to test for issues related to peering, DNS, etc., and then works with the necessary providers to resolve them as quickly as possible. The ability to test for issues with protocols such as TCP, NTP, and BGP, both inside a specific network and across the internet backbone, means that a performance expert can provide critical information that drastically reduces the time between when a performance problem is first detected and when its root cause is identified.
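To make the idea of a protocol-level test from inside a given network more concrete, here is a minimal sketch of a TCP connect-time probe using only Python's standard library. It is not how Catchpoint's nodes work internally; the function name `tcp_connect_ms` and the default timeout are assumptions chosen for illustration.

```python
import socket
import time
from typing import Optional


def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the time in milliseconds to open a TCP connection, or None on failure.

    If host is a name rather than an IP address, the measured time also
    includes name resolution.
    """
    start = time.perf_counter()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and resolution errors.
        return None
```

Running the same probe, for example `tcp_connect_ms("example.com", 443)`, from a machine inside a branch office and from one on the public internet gives two numbers for the same target; a large gap between them suggests that the problem lies on one side of the firewall rather than with the service itself.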