At Catchpoint, we are constantly monitoring key websites and key infrastructure providers (DNS, Cloud…) to help our customers deliver a positive customer experience. We are always on the lookout for bottlenecks or latency that could impact application performance. Recently, we identified a DNS-related issue impacting hundreds of our customers. Based on our data, a major DNS provider had issues in Brazil that slowed down websites accessed from that region.
On March 13, 2018, we detected a noticeable degradation in performance when accessing websites from Brazil. Catchpoint quickly narrowed down the root cause to a DNS provider having issues in that region.
DNS resolution, which usually takes around 50 milliseconds, jumped to almost 1 second. The latency was specific to the DNS resolution time and started to impact performance on March 13th.
The graph above shows the time taken to download a file using Catchpoint’s object test. The high overall DNS resolution time, which was almost 1 second, pushed up the page response time as well.
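To make the measurement concrete, here is a minimal Python sketch of timing a single DNS lookup and comparing it with a baseline. It is not how Catchpoint's object test works internally; the function names, the 50 millisecond default baseline (taken from the typical figure quoted above), and the 5x factor are assumptions chosen for illustration.

```python
import socket
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Return the wall-clock time, in milliseconds, of one lookup for hostname.

    The resolver is injectable so the timing logic can be exercised without
    network access. By default the operating system's resolver is used, so a
    locally cached answer will look much faster than a fresh lookup.
    """
    start = time.perf_counter()
    resolver(hostname, None)
    return (time.perf_counter() - start) * 1000.0


def is_degraded(elapsed_ms, baseline_ms=50.0, factor=5.0):
    """Flag a lookup that is more than `factor` times slower than the baseline.

    Both defaults are illustrative assumptions, not recommended thresholds.
    """
    return elapsed_ms > baseline_ms * factor
```

Calling `is_degraded(time_dns_lookup("www.example.com"))` from a machine in the affected region would flag a lookup that takes close to 1 second, while a typical 50 millisecond lookup would pass.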
Impact on Overall User Experience
The DNS resolution time has a direct impact on page performance; the graph below illustrates how the higher DNS time delayed the Render Start (the time taken to load the first visual content in the browser) and eventually the Webpage Response (the time taken to fully load the webpage).
It is important to note that if multiple third-party services integrated with the website depend on the same DNS provider, this issue would have an even greater impact.
We can see the drastic change in the DNS load time in the graph below, which shows the distribution of response times for the week prior to the incident and the week of the incident. The previous week, 90% of the users experienced a DNS load time under 100 milliseconds. In contrast, during the week of March 13th, only 35% of the users experienced a DNS load time under 100 milliseconds, while 50% of the users experienced a DNS load time of almost 1 second.
The page performance dipped by approximately 25%, which translates to lower page views and increased bounce rates. We blogged about a similar chain reaction in performance when an issue with a third-party service slowed down entire applications.
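The week-over-week comparison boils down to asking what share of measurements fall under a threshold. A minimal sketch of that calculation follows; the helper name `share_under` is hypothetical and the sample values are made up for illustration, not taken from our data.

```python
def share_under(samples_ms, threshold_ms):
    """Return the fraction of samples strictly below threshold_ms."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    below = sum(1 for sample in samples_ms if sample < threshold_ms)
    return below / len(samples_ms)


# Made-up DNS times in milliseconds, for illustration only.
previous_week = [42, 55, 61, 48, 70, 95, 52, 66, 58, 130]
incident_week = [45, 980, 62, 1010, 990, 75, 400, 955, 1020, 970]

print(f"previous week under 100 ms: {share_under(previous_week, 100):.0%}")
print(f"incident week under 100 ms: {share_under(incident_week, 100):.0%}")
```

Looking at the share under a threshold rather than the average keeps a handful of very slow lookups from hiding how many users were actually affected.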
Root Cause Analysis
We were able to determine that there was a drop in DNS performance, but we still had to figure out at what point in the DNS resolution process the performance issue occurred and what exactly was causing the latency.
To troubleshoot further, we used Catchpoint’s DNS Experience test type (dig + trace). This allowed us to sift through each query in the DNS resolution process until we found the query that was causing the delay; as the graph below clearly shows, it was a second-level query that had the higher response time.
The second level query can reach out to four different nameservers, two managed by provider A and the other two managed by provider B. Analyzing these nameservers helped us identify those that slowed down (as highlighted in the graphs below).
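Outside of Catchpoint, a similar per-nameserver comparison can be approximated by sending the same query directly to each authoritative nameserver and timing the reply. The sketch below uses only the Python standard library and builds a plain DNS query for an A record by hand. The function names are hypothetical, the nameserver IP addresses must be supplied by the caller, and the code does not retry, fall back to TCP, or parse the answer.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (class IN)."""
    # Header: ID, flags (0 = standard query, recursion not requested),
    # then QDCOUNT=1 and ANCOUNT, NSCOUNT, ARCOUNT all 0.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_nameserver(nameserver_ip, hostname, timeout=3.0):
    """Return the reply time in milliseconds, or None on timeout or error."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (nameserver_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Only accept a reply that echoes the query ID we sent.
    return elapsed_ms if reply[:2] == query[:2] else None
```

Running `time_nameserver` against each of the four nameservers from a vantage point in the affected region, and again from an unaffected one, would show whether the slowdown follows a particular provider.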
We were able to identify that the issue was specific to one DNS provider and confirm that it was impacting only a certain region. Our customers, most of whom rely on the same managed DNS service provider for hosting their websites, were immediately notified so they could get the issue addressed quickly and minimize the impact it had on the end-user experience.
Our customers constantly reach out to us when they need help troubleshooting performance issues. The scenario we discussed shows the crucial role each component plays in providing a good customer experience. Monitoring your DNS infrastructure (self-hosted or managed) is critical. DNS is the first thing your customer interacts with. Latency at any point causes a ripple effect that eventually results in a negative user experience.
Catchpoint provides an array of tools, such as DNS Experience and DNS Direct, to help determine what went wrong and why it went wrong. Such comprehensive DNS testing must be part of your overall monitoring and web performance optimization strategy, especially when using a third-party DNS provider. You can read about the importance of the specific DNS tests that Catchpoint provides and how they work in this post.