Frequent readers will not be surprised that we firmly believe DNS has a huge impact on performance. One of the biggest challenges with DNS is understanding what impact various DNS issues can have on end users. DNS resolution is complex, and we lack insight into how end users’ DNS resolvers are configured, which makes it hard to calculate the impact properly.
It is easy to understand the impact of a hard DNS failure: when all authoritative servers for the domain are down, the domain is unreachable by every user. But what happens in the more common case where one particular authoritative server is slow or unreachable?
Here is an example of such a case that a client experienced recently. They monitored the website with a browser-based agent (IE 8) and monitored their managed DNS provider through our DNS monitor. On December 27th, 2011, for about one hour, two of the four authoritative servers were unreachable from the East Coast.
In this case there was a 50% chance that a resolver would reach a server with a problem and eventually time out. The resolver would then retry the query on the next authoritative server, which hopefully responded correctly. In the worst case, the resolver would go through three requests (two sequential timeouts) to get the proper answer. Assuming the resolver picks servers at random, that happens with a 1/2 × 1/3 = 16.67% chance.
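The odds in this scenario can be checked with a short calculation. The sketch below is illustrative only: it assumes a resolver that tries authoritative servers in uniformly random order and never retries a server that has already timed out, which real resolvers may not do. The function names and the 2-second per-attempt timeout are hypothetical, not taken from any particular resolver.

```python
from fractions import Fraction


def timeout_distribution(total_servers, down_servers):
    """Map k -> P(exactly k timeouts before the first healthy server answers).

    Assumption: servers are tried in uniformly random order, without repeats.
    """
    if not 0 <= down_servers < total_servers:
        raise ValueError("need at least one healthy server")
    healthy = total_servers - down_servers
    dist = {}
    p_all_down_so_far = Fraction(1)  # P(the first k picks all timed out)
    for k in range(down_servers + 1):
        remaining = total_servers - k
        dist[k] = p_all_down_so_far * Fraction(healthy, remaining)
        p_all_down_so_far *= Fraction(down_servers - k, remaining)
    return dist


def expected_delay_seconds(dist, timeout_seconds):
    """Average time lost to timeouts, given a hypothetical per-attempt timeout."""
    return sum(k * p for k, p in dist.items()) * timeout_seconds


dist = timeout_distribution(total_servers=4, down_servers=2)
for k, p in dist.items():
    print(f"{k} timeout(s): {float(p):.2%}")
print("expected delay:", float(expected_delay_seconds(dist, timeout_seconds=2)), "s")
```

Under these assumptions, half of the uncached lookups see no timeout, a third see one, and a sixth (16.67%) see two. The expected delay scales directly with whatever timeout the resolver actually uses.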
Certain companies and individuals suggest that this timeout and retry scenario comes at no cost to the end user experience: ZERO milliseconds of impact. We have an ongoing debate with a major CDN provider regarding this issue. You know, one of those “No! The earth is not flat!” arguments.
The answer is not simple. For some end users there is probably no impact, because the DNS record was cached or the DNS resolver has smart handling built in to avoid servers that have been timing out for a period of time. However, not all domains are popular enough to be in a resolver’s cache, and not all resolvers can always avoid unreachable servers.
So did this DNS failure, caused by dead name servers, have an impact on web performance? You bet it did!
Here is the response time and DNS lookup time as captured by the Internet Explorer 8 agent, which relies on a commercial DNS resolver located at the same location as the agent. During the DNS outage, the response time (time to load the base page URL) spiked due to higher DNS resolution time.
When your DNS servers or your CDN’s DNS servers experience problems, there will be a CLEAR impact for at least some of the users (if not all users). A DNS timeout and retry bears a cost that varies based on the features and configuration of the DNS resolver (BIND, Microsoft, etc.). In a world where everyone is trying to be as fast as possible and a competitor is a click away, no one should take such a gamble and hope no one notices! Your users will notice, and they will be frustrated.
How can you mitigate such DNS problems?
- Avoid short Time to Live (TTL) values when possible. Some domains have a DNS TTL of 10-30 seconds; if the average user spends 5 minutes on your web site, the OS will have to resolve DNS several times. DNS is the 411 of the Internet: imagine you are talking to your bank on the phone and every 30 seconds the call drops and you have to redial!
- Quickly fix/replace/re-route failing DNS servers
- Utilize a distributed DNS infrastructure that relies on Anycast. Patrick Meenan has written an article about the importance of Anycast and its impact on Web performance.
- Ensure your infrastructure has capacity and DDoS protection.
- Ensure your DNS records are valid. Make sure to remove internal LAN DNS records, decommissioned servers or datacenters, staging servers, or your personal work desktop.
- Ensure your partners do not poison the DNS cache. If you delegate a DNS record to a third party (like a mobile proxy or ad serving…), ensure they do not set their own DNS servers as authoritative for your domain.
- Avoid multiple CNAMEs. They add complexity and time to domain resolution.
- Ensure you properly monitor DNS infrastructure and DNS impact on HTTP requests. You need data to understand impact and detect problems to properly react to issues.
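To put the TTL advice from the list above into numbers, here is a rough sketch. It assumes the user keeps making requests for the whole visit and that the cache honors the TTL exactly, which is a simplification. `lookups_per_visit` is a hypothetical helper, not part of any library.

```python
import math


def lookups_per_visit(visit_seconds, ttl_seconds):
    """Approximate DNS lookups for one hostname during a visit.

    Assumption: one lookup at the start, then another each time the
    cached record expires while the user is still active.
    """
    if visit_seconds <= 0 or ttl_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(visit_seconds / ttl_seconds)


for ttl in (10, 30, 300, 3600):
    print(f"TTL {ttl:>4}s -> {lookups_per_visit(5 * 60, ttl)} lookup(s) in a 5-minute visit")
```

With a 30-second TTL, a 5-minute visit triggers about 10 lookups for the same hostname, and each one is a fresh chance to hit a slow or unreachable name server.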
Mehdi – Catchpoint