
Slow DNS: Understanding DNS Performance, Best Practices, and Troubleshooting

DNS is one of the most critical components of the Internet, and its performance is a key factor in the quality of the end-user experience. Slow DNS responses can drive customers to leave a website quickly, possibly before the page has even loaded.

In this article, we will discuss how DNS resolution works, from the moment the end-user device makes a DNS request until it receives the IP address it needs to connect to. We will share some statistics on DNS resolution times, discuss the factors that contribute to delays, and show how to measure the performance of the different components involved. Finally, we’ll provide recommendations for improving DNS performance and delivering a good user experience.

Summary of key concepts

  • Defining DNS lookup time: The DNS lookup time is measured from when your computer requests a DNS record until it receives the correct response.
  • What does slow DNS mean? A DNS response is considered slow when resolution adds enough to overall load time to have a negative impact on the user experience.
  • Factors involved in DNS lookup time: DNS lookup time depends on Internet connectivity, network latency to the DNS servers, configuration particulars, and DNS server performance.
  • Troubleshooting slow DNS problems: To troubleshoot slow DNS, use network latency tools (e.g., ping and traceroute) and DNS performance testing tools (e.g., dig and DNSPerf).
  • Best practices for ensuring fast DNS performance: Use a DNS provider with a CDN architecture for high availability, benchmark and tune your DNS servers, increase DNS TTL values, use ALIAS records or CNAME flattening, and use DNS prefetch.

Defining DNS lookup time

When you type a website address in your Internet browser, your computer needs to find the website’s IP address before it can fetch the content. To get the address, your computer asks its configured DNS server (the resolver) to resolve the name. The resolver starts an iterative resolution process: it queries a DNS root server for the top-level domain (TLD) servers, then queries a TLD server for the website’s authoritative DNS servers, and finally queries an authoritative server for the website’s IP address, which it passes back to your computer.

Each of these steps takes time, and together they make up the total resolution time before your computer can request the website content you’re after. The faster this DNS response time, the better the user experience.
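To make the chain of queries concrete, here is a toy Python model of the iterative process. It is a sketch, not how a real resolver is implemented: the server names (root, tld-com, auth-exampleshop), the referral table, and the per-query latencies are all hypothetical, and the model ignores both the client-to-resolver leg and caching.

```python
# Toy model of iterative DNS resolution. All server names, referrals,
# and latencies below are hypothetical illustration values.
SERVERS = {
    "root": {"latency_ms": 20, "referrals": {"com.": "tld-com"}},
    "tld-com": {"latency_ms": 30, "referrals": {"exampleshop.com.": "auth-exampleshop"}},
    "auth-exampleshop": {"latency_ms": 40, "answers": {"exampleshop.com.": "192.168.90.100"}},
}


def resolve_iteratively(name: str, start: str = "root") -> tuple[str, int, list[str]]:
    """Follow referrals from the root until a server returns an answer.

    Returns (ip_address, total_latency_ms, servers_queried).
    """
    if not name.endswith("."):
        name += "."  # work with fully qualified names
    server, total_ms, path = start, 0, []
    while True:
        info = SERVERS[server]
        total_ms += info["latency_ms"]  # each query costs one round trip
        path.append(server)
        answers = info.get("answers", {})
        if name in answers:
            return answers[name], total_ms, path
        # Follow the referral whose zone contains the queried name.
        for zone, next_server in info.get("referrals", {}).items():
            if name == zone or name.endswith("." + zone):
                server = next_server
                break
        else:
            raise LookupError(f"no referral for {name} at {server}")
```

With these made-up latencies, resolving exampleshop.com takes three round trips (root, TLD, authoritative) and 90 ms in total, which illustrates how every step in the chain shows up directly in the lookup time.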


Causes and effects of slow DNS

Research shows that website performance, of which DNS resolution time is one component, has a large impact on whether a customer remains on a particular site or goes to a competitor’s site for information or services.

Google’s market research shows that when page load time increases from 1 second to 3 seconds, the probability of the user bouncing increases by 32%; when it increases from 1 second to 5 seconds, the probability rises by 90%. Ideally, DNS lookup time should be no more than 100 ms, and preferably less than 50 ms, leaving your website content a margin of 1-2 seconds to load in the browser.

The following snapshot from webpagetest.org shows the performance of the cisco.com website. DNS lookup time adds to the site’s connection setup time and page load time. Here, two DNS lookups are performed: first for cisco.com (taking 25 ms) and then, after a redirect, for www.cisco.com (taking an additional 33 ms).
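As a quick check against the thresholds above, the sketch below adds up the lookups that happen one after another on the page’s critical path and rates the total. Applying the 50 ms and 100 ms guidance to the combined time of sequential lookups is our assumption; the function name and rating labels are illustrative.

```python
# Thresholds from the guidance above: under 50 ms preferred, up to 100 ms acceptable.
PREFERRED_MS = 50
MAX_MS = 100


def rate_dns_time(lookup_times_ms: list[float]) -> tuple[float, str]:
    """Sum sequential DNS lookups on a page's critical path and rate the total."""
    total = sum(lookup_times_ms)
    if total < PREFERRED_MS:
        rating = "good"
    elif total <= MAX_MS:
        rating = "acceptable"
    else:
        rating = "slow"
    return total, rating


# cisco.com, then the redirect to www.cisco.com (values from the snapshot)
print(rate_dns_time([25, 33]))  # prints (58, 'acceptable')
```

The redirect doubles the number of lookups, pushing the total past the preferred 50 ms even though each lookup is fast on its own.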

Performance snapshot for cisco.com

The snapshot above partially lists the requests made to load the complete webpage content. It took more than 6 seconds for the page to look usable, far longer than the ideal benchmark. In the following sections, we will discuss how you can measure your DNS performance and improve it.

Factors involved in DNS lookup time

Internet communications are based on the TCP/IP reference model. Whenever two hosts communicate, the data starts at the Application Layer on the sending host and passes down through the Transport Layer and the Internet Layer (sometimes called the Network Layer) before being carried across the Internet to the other host. On the receiving end, the data climbs the layers in reverse order to the Application Layer, which processes it and sends a reply. The reply follows the same path in the opposite direction.

Internet layer communications architecture

When your computer initiates a DNS resolution request, the DNS client on your computer creates a DNS query. The query passes through your computer’s networking stack, goes out over your Internet connection, reaches your designated DNS server’s network, and is handed to the DNS resolver application. The resolver performs the iterative resolution process described above and returns the IP address to the requesting client. Each step in this layered communication takes time.
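To see what the DNS client actually puts on the wire, the following Python sketch builds a minimal query in the RFC 1035 message format and times a UDP exchange with a server on port 53. It uses only the standard library; the query ID and function names are arbitrary choices for illustration, and the sketch does not parse the response.

```python
import socket
import struct
import time


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message (RFC 1035 wire format).

    qtype 1 is an A record. The RD (recursion desired) flag is set so a
    recursive resolver performs the full lookup on the client's behalf.
    """
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    if any(not 0 < len(label) <= 63 for label in labels):
        raise ValueError(f"invalid domain name: {name!r}")
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # QTYPE, then QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def send_query(packet: bytes, server: str, timeout: float = 2.0) -> tuple[bytes, float]:
    """Send the query over UDP port 53; return (raw_response, elapsed_ms).

    Raises socket.timeout if no reply arrives within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        response, _ = sock.recvfrom(4096)
        elapsed_ms = (time.perf_counter() - start) * 1000
    return response, elapsed_ms
```

For example, send_query(build_query("exampleshop.com"), "192.168.53.10") would send the query to the example server address used in this article and return the raw response along with the elapsed time. That address is illustrative, so substitute a resolver you can actually reach.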

Here are some of the important factors contributing to total DNS resolution time:

  • End-user Internet connection speed and congestion
  • Geographical distance and network latency between the user and the DNS server
  • Non-optimal routing between source and destination, regardless of geographic distance
  • DNS server network performance, latency, and congestion
  • The geographic distance between the resolver and the TLD and authoritative DNS servers
  • The computational resources of the DNS servers for handling incoming requests
  • DNS application optimization and tuning based on compute resources and the number of requests

Troubleshooting slow DNS problems

Let’s say you have an online shopping website, exampleshop.com, and you have acquired web hosting and DNS services from (imaginary) Big Hosting Co. You have been getting feedback that your end users are seeing messages like “This site can’t be reached” in their web browsers. Because the website and hosting performance look good, you suspect a problem with your website’s DNS resolution.

There are different tools, many of them free, to identify and troubleshoot the causes of slow DNS responses. While troubleshooting, it’s recommended to follow a methodical approach to identify and isolate the cause of the problem. There may be challenges due to the different components at different levels of the networking stack, but you can start by eliminating the possible causes one by one.

Testing network performance

We will use a Linux machine for the tests below, starting with the network to rule out any performance issues. The most basic test is the ping command, which sends a small message (ICMP Echo Request) from your machine to a destination and waits for a response (Echo Reply). If a reply is received, ping reports the round-trip time; otherwise, the request times out and is counted as lost.

Here we sent three messages to the Big Hosting Co. DNS server and got an average round-trip time of about 28 ms with no packet loss.

$ ping -c 3 ns1.bighosting.co
PING ns1.bighosting.co (192.168.53.10) 56(84) bytes of data.
64 bytes from ns1.bighosting.co (192.168.53.10): icmp_seq=1 ttl=57 time=28.3 ms
64 bytes from ns1.bighosting.co (192.168.53.10): icmp_seq=2 ttl=57 time=28.1 ms
64 bytes from ns1.bighosting.co (192.168.53.10): icmp_seq=3 ttl=57 time=27.6 ms

--- ns1.bighosting.co ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 27.644/27.989/28.256/0.255 ms

Because your customers come from all over the world, you should also run the same test from other locations using a looking-glass service. A looking glass is a publicly available service that provides tools for network testing or routing lookups from a service provider’s perspective. Many large providers offer looking-glass services, which can be easily found using PeeringDB. One option is https://lg.he.net from Hurricane Electric, which allows you to run network commands from many locations in its global network.

We can extend network testing with the traceroute command. While ping provides end-to-end statistics, traceroute shows the latency to each network hop between the source and the destination.

You can take multiple measurements using ping and traceroute at different intervals to get a sense of network performance at various times. If any abnormal delay or loss shows in the results, you will need to dig deeper to discover the network issues. If network performance appears normal, you can move on to application layer troubleshooting. 
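If you collect ping results over time, a small script can parse the summary lines and flag runs that stand out. This sketch assumes the Linux (iputils) output format shown above; the 1.5x baseline factor used to call a run abnormal is a hypothetical threshold that you should tune for your own network.

```python
import re

# Summary lines printed by Linux (iputils) ping.
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
RTT_RE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping(output: str) -> dict:
    """Extract packet loss (%) and min/avg/max/mdev RTT (ms) from ping output."""
    loss_match = LOSS_RE.search(output)
    if loss_match is None:
        raise ValueError("no packet-loss summary found")
    result = {"loss_pct": float(loss_match.group(1))}
    rtt = RTT_RE.search(output)
    if rtt is not None:  # absent when no replies were received at all
        keys = ("min", "avg", "max", "mdev")
        result.update({k: float(v) for k, v in zip(keys, rtt.groups())})
    return result


def is_abnormal(run: dict, baseline_avg_ms: float, factor: float = 1.5) -> bool:
    """Flag a run with any loss, or an average RTT well above the baseline.

    `factor` is a hypothetical threshold, not a standard value.
    """
    avg = run.get("avg", float("inf"))  # no RTT data counts as abnormal
    return run["loss_pct"] > 0 or avg > baseline_avg_ms * factor
```

Run against the statistics shown above, parse_ping reports 0% loss and an average of 27.989 ms, which would not be flagged against a 28 ms baseline.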


Testing DNS application performance

One tool for testing DNS responses is the command-line utility dig, which stands for “Domain Information Groper.” You can use dig to send DNS queries and verify that your domain resolves to the correct records. Along with the answer, dig reports details such as TTL values and query times.

DNS response tests

In the example below, we send a query to ns1.bighosting.co to resolve the IP address of exampleshop.com (output reduced for brevity).

$ dig exampleshop.com @ns1.bighosting.co

;; QUESTION SECTION:
;exampleshop.com.			IN	A

;; ANSWER SECTION:
exampleshop.com.		305	IN	A	192.168.90.100

;; Query time: 34 msec

First, verify that you receive the correct address in the A record. Second, check the response time, which is 34 ms in this case. This is consistent with the network latency of about 28 ms measured above, leaving only a few milliseconds for server processing. If this response takes more than 100 ms, you should look into the performance of the DNS server (which we will do below).
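These two checks can be scripted. The sketch below parses dig output, confirms that the expected A record is present, and compares the query time against the 100 ms guideline and the ping round-trip time. Treating the difference between the dig query time and the ping RTT as server-side overhead is a rough assumption, since ICMP and DNS traffic may be handled differently along the path.

```python
import re
from typing import Optional

# Answer lines look like "name.  TTL  IN  A  address" (tab- or space-separated).
A_RECORD_RE = re.compile(r"^\S+[ \t]+\d+[ \t]+IN[ \t]+A[ \t]+(\S+)[ \t]*$", re.MULTILINE)
QUERY_TIME_RE = re.compile(r";; Query time: (\d+) msec")


def check_dig_output(output: str, expected_ip: str, ping_rtt_ms: float,
                     slow_ms: float = 100.0) -> tuple[list[str], Optional[float]]:
    """Return (problems, estimated_server_ms) for one dig run."""
    problems = []
    answers = A_RECORD_RE.findall(output)
    if expected_ip not in answers:
        problems.append(f"expected A record {expected_ip}, got {answers}")
    match = QUERY_TIME_RE.search(output)
    if match is None:
        problems.append("no query time in output")
        return problems, None
    query_ms = int(match.group(1))
    if query_ms > slow_ms:
        problems.append(f"query time {query_ms} ms exceeds {slow_ms:g} ms")
    # Rough estimate: time beyond the network round trip is server-side work.
    return problems, query_ms - ping_rtt_ms
```

For the output above, with a 28 ms ping RTT, this reports no problems and roughly 6 ms of server-side time.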

You can also use dig to simulate your machine behaving as a resolver to perform end-to-end DNS resolution of a record. The +trace option gives you the complete iterative flow with the response time of each step (again, the output below is trimmed down).

$ dig exampleshop.com +trace

;; global options: +cmd
.			308838	IN	NS	a.root-servers.net.
.			308838	IN	NS	g.root-servers.net.
.			308838	IN	NS	f.root-servers.net.


;; Received 811 bytes from 127.0.0.53#53(127.0.0.53) in 1 ms

com.			172800	IN	NS	f.gtld-servers.net.
com.			172800	IN	NS	g.gtld-servers.net.
com.			172800	IN	NS	h.gtld-servers.net.

;; Received 1169 bytes from 198.97.190.53#53(h.root-servers.net) in 38 ms

exampleshop.com.		172800	IN	NS	ns1.bighosting.co.
exampleshop.com.		172800	IN	NS	ns2.bighosting.co.

;; Received 773 bytes from 192.48.79.30#53(j.gtld-servers.net) in 199 ms

exampleshop.com.		600	IN	A	192.168.90.100
exampleshop.com.		1800	IN	NS	ns2.bighosting.co.
exampleshop.com.		1800	IN	NS	ns1.bighosting.co.
;; Received 268 bytes from 192.168.53.10#53(ns1.bighosting.co) in 47 ms

The total time to complete the resolution here is around 285 ms (1 + 38 + 199 + 47 ms), with the query to the .com TLD server accounting for the largest share. This time includes the network latency from the test machine to each DNS server plus each server’s processing time. Resolution would take longer for clients that are farther, in network terms, from the DNS servers involved.
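To see where the time goes in a trace, you can pull the per-step timings out of the dig +trace output. This Python sketch assumes the ";; Received ... in N ms" line format shown above; it simply sums the steps and picks the slowest one.

```python
import re

# Matches lines such as
# ";; Received 1169 bytes from 198.97.190.53#53(h.root-servers.net) in 38 ms"
RECEIVED_RE = re.compile(r";; Received \d+ bytes from (\S+) in (\d+) ms")


def trace_steps(output: str) -> list[tuple[str, int]]:
    """Return (server, milliseconds) for each step of `dig +trace` output."""
    return [(server, int(ms)) for server, ms in RECEIVED_RE.findall(output)]


def summarize_trace(output: str) -> tuple[int, tuple[str, int]]:
    """Return the total time across all steps and the slowest step."""
    steps = trace_steps(output)
    if not steps:
        raise ValueError("no ';; Received' lines found")
    total = sum(ms for _, ms in steps)
    slowest = max(steps, key=lambda step: step[1])
    return total, slowest
```

On the trace above, this reports 285 ms in total, with the 199 ms step to j.gtld-servers.net as the slowest.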

To improve response times for end users, DNS resolvers cache the records they retrieve. A caching resolver can answer repeat queries very quickly because it doesn’t have to follow the complete iterative process again until the cached record’s TTL expires.
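This caching behavior can be sketched as a small TTL-based cache in front of the full resolution. It is a simplified model, not a real resolver: the TTLCache class and lookup helper are hypothetical names, the resolve callback stands in for the iterative process, and the injectable clock exists only to make the example testable.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """A minimal resolver-style cache: each record expires after its TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[str, float]] = {}

    def put(self, name: str, rtype: str, value: str, ttl: int) -> None:
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name: str, rtype: str) -> Optional[str]:
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]  # expired: force a fresh lookup
            return None
        return value


def lookup(cache: TTLCache, name: str, rtype: str,
           resolve: Callable[[str, str], tuple[str, int]]) -> str:
    """Answer from the cache when possible; otherwise run the full resolution."""
    cached = cache.get(name, rtype)
    if cached is not None:
        return cached  # fast path: no iterative queries needed
    value, ttl = resolve(name, rtype)  # slow path: full iterative resolution
    cache.put(name, rtype, value, ttl)
    return value
```

Only the first query within each TTL window pays for the full resolution; every other client of the same resolver gets the cached answer.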

If you have access to machines in different parts of the world, you can use the dig command to measure the DNS responses of your domain from various networks and get a sense of the latency that your end users would experience.

DNS server performance benchmarking

Another helpful open-source tool is DNSPerf. With DNSPerf and a Linux machine, you can benchmark authoritative DNS servers by simulating traffic from multiple DNS clients. You can send a high volume of queries to DNS servers to measure their upper limits in terms of handling traffic. You can then fine-tune the application configuration and server resources based on these benchmarks.

In the following example, we will test the ability of the ns1.bighosting.co server to handle 100 queries per second (QPS) for 30 seconds. 

Create a test input file that defines which DNS queries to send; each line contains a query name and a record type.

exampleshop.com	A
exampleshop.com	AAAA
exampleshop.com	MX
exampleshop.com	NS

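Next, run dnsperf with the input file created above (output reduced). The -d option specifies the input file, -s the target server, -l the test duration in seconds, and -Q the maximum number of queries per second.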

$ dnsperf -d inputfile -s ns1.bighosting.co -l 30 -Q 100

[Status] Testing complete (time limit)

Statistics:

  Queries sent:         3000
  Queries completed:    3000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 3000 (100.00%)
  Average packet size:  request 29, response 137
  Run time (s):         30.000116
  Queries per second:   99.999613

  Average Latency (s):  0.006252 (min 0.002663, max 0.126665)
  Latency StdDev (s):   0.006760
  

The benchmark results detail how the DNS server performed under this load: how many queries the server completed and how many were lost, the average latency of the responses, and which response codes were returned. You can increase the number of threads, the QPS, and the test duration to probe the upper limits of the target DNS server and see how it performs under different conditions.
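Rather than reading the statistics by eye, you can parse them and compare them with the load you requested (100 QPS for 30 seconds should produce about 3,000 queries). The sketch below assumes the summary format shown above; the 1% shortfall tolerance, the zero-loss limit, and the 100 ms average-latency limit are hypothetical pass criteria, not dnsperf defaults.

```python
import re


def parse_dnsperf(output: str) -> dict:
    """Pull key figures out of a dnsperf statistics summary."""
    def grab(pattern: str) -> float:
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return float(match.group(1))

    return {
        "sent": int(grab(r"Queries sent:\s+(\d+)")),
        "completed": int(grab(r"Queries completed:\s+(\d+)")),
        "lost": int(grab(r"Queries lost:\s+(\d+)")),
        "qps": grab(r"Queries per second:\s+([\d.]+)"),
        # dnsperf reports latency in seconds; convert to milliseconds
        "avg_latency_ms": grab(r"Average Latency \(s\):\s+([\d.]+)") * 1000,
        "max_latency_ms": grab(r"max ([\d.]+)\)") * 1000,
    }


def meets_target(stats: dict, target_qps: float, duration_s: float,
                 max_loss_pct: float = 0.0, max_avg_ms: float = 100.0) -> bool:
    """Check a run against the requested load and hypothetical limits."""
    expected = target_qps * duration_s  # e.g. 100 QPS x 30 s = 3000 queries
    loss_pct = 100.0 * stats["lost"] / stats["sent"] if stats["sent"] else 100.0
    return (stats["sent"] >= expected * 0.99  # load generator kept up with -Q
            and loss_pct <= max_loss_pct
            and stats["avg_latency_ms"] <= max_avg_ms)
```

For the run above, this gives 3,000 queries sent, none lost, and an average latency of about 6.3 ms, so the run passes these example criteria.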

These network and DNS testing utilities provide good visibility, but only at a specific point in time. It is highly recommended to use network and application monitoring platforms for continuous visibility and a stored history of your network and application performance.

Catchpoint Internet performance monitoring provides a complete DNS observability solution. Catchpoint has multiple monitoring nodes deployed in backbone and last-mile networks, and the platform performs DNS measurements from an end-user perspective. It can ensure DNS record integrity by alerting on unexpected record changes, and it accurately measures latency, packet loss, and round-trip times to and from your DNS infrastructure, representing a true end-user experience. You can learn more at www.catchpoint.com.


Best practices for ensuring fast DNS performance

The following are some key recommendations to ensure that your DNS performance is what it should be.

Use a DNS provider with CDN architecture

When choosing a service provider, it’s always good to do some research and shortlist the ones that meet your minimum uptime and performance criteria. Based on the criticality of your application, you may want to go with two different providers because if your DNS goes down, all your applications and services will go down with it, which might result in significant financial losses.

Some of the recommended criteria while choosing a DNS provider include the following:

  • The number of geographically distributed DNS server/resolver nodes.
  • Availability assurances and the failover mechanism used by the provider. Most prominent providers use anycast across their data centers globally to provide minimal latency in different parts of the globe and automatic failover across data centers. 
  • Service uptime, quality, and latency guarantees from the provider.
  • Feedback from existing customers.

You can also use a third-party monitoring service, such as DNSPerf (dnsperf.com, a comparison website unrelated to the dnsperf benchmarking tool described above), to get comparative statistics for different service providers.

Performance comparison of DNS providers

Consider self-managed DNS

Another option for your DNS service is to use self-deployed and managed DNS servers. You might deploy your own DNS if you have the required technical skills and want greater control over your service; if so, consider these best practices:

  • Deploy two or more servers in different geographic regions.
  • Deploy the servers in self-managed data centers or on hosted cloud infrastructure.
  • In the case of cloud deployment, choose well-established, highly rated cloud service providers to host your DNS servers.
  • After deployment, use nslookup/dig commands to verify that all your DNS servers are responding with correct DNS records.
  • Before going into production, use benchmarking tools to verify that the DNS servers can handle the expected traffic.
  • Fine-tune the server configurations and change the specs based on your test results.
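The verification step can be automated by asking every server for every record you expect and reporting mismatches. In this sketch the query callable is a hypothetical hook; in practice it might wrap a command such as dig +short @server name type and split the output into a set of answers. Comparing answer sets rather than lists ignores record ordering, which DNS servers are free to vary.

```python
from typing import Callable, Iterable

# Hypothetical hook: (server, name, rtype) -> set of answer strings
Query = Callable[[str, str, str], set[str]]


def check_servers(query: Query, servers: Iterable[str],
                  expected: dict[tuple[str, str], set[str]]) -> dict[str, list[str]]:
    """Ask every server for every expected record; report mismatches per server."""
    problems: dict[str, list[str]] = {}
    for server in servers:
        for (name, rtype), want in expected.items():
            try:
                got = query(server, name, rtype)
            except Exception as exc:  # timeouts, refused queries, etc.
                problems.setdefault(server, []).append(f"{name} {rtype}: {exc!r}")
                continue
            if got != want:
                problems.setdefault(server, []).append(
                    f"{name} {rtype}: expected {sorted(want)}, got {sorted(got)}")
    return problems
```

A server that returns a stale or different answer, or that fails to answer at all, shows up as a key in the returned dictionary; an empty result means every server agreed with the expected records.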

Increase DNS TTL figures to enhance the benefit of caching

DNS records are configured with time to live (TTL) values. The TTL value determines how long a record remains in the cache of DNS resolvers. Longer TTLs increase the benefit of caching, resulting in quicker DNS responses for more end users. The trade-off is that changes to a record take longer to reach users whose resolvers have cached the previous value.
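A back-of-the-envelope calculation shows why longer TTLs help. With steady demand, a single caching resolver re-fetches a record at most once per TTL, so the number of cache misses (each of which makes some user wait for a full resolution) over a period is bounded by the period divided by the TTL. The sketch assumes one busy resolver; in reality, every resolver on the Internet caches independently.

```python
import math


def max_upstream_queries(ttl_s: int, period_s: int = 86_400) -> int:
    """Upper bound on full resolutions one busy resolver performs for a record.

    Consecutive re-fetches are at least one TTL apart, so at most
    ceil(period / TTL) of them fit in the period (default: one day).
    """
    return math.ceil(period_s / ttl_s)


for ttl in (60, 300, 3600):
    print(ttl, max_upstream_queries(ttl))
# prints:
# 60 1440
# 300 288
# 3600 24
```

Moving from a 300-second to a 3,600-second TTL cuts the worst-case number of full resolutions per resolver per day from 288 to 24.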

Using DNS alias or CNAME flattening

A common DNS practice is to have multiple names point to the same address. For example, we may want both exampleshop.com and www.exampleshop.com to reach the same website. For this, we configure an A record for exampleshop.com and define www.exampleshop.com as a CNAME record pointing to it. When a user opens www.exampleshop.com, the resolver first gets the CNAME record and then needs the A record of its target. When the target lives in a different zone, this requires a second round of queries, which can roughly double the time needed to get the DNS records.

Many DNS applications and DNS providers now support ALIAS records or CNAME “flattening.” The concept is that when you define a record as an ALIAS (www.exampleshop.com) pointing to another name (exampleshop.com) and a query comes in for www.exampleshop.com, the DNS server itself looks up the A record of the alias target and returns the IP address to the client in a single response. This achieves the benefits of a CNAME without the double lookup.
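The difference in lookup counts can be illustrated with a toy zone. This sketch is a simplification: it assumes each hop in a CNAME chain costs the resolver a separate query (as happens when the target lives in a different zone), whereas authoritative servers usually include an in-zone target’s A record in the same response. The zone contents and function names are hypothetical.

```python
# Hypothetical zone: www is a CNAME to the apex, which holds the A record.
ZONE = {
    ("www.exampleshop.com", "CNAME"): "exampleshop.com",
    ("exampleshop.com", "A"): "192.168.90.100",
}


def follow_chain(name: str, max_hops: int = 8) -> tuple[str, int]:
    """Follow CNAMEs to an A record; return (ip, lookups_needed)."""
    for lookups in range(1, max_hops + 1):
        if (name, "A") in ZONE:
            return ZONE[(name, "A")], lookups
        if (name, "CNAME") not in ZONE:
            raise LookupError(f"no A or CNAME record for {name}")
        name = ZONE[(name, "CNAME")]  # the alias target needs another lookup
    raise LookupError("CNAME chain too long (possible loop)")


def resolver_queries_with_cname(name: str) -> int:
    """Plain CNAME: the resolver chases the chain itself, one query per hop."""
    return follow_chain(name)[1]


def flattened_response(name: str) -> tuple[str, int]:
    """ALIAS/flattening: the authoritative server chases the chain internally
    and returns a plain A record, so the resolver needs only one query."""
    ip, _ = follow_chain(name)
    return ip, 1
```

Flattening does not remove the lookup work; it moves it to the authoritative server, which can cache the target’s address between queries.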

There is a potential downside to this, however. ALIAS is not a standard DNS record type, so if any of your authoritative DNS servers (for example, secondary servers running older BIND software) do not recognize it, queries for the domain can flat-out fail.

Use DNS prefetch in website code

Most modern websites are dynamic, pulling content (images, videos, fonts, etc.) from various sources. Each additional domain requires its own DNS lookup, which further delays loading the web page.

As a website developer, you can add a small code snippet to the HEAD element of the web page that tells the browser to resolve the listed external domains in advance, before the user follows a link or the page requests resources from them. This capability is called DNS prefetch, and it is supported by almost all major browsers.

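The Mozilla Developer Network (MDN) documentation illustrates this with an element such as <link rel="dns-prefetch" href="https://fonts.googleapis.com/" /> placed in the HEAD to prefetch DNS for an external font host.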
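If your pages pull resources from many external hosts, the prefetch hints can be generated from the list of resource URLs. This Python sketch emits one dns-prefetch link element per distinct origin and skips relative URLs. The function name and the cdn.example.net URL are hypothetical, and the output is meant to be placed inside the page’s HEAD element.

```python
from html import escape
from urllib.parse import urlsplit


def prefetch_tags(resource_urls: list[str]) -> str:
    """Build <link rel="dns-prefetch"> tags, one per distinct external origin."""
    origins: list[str] = []
    for url in resource_urls:
        parts = urlsplit(url)
        if not parts.netloc:  # relative URL: same host, nothing to prefetch
            continue
        origin = f"{parts.scheme}://{parts.netloc}/"
        if origin not in origins:  # keep first-seen order, no duplicates
            origins.append(origin)
    return "\n".join(
        f'<link rel="dns-prefetch" href="{escape(origin)}" />' for origin in origins
    )


print(prefetch_tags([
    "https://fonts.googleapis.com/css2?family=Roboto",
    "https://fonts.googleapis.com/css2?family=Lato",
    "https://cdn.example.net/img/logo.png",
]))
# prints:
# <link rel="dns-prefetch" href="https://fonts.googleapis.com/" />
# <link rel="dns-prefetch" href="https://cdn.example.net/" />
```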

Conclusion

DNS performance is a key factor in overall website performance and the quality of the user experience. This article examined DNS lookup time, the causes of slow DNS response, and the factors that can contribute to slowdowns.

Troubleshooting slow DNS performance can involve using ping and traceroute to test network performance, dig to test individual DNS responses, and DNSPerf to perform high-volume testing of DNS servers. You can perform these tests at different intervals to check how the server performs at different times.

Best practices for ensuring good DNS performance include using DNS providers with a CDN architecture, considering self-managed DNS, increasing DNS TTL values, and using CNAME flattening or DNS prefetch. A combination of these approaches can help eliminate slow DNS problems and improve your bottom line.
