The Comprehensive Guide to DNS Monitoring

DNS, or Domain Name System, is a service that translates human-readable web addresses (e.g., app.example.com) into computer-readable IP addresses (e.g., 192.1.1.1) and stores them in a hierarchical, distributed database. The resulting hierarchy of names is referred to as a namespace, and it enables users to consistently find, remember, and search for web destinations. In our example, ".com" is the top-level domain, "example" is the second-level domain, and "app" is a subdomain of example.com. Together, these labels form the full name that the TCP/IP network uses to connect to the right destination.

Think of the DNS service like a postal service: mail requires an address that includes the street number, street name, unit or apartment number, city, state, and zip code. Even more information is needed to forward mail outside of the United States.

Every domain name must be associated with an IP address. As of the first quarter of 2020, 366.8 million domains were registered, an increase of 4.5 million registrations (1.2 percent) over the previous quarter. This results in a large number of records stored on DNS servers distributed across the globe that require resolution to IP addresses.
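
To make the namespace structure concrete, here is a minimal Python sketch that splits a hostname into the parts named above. The function name `split_hostname` is hypothetical, and the sketch assumes a single-label top-level domain such as `.com`; multi-label suffixes like `.co.uk` would need a public-suffix list, which is left out.

```python
def split_hostname(hostname: str) -> dict:
    """Split a hostname into TLD, second-level domain, and subdomains.

    Assumption: the TLD is a single label (e.g. "com"). Suffixes such as
    "co.uk" are not handled by this sketch.
    """
    labels = hostname.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level domain and a TLD")
    return {
        "tld": labels[-1],
        "second_level": labels[-2],
        "subdomains": labels[:-2],  # kept in left-to-right order
    }


print(split_hostname("app.example.com"))
```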

The Components of DNS

The individual records of a DNS are called Resource Records (RR) and the individual parts of a DNS database are called zones. Within these zones are several server and record types. In order to successfully monitor DNS, it’s important to be familiar with what each component does in the larger system.

"DNS Monitoring requires advanced tools that can effectively trace queries through a complex hierarchy of servers, network links, and services."

DNS Server Types

When a user inputs a hostname (e.g., www.example.com), the act of processing that hostname into an IP address to deliver them to their chosen destination is called resolving. Resolving a hostname requires four different DNS server types.

1. DNS Recursor

The recursor server receives queries from client machines through applications like web browsers and checks its cache for the matching IP address. It is also responsible for making any additional requests needed to satisfy the client’s DNS query. Recursor servers have no authority over record information.

2. Root Name Server

The root server is contacted when a DNS Recursor can’t find the relevant address in its cache. It exists at the top of the DNS hierarchy in a space known as the root zone. The root server redirects each query to the correct zone by responding to the recursor with the IP address of the Top-Level Domain (TLD) nameserver that should handle the query. The root zone is served by 13 named root servers (a.root-servers.net through m.root-servers.net), each of which is operated as many physical instances around the world.

3. TLD Nameserver

The top-level domain server (TLD) handles the next step in the search for a specific IP address. It catalogues domain names that share the same top-level domain (e.g., `.com`) and provides the recursor server with the relevant authoritative nameserver’s IP address that it should check.

4. Authoritative Nameserver

The authoritative nameserver possesses information for specific hostnames, such as example.com. It resolves the hostname to its corresponding IP address and sends that address back to the recursor server, where it is then passed to the client’s browser. The browser then accesses the site using the IP address.

There are three types of authoritative nameservers: primary, secondary, and stub. The primary authoritative nameserver keeps read-write copies of DNS records and is the server where changes are made. Secondary and stub servers are read-only, with stub servers keeping specific records of information (such as SOA records or glued A records).
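
The lookup chain described in this section can be sketched as a toy simulation. Everything here is illustrative: the three dictionaries stand in for the root, TLD, and authoritative servers, the server labels and the `Recursor` class are hypothetical, and the addresses come from a documentation-only range. No real DNS traffic is sent.

```python
# Toy stand-ins for the DNS hierarchy (all names and addresses are made up).
ROOT_ZONE = {"com": "tld-com"}                            # TLD -> TLD nameserver
TLD_ZONES = {"tld-com": {"example.com": "auth-example"}}  # domain -> authoritative server
AUTH_ZONES = {"auth-example": {"www.example.com": "192.0.2.10"}}


class Recursor:
    """Answers from cache when possible, otherwise walks root -> TLD -> authoritative."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def _ask(self, table, key):
        # Each lookup in a table represents one query to an upstream server.
        self.upstream_queries += 1
        return table[key]

    def resolve(self, hostname: str) -> str:
        name = hostname.lower().rstrip(".")
        if name in self.cache:
            return self.cache[name]
        labels = name.split(".")
        tld_server = self._ask(ROOT_ZONE, labels[-1])
        auth_server = self._ask(TLD_ZONES[tld_server], ".".join(labels[-2:]))
        address = self._ask(AUTH_ZONES[auth_server], name)
        self.cache[name] = address
        return address


recursor = Recursor()
print(recursor.resolve("www.example.com"), recursor.upstream_queries)
print(recursor.resolve("www.example.com"), recursor.upstream_queries)
```

The second call is served from the recursor’s cache, so the upstream query count does not change.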

DNS Record Types

DNS record types give instructions for forwarding, filtering, and contextualizing requests. The data within them can contain refresh rates, expirations, ownership information, certificates, routing preferences, and more.

A Records

A (Address) records map a website’s domain or subdomain to an IPv4 address. Typically a hostname has only one A record, though larger sites that use round-robin load balancing may have many A records.

CERT Records

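CERT (Certificate) records store public-key certificates and related certificate revocation lists in DNS. They are rarely used, and they should not be confused with the TLS certificates that a web server presents to secure logins or payments; those certificates are not delivered through DNS.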

CNAME Records

CNAME (Canonical Name) records provide an alias that points one hostname at another, so that name variations direct users to the same part of your website. For example, `help.mystore.com` can be an alias for `support.mystore.com`.

MX Records

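MX (Mail Exchanger) records help send and receive emails by defining which servers to deliver mail to and which are preferred for handling delivery. For an address such as `yourname@workemail.com`, the MX records of `workemail.com` name the mail servers for the domain; each of those hostnames is then resolved to an IP address through its A (or AAAA) record.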
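
As a small illustration of MX preference, the sketch below orders a set of MX records the way a sending mail server would try them: lowest preference value first. The function name `order_mx` and the hostnames are hypothetical.

```python
def order_mx(records):
    """Return mail exchange hostnames in delivery order.

    `records` is a list of (preference, exchange) pairs. A lower preference
    value means the server is tried earlier.
    """
    return [exchange for _preference, exchange in sorted(records)]


mx_records = [
    (20, "backup-mx.example.com"),
    (10, "mx1.example.com"),
    (30, "last-resort.example.com"),
]
print(order_mx(mx_records))
```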

NAPTR Records

NAPTR (Name Authority Pointer) records link telephone numbers and e-mail addresses to VoIP and SIP services and are less commonly used.

NS Records

NS is an acronym for “nameserver.” The NS record points to the DNS server that is responsible or authoritative for a specific domain. The NS record tells the client where to go to find out a particular domain’s IP address.

PTR Records

PTR (Pointer) records map an IP address back to a domain or hostname (a reverse lookup). This is the opposite of an A record. Many mail servers check PTR records as part of their spam filtering.
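
A PTR lookup queries a special name derived from the IP address. The short sketch below uses Python’s standard `ipaddress` module to build that name; the helper `ptr_name` is hypothetical and the addresses are from documentation ranges.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Return the reverse-lookup name that holds the PTR record for an address."""
    return ipaddress.ip_address(address).reverse_pointer


print(ptr_name("192.0.2.1"))    # IPv4 names live under in-addr.arpa
print(ptr_name("2001:db8::1"))  # IPv6 names live under ip6.arpa
```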

TXT Records

TXT (Text) records hold free-form text data. They are commonly used for the Sender Policy Framework (SPF), which lists the servers allowed to send mail for a domain, and for proving domain ownership to third-party services. This record is important for holding domain owners accountable for how they use domains. It’s also one way to protect your domain from being used to send spam.
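
To show what an SPF policy inside a TXT record looks like, here is a minimal parsing sketch. The function `parse_spf` is hypothetical and deliberately simplified: it splits the record into qualifier and term pairs and does not evaluate the policy or treat modifiers such as `redirect=` separately.

```python
def parse_spf(txt: str):
    """Split an SPF TXT value into (qualifier, term) pairs.

    Returns None if the value is not an SPF record. A missing qualifier
    defaults to "+" (pass). Simplification: modifiers are not distinguished
    from mechanisms.
    """
    terms = txt.split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    parsed = []
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        parsed.append((qualifier, term))
    return parsed


print(parse_spf("v=spf1 ip4:192.0.2.0/24 include:_spf.example.com -all"))
```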

The DNS in Action

Although resolving hostnames is an extensive process, it is usually near-instantaneous. And because a webpage usually contains content (images, videos, links) sourced from other locations, one page load usually requires many DNS resolutions.

[Figure: A DNS resolution process shown as 10 steps. The resolution performs a recursive query directed at the root server, the Top-Level Domain (TLD) server, and an authoritative server.]

As you can see, there are many components and queries involved in making your applications accessible to your users. If there’s a problem within the DNS, you may not even realize users in certain locations can’t reach you. This is why DNS monitoring is so important (which we’ll come back to later).

DNS Providers

DNS providers offer either managed or unmanaged services. Managed DNS providers provide customers with computing power to facilitate traffic across websites, applications, and networks. They also enable domain registration. Customers of managed DNS services typically use web consoles or desktop applications to control DNS traffic, manage DNS data, and authenticate users.

The benefits of using a Managed DNS Provider include:

  • Large global DNS network to support resiliency
  • Intelligent traffic routing (geo-location, policy)
  • Global load balancing, cloud migration
  • DNS failover for service continuity
  • Secondary DNS
  • Distributed denial of service (DDoS) protection
  • Web application firewall (WAF)
  • Anti-malware
  • DNS analytics (log generation, traffic analysis, usage & downtime trends)
  • Availability and disaster recovery
  • DNS propagation and change management

The largest managed DNS providers in order of market share are Amazon Route 53, ClouDNS, GoDaddy, Cloudflare, Google, UltraDNS, DNSimple, easyDNS, Azure DNS, and DNS Made Easy.

[Figure: The largest managed DNS service providers]

Unmanaged DNS providers are essentially domain registrars that enable customers to purchase or reserve a domain.

The Importance of DNS Monitoring

DNS is an essential component of routing users to their destinations. And because most businesses rely on external DNS providers, there is often very limited insight into the service overall in terms of reachability, performance, and real-time record security. In this section we’ll take a look at some of the issues DNS monitoring can help resolve. The covered sections are:

  1. Micro-Outages
  2. Misconfigurations
  3. DNS Poisoning
  4. Denial of Service (DoS) Attacks

1. Micro-Outages

Micro-outages briefly prevent users from resolving your domain and accessing your service. These outages can last from minutes up to an hour and have varying impacts depending on the services involved. The cause of micro-outages is often masked by the Anycast systems used by many DNS providers. Anycast helps keep DNS resolution latency low by advertising the same IP address from many server locations and routing each query to the nearest one; because a given query may be answered by any of those locations, it is challenging to discover which one has the underlying issue. Common causes of micro-outages include physical resource (e.g., datacenter) unavailability, routing and connectivity issues, server performance problems, and network capacity limitations.

2. Misconfigurations

Misconfigured DNS servers can significantly impact your user experience. Let’s take a look at some examples.

CNAME as Apex

Although CNAME records are often used to alias one name to another, they should never be configured at the apex (root) of a zone. A CNAME tells resolvers to substitute the target name for the owner name for every record type, so a CNAME is not allowed to coexist with any other record at the same name. The apex must always hold the zone’s SOA and NS records, so placing a CNAME there creates a conflict in the zone. This ultimately results in resolution failures.

For example, `www.google.com` can be a CNAME for `google.com`, but `google.com` should never be a CNAME, as it is the apex domain.
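
A simple zone lint can catch this class of mistake. The sketch below is an assumption-laden illustration, not a full validator: `cname_conflicts` is a hypothetical helper that takes (owner name, record type) pairs and reports any name where a CNAME sits alongside other record types. It ignores the DNSSEC record types that are permitted next to a CNAME.

```python
from collections import defaultdict


def cname_conflicts(records):
    """Return names that have a CNAME plus at least one other record type.

    `records` is an iterable of (owner_name, record_type) pairs.
    Simplification: DNSSEC records (RRSIG, NSEC) are not special-cased.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


zone = [
    ("example.com", "SOA"),
    ("example.com", "NS"),
    ("example.com", "CNAME"),      # misconfiguration: CNAME at the apex
    ("www.example.com", "CNAME"),  # fine: the only record at this name
]
print(cname_conflicts(zone))
```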

Missing Glued Records

Glue records are A (or AAAA) records, published in the parent zone alongside a delegation’s NS records, that supply the IP address of a nameserver whose own hostname falls inside the zone it serves (for example, `ns1.example.com` serving `example.com`). Without glue, a resolver would have to resolve the nameserver’s name in order to reach the nameserver, which is a circular dependency. Missing glue records can cause delegation failures, dynamic DNS (DDNS) update failures, and query resolution failures.
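
One way to check for missing glue is to look for nameservers whose hostnames fall inside the zone they serve and confirm that an address is published for each. The sketch below assumes the NS hostnames and address records have already been collected; `missing_glue` and its parameters are hypothetical names.

```python
def missing_glue(zone, ns_hosts, address_records):
    """Return in-zone nameserver hostnames that have no address record.

    zone:            the delegated zone, e.g. "example.com"
    ns_hosts:        hostnames listed in the zone's NS records
    address_records: dict of lowercase hostname -> list of IP addresses
    """
    zone = zone.lower().rstrip(".")
    missing = []
    for host in ns_hosts:
        host = host.lower().rstrip(".")
        in_zone = host == zone or host.endswith("." + zone)
        # Only nameservers inside the zone need glue.
        if in_zone and not address_records.get(host):
            missing.append(host)
    return missing


print(missing_glue(
    "example.com",
    ["ns1.example.com", "ns2.example.com", "ns.other-provider.net"],
    {"ns1.example.com": ["192.0.2.53"]},
))
```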

Incorrect TTL Values

DNS TTLs determine the lifespan of an answer within the DNS cache. Because caching plays a critical role in resolving queries, having incorrect TTL values could mean the difference between a 1 ms cached response and potentially thousands of milliseconds needed to fetch an answer from the internet. Deciding how long to cache answers depends on the environment. Highly dynamic environments would experience issues with a 24-hour TTL, since changes occur much more frequently. More static environments, however, may not need a 5-minute TTL and could even see performance gains by increasing the value.
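
The effect of a TTL can be sketched with a tiny cache. This is an illustrative model only: `TtlCache` is a hypothetical class, and the injectable `clock` parameter exists so that expiry can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Stores answers until their TTL (in seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: a full resolution would be needed
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: treat as a miss
            return None
        return address


now = [0.0]  # fake clock so the example is deterministic
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl=300)
now[0] = 299.0
print(cache.get("www.example.com"))  # still cached
now[0] = 300.0
print(cache.get("www.example.com"))  # expired
```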

Lame Delegation

Domain names are required to have at least two nameservers. When a query is made, the nameserver that answers can either be authoritative or “lame,” meaning that although it was designated as authoritative, it has no authoritative zone information for the domain name. Make sure you configure all nameservers correctly to be authoritative within the appropriate zone for their associated domain names.
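
A lame nameserver can often be spotted from the header of its reply. The sketch below inspects the flags in a raw DNS response: a delegated server that answers without the Authoritative Answer (AA) bit, or with SERVFAIL or REFUSED, is treated as suspect. The function `looks_lame` is hypothetical, and this heuristic is an assumption rather than a complete test.

```python
import struct

QR_BIT = 0x8000        # set on responses
AA_BIT = 0x0400        # set when the server is authoritative for the name
RCODE_MASK = 0x000F
SERVFAIL, REFUSED = 2, 5


def looks_lame(response: bytes) -> bool:
    """Heuristic check of a raw DNS response from a delegated nameserver."""
    if len(response) < 12:
        raise ValueError("a DNS message header is 12 bytes long")
    _query_id, flags = struct.unpack("!HH", response[:4])
    is_response = bool(flags & QR_BIT)
    is_authoritative = bool(flags & AA_BIT)
    rcode = flags & RCODE_MASK
    return not (is_response and is_authoritative) or rcode in (SERVFAIL, REFUSED)


# Hand-built headers: ID, flags, then four section counts.
authoritative_reply = struct.pack("!HHHHHH", 0x1234, 0x8400, 1, 1, 0, 0)
non_authoritative_reply = struct.pack("!HHHHHH", 0x1234, 0x8000, 1, 0, 0, 0)
print(looks_lame(authoritative_reply), looks_lame(non_authoritative_reply))
```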

3. DNS Poisoning

Incorrect configurations can give bad actors an opportunity to infiltrate a service and direct traffic to malicious websites in an attack called DNS Poisoning. Large-scale attacks spanning the globe have exploited vulnerabilities discovered in DNS settings. The following diagram is an example of how even one A or NS record can compromise your traffic:

[Figure: Attackers tamper with a record as part of a DNS Poisoning attack. DNS Spoofing or Cache Poisoning is when an attacker tampers with DNS records, which then get propagated to other servers and local caches.]

What’s worse, DNS Poisoning can spread if the affected server is a resource for ISPs. The ISPs then forward the compromised records to home routers and personal computer DNS caches.

4. Denial of Service (DoS) Attacks

Hackers may attempt to render your web resources unavailable to users by flooding a specific URL with an overwhelming number of requests. This is referred to as a Denial of Service (DoS) attack, which essentially crowds out real traffic, either slowing or completely interrupting all use.

Similarly, a Distributed Denial of Service (DDoS) attack applies the same concept but recruits thousands of maliciously infected machines (collectively known as a “botnet”) across the internet to take down the service. More recently, however, abusing exposed memcached servers to amplify traffic has become a popular DDoS technique.

Amplification DDoS Attacks

Amplification attacks leverage simple requests, such as a short query, that require much larger responses. After finding such a request, the attacker floods the server with these requests, forcing the DNS server to respond with much heavier replies.
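
The scale of an amplification attack is usually described by its amplification factor, the ratio of response size to request size. The arithmetic is sketched below; the byte counts and bandwidth figure are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Ratio of response size to request size."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    return response_bytes / request_bytes


def reflected_bandwidth(attacker_bps: float, factor: float) -> float:
    """Traffic generated toward the target for a given attacker send rate."""
    return attacker_bps * factor


factor = amplification_factor(request_bytes=60, response_bytes=3000)
print(factor)                                    # response is 50x the request
print(reflected_bandwidth(10_000_000, factor))   # 10 Mbit/s sent by the attacker
```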

Reflection DDoS Attacks

Attackers often send queries with a spoofed source address so that they appear to have come from their victim’s IP address. The victim then receives the answers and is overwhelmed by the response traffic arriving on their network. This is done by querying a recursive nameserver using the spoofed IP address, causing potentially two attacks: one on the authoritative nameserver being queried (via amplification) and one on the victim that receives the response. This can often make the victim look like the attacker of the nameserver, causing even more issues.

How to Monitor for DNS Performance

In this section, we will explore specific monitoring tests designed to detect a breach of DNS integrity or performance. The monitoring tests are grouped in the following sections:

  1. DNS Mapping (based on end-user proximity)
  2. DNS Records (evaluating integrity of the records)
  3. DNS Performance (measuring the resolution latency)

1. DNS Mapping

The best way to reduce the latency involved in resolving a domain name is to use a DNS server that is geographically closest to the end-user. Avoid using DNS servers located on a different coast or continent.

The process of mapping end-users to DNS servers based on geolocation differs depending on your managed DNS provider. Essentially, the managed DNS service provider compares the estimated geographic location of the querying IP address to the location of each server configured within the DNS record. Sometimes the edns-client-subnet (ECS) DNS extension is used to pass along the client’s subnet, which is then used to identify the physical location of the IP address.

This test therefore verifies that the closest DNS server is being provided when querying your domain from different locations around the world. A successful local DNS resolution should take less than 20 milliseconds.
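
A mapping test can be approximated by computing which server location is geographically nearest to the test location and comparing it with the server that actually answered. The sketch below uses the haversine great-circle distance; the server labels, coordinates, and function names are illustrative assumptions, and real providers may weigh network topology rather than pure distance.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_server(client, servers):
    """Return the label of the server closest to the client location."""
    return min(servers, key=lambda label: haversine_km(client, servers[label]))


def mapping_is_optimal(observed, client, servers):
    """True if the server that answered is also the nearest one."""
    return observed == nearest_server(client, servers)


servers = {"nyc": (40.71, -74.01), "lon": (51.51, -0.13), "sgp": (1.35, 103.82)}
paris = (48.86, 2.35)
print(nearest_server(paris, servers))
print(mapping_is_optimal("nyc", paris, servers))
```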

2. DNS Records

The tests in this section validate the integrity of the records that support domain resolution and ensure that they haven’t been either misconfigured or maliciously tampered with by attackers.

Test DNS Delegation

This is one of the first and most basic tests. A DNS delegation test verifies that the nameserver for the parent zone (e.g., `.com`) refers queries to the correct child zone (e.g., `catchpoint.com`) and that the right answer is returned. The table below traces DNS delegation for `catchpoint.com`.

[Figure: An example of DNS delegation tracing the server assignment, where the closest server to the end-user is selected.]

Test NS Records & Root Servers

After DNS delegation is verified, the nameserver must be responsive to requests over both UDP and TCP (DNS uses UDP for most queries and relies on TCP for large responses and zone transfers). Nameservers that fail in this task may be misconfigured or prevented from responding by a firewall filter. Monitoring for successful responses ensures both your nameservers and your security measures are configured correctly to permit, receive, and respond to the appropriate traffic. It’s also important to validate your backup records. Though usually pre-configured by most providers, the root hints file, which includes the names and IP addresses of all root servers, is another important component worth verifying.

Monitor SOA Records

SOA (Start of Authority) records hold the zone’s serial number and other important information on a zone’s cluster of DNS servers, such as the primary nameserver and the refresh, retry, and expire timers used by secondary servers. Knowing when these records have changed can shed light on performance anomalies and contextualize them as being either mundane or malicious. The SOA serial number is normally updated whenever the zone file is updated. If your environment is fairly static and its zone files are rarely updated, an unexpected serial change is a good signal to investigate what was changed.
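
Monitoring an SOA record mostly comes down to parsing its fields and comparing the serial between checks. The sketch below assumes the SOA data is available in its usual text form (primary nameserver, contact, serial, refresh, retry, expire, minimum); the `Soa` type, helper names, and sample values are hypothetical.

```python
from typing import NamedTuple


class Soa(NamedTuple):
    mname: str    # primary nameserver
    rname: str    # contact mailbox in DNS notation
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa(rdata: str) -> Soa:
    """Parse SOA data given as seven whitespace-separated fields."""
    mname, rname, *numbers = rdata.split()
    if len(numbers) != 5:
        raise ValueError("expected serial, refresh, retry, expire, minimum")
    return Soa(mname, rname, *map(int, numbers))


def serial_changed(previous: str, current: str) -> bool:
    """True if the zone serial differs between two snapshots."""
    return parse_soa(previous).serial != parse_soa(current).serial


before = "ns1.example.com. hostmaster.example.com. 2024010101 7200 3600 1209600 3600"
after = "ns1.example.com. hostmaster.example.com. 2024010102 7200 3600 1209600 3600"
print(serial_changed(before, after))
```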

Check MX & SRV Records

Attackers may attempt to exploit MX records for “whaling,” essentially trapping large amounts of sensitive information with one malicious transaction. It’s important to verify that your MX and SRV records are resolving, and that they resolve to the correct exchange. Make sure these records are using the correct server preference; many domains use MX records pointing to exchanges with nested levels of spam filtering in place. Attackers know this and often attempt to send mail to the MX record with the highest preference value (the lowest-priority server), betting that it is the least protected.

Check Zone Transfers

The primary and secondary DNS servers must maintain identical records. This is accomplished by synchronizing the zone files that contain the resolution information for specific domains. There are scenarios in which a DNS zone transfer does not complete, or does not complete equally across the primary and secondary DNS servers, thus preventing resolutions from completing. Zone transfers should be monitored for regular updates and completion.

Verify DNSSEC Configurations

Security is important for every step of the DNS lookup process. DNSSEC provides optional security extensions that cryptographically sign DNS records, reducing exposure to spoofing and cache poisoning (it does not protect against DDoS attacks). However, you must monitor that the extensions are 1) set up and 2) configured properly.

3. DNS Performance

Track DNS Propagation

DNS propagation is the time it takes to update all relevant systems about changes to a DNS record, such as its hostname details. If users query a system where DNS propagation has not yet occurred, they will receive the old address information. Sometimes it can take up to 72 hours to propagate changes worldwide.
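
Propagation can be tracked as the share of resolvers that already return the new answer. The sketch below assumes the answers have been collected from a set of vantage points; the resolver labels, addresses, and the function `propagation_ratio` are hypothetical.

```python
def propagation_ratio(answers: dict, expected: str) -> float:
    """Fraction of resolvers whose answer matches the new record value.

    `answers` maps a resolver label to the address it returned.
    """
    if not answers:
        return 0.0
    matching = sum(1 for address in answers.values() if address == expected)
    return matching / len(answers)


observed = {
    "resolver-us-east": "192.0.2.20",
    "resolver-eu-west": "192.0.2.20",
    "resolver-ap-south": "192.0.2.10",  # still serving the old address
    "resolver-sa-east": "192.0.2.20",
}
print(propagation_ratio(observed, expected="192.0.2.20"))
```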

Use DNS Experience Tests

DNS Experience testing measures resolution time for a domain by running recursive DNS queries. Conducting these tests from servers on each level of a DNS route allows you to piece together a complete picture of how your DNS is resolved. You should use these tests to discover trends like: unusually high memory/CPU utilization or an increase in BIND query rates (DNS QPS). When internal zones are involved, you can also monitor the server’s disk I/O for simultaneous disk writes (indicating frequent zone transferring).

Monitor IP addresses

Monitor for mismatches in IPv4 A and IPv6 AAAA records. This is done by comparing the IP address you expect (or have cached) to the IP address received.
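
The comparison can be sketched with Python’s standard `ipaddress` module, which normalizes addresses so that different spellings of the same IPv6 address compare as equal. The function `address_mismatches` and the sample addresses are hypothetical.

```python
import ipaddress


def address_mismatches(expected, observed):
    """Compare expected and observed addresses after normalizing them."""
    expected_set = {ipaddress.ip_address(a) for a in expected}
    observed_set = {ipaddress.ip_address(a) for a in observed}
    return {
        "missing": sorted(str(a) for a in expected_set - observed_set),
        "unexpected": sorted(str(a) for a in observed_set - expected_set),
    }


print(address_mismatches(
    expected=["192.0.2.10", "2001:db8::1"],
    observed=["192.0.2.99", "2001:DB8:0:0:0:0:0:1"],
))
```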

Measure DNS Latency

Several factors can impact the performance of recursive servers: their actual load, available network capacity, latency to the root and TLD servers, cache misses, and the responsiveness of authoritative name servers, among other things. You should measure latency from the user to the resolver as well.
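
A basic latency probe can be built with the standard library alone: construct a DNS query, send it over UDP, and time the round trip. This is a minimal sketch under stated assumptions: `build_query` and `measure_latency_ms` are hypothetical names, the probe does not validate the reply, and the resolver address is supplied by the caller.

```python
import socket
import struct
import time


def build_query(hostname: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query message (default type 1 = A, class 1 = IN)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other sections.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def measure_latency_ms(hostname: str, resolver_ip: str, timeout: float = 2.0) -> float:
    """Time one UDP query/response exchange; raises on timeout."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(query, (resolver_ip, 53))
        sock.recvfrom(512)  # reply content is not inspected in this sketch
        return (time.perf_counter() - start) * 1000.0


print(len(build_query("example.com")))
```

Calling `measure_latency_ms("example.com", resolver_ip)` with the address of a resolver you are permitted to query returns the round-trip time in milliseconds. Running it repeatedly from several locations gives a latency distribution rather than a single sample.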

Verify Connectivity

Latency issues and packet loss can cause DNS connectivity issues between name servers and domains, resulting in your website being unavailable to end users. These issues can be discovered as failures during connectivity testing.

Monitor DNS Servers

This applies only to administrators managing all or part of their own DNS infrastructure. Don’t forget that the performance of DNS resolution also depends on the configuration of the hardware and software that host the DNS service. The workload on a DNS server cluster is measured in terms of QPS, or Queries Per Second. It’s important to monitor not only the CPU and memory of the servers but also the disk I/O and network throughput under different levels of QPS load.

Complexities of DNS Monitoring

Because many solutions are cloud-hosted, picking the right synthetic monitoring tool to test your infrastructure can be tricky. For example, using an APM vendor like Dynatrace, New Relic, AppDynamics, or Datadog can mean that you are using Amazon’s own datacenters to monitor your DNS, resulting in almost-zero latency and maximum availability. This is because your synthetic monitoring agent would not have to interact with the internet to test your infrastructure; it has a local connection through AWS. This does not reflect the behavior your users will experience.

Conclusion

DNS Monitoring requires advanced tools that can effectively trace queries through a complex hierarchy of servers, network links, and services. This process involves continuously querying DNS records to measure their resolution latency and comparing those results to what is expected. The DNS is one of the most frequently attacked components, as it contains access to a wealth of sensitive information.

Choosing the right DNS monitoring tool is important to ensure service availability, security, and application performance. Interested in learning more? Check out our must-have feature list for synthetic monitoring tools.
