UltraDNS Went Down and Took Netflix (and Half the Internet) With It
Neustar UltraDNS suffered a major DNS outage, causing Netflix and other clients to experience DNS failures as well.
Neustar UltraDNS experienced a major outage on Thursday, October 15. It began at approximately 4:20 PM EST, lasted roughly 90 minutes, and was initially attributed to a suspected DDoS attack. The company later determined that the cause was a technical malfunction rather than an attack; either way, the outage prevented it from serving its customers for a significant amount of time, causing some of its biggest clients, like Netflix, to experience DNS failures.
The outage degraded both the response times and the availability of UltraDNS. As a result, Netflix saw dramatic drops in availability for the duration of the outage, and in several instances service was interrupted for its 65.5 million subscribers.
The ramifications were far-reaching, affecting businesses to varying degrees. Ensighten, a third-party tag management company, was among those affected, which means some of its users may have felt the outage as well through malfunctioning third-party tags.
The company announced that the outage was due to a technical malfunction, refuting initial reports of a DDoS attack. The DDoS theory was not far-fetched, since it would not have been the first such attack the company has faced recently. Last year, it was hit with a 100 Gbps attack, resulting in latency issues for a large portion of its clients. UltraDNS handles over 14 billion DNS queries a day for clients such as AllState, Rackspace, Nike Store, Mercedes, Forever 21, BBC News, CNN Money, and ETrade.
According to Threatpost.com, large-scale DDoS attacks are occurring more frequently and, while the motivations are varied and often undetermined, attackers can use them as cover for other illegal activity such as intellectual property theft and financial fraud.
Regardless of the cause of Thursday’s event, UltraDNS’s outage serves as a harsh reminder of the countless variables that reside within the complexities of DNS, and how difficult it can be to manage them. As IT professionals, we’re taught that redundancy is your network’s lifeline; however, the architecture of DNS makes it incredibly expensive to build a reliable backup strategy. The only way to completely mitigate a failure is to have your backup work in tandem with your primary service, and since this is simply impossible for most companies to afford, your SLA with your DNS provider is crucial to compensating for the revenue you lose when an outage does occur.
DNS providers typically have hundreds of servers in several points of presence (POPs) across the globe, so a micro-outage that is contained in a small geographic area may go undetected by the provider but still wreak havoc on your site’s performance. Therefore, deploying a synthetic monitoring solution that can detect an issue and alert you to it is crucial to the protection of your profits and the success of your business.
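To make the idea of a synthetic DNS check concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any particular monitoring product works: the function names (`build_query`, `parse_rcode`, `probe`, `find_failing_servers`) are hypothetical, and the check only sends one A-record query per nameserver over UDP and treats a timeout or a non-zero response code as a failure.

```python
import socket
import struct
import time

QUERY_ID = 0x1234  # arbitrary ID used to match responses to our query


def build_query(hostname, query_id=QUERY_ID):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_rcode(response, query_id=QUERY_ID):
    """Return the response code, or None if the reply is malformed."""
    if len(response) < 12:
        return None
    resp_id, flags = struct.unpack("!HH", response[:4])
    if resp_id != query_id:
        return None
    return flags & 0x000F  # low four bits of the flags field hold the RCODE


def udp_send(server_ip, packet, timeout):
    """Send one UDP packet to port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server_ip, 53))
        data, _ = sock.recvfrom(512)
        return data


def probe(server_ip, hostname, timeout=2.0, send=udp_send):
    """Query one nameserver; return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        response = send(server_ip, build_query(hostname), timeout)
    except OSError:  # covers timeouts and unreachable hosts
        return False, time.monotonic() - start
    return parse_rcode(response) == 0, time.monotonic() - start


def find_failing_servers(server_ips, hostname, timeout=2.0, send=udp_send):
    """Return the nameservers that timed out or answered with an error."""
    return [
        ip for ip in server_ips
        if not probe(ip, hostname, timeout, send)[0]
    ]
```

A check like this only sees what is reachable from the machine it runs on. To catch the kind of regional micro-outage described above, you would run it on a schedule from probes in several locations, pass in the addresses of your provider's nameservers, and alert whenever `find_failing_servers` returns a non-empty list.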
In the aftermath of an outage of this magnitude, we can’t help but think that perhaps it’s time to rethink the DNS specifications to better handle these types of global outages. After all, it only took 20 years for HTTP/2 to arrive; 30 years is plenty of time for a DNS 2.0 to be born.