Cybersecurity has always been the Achilles' heel of the digital world. Security protocols, firewalls, and advanced authentication methods have tightened internet security, but even with all these measures in place, cyberattacks are inevitable. The most you can do is mitigate the impact and contain a compromise before it turns into a full-blown attack.
DDoS is one of the most common types of cyberattack. Enterprises, including cloud providers, go the extra mile to ensure security, but the DDoS attack that happened on October 22 is proof that cybersecurity is not bulletproof.
Catchpoint noticed DNS failures early on October 22; the first failure we detected was at 5:20 am PDT. We quickly identified a failure pattern: the issue was persistent in cities like San Francisco and intermittent in Boston, Chicago, and Dallas.
Further analysis showed that the issue was specific to AWS S3 and not the whole of AWS. AWS S3 customers noticed DNS latencies and failures. In the image below, we see DNS requests for two different hosts on AWS DNS. While www.amplitude.com worked, there were DNS failures for the second host https://com-amplitude-marketing.s3-us-west-2.amazonaws.com/
The waterfall graph below confirms there was an issue with the second host that left certain page components inaccessible.
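A per-host DNS check along these lines can surface this kind of partial failure, where the primary hostname resolves but a dependent host does not. This is only a minimal sketch using Python's standard library, not Catchpoint's tooling; the function name `check_dns` and the injectable `resolve` parameter are assumptions made for illustration.

```python
import socket

def check_dns(hosts, resolve=socket.getaddrinfo):
    """Return {host: True/False} depending on whether each host resolves.

    `resolve` defaults to the system resolver; it is a parameter only so
    the check can be exercised without network access (an assumption of
    this sketch, not part of any monitoring product).
    """
    results = {}
    for host in hosts:
        try:
            resolve(host, 443)
            results[host] = True
        except socket.gaierror:
            # Name resolution failed for this host.
            results[host] = False
    return results
```

Calling `check_dns(["www.amplitude.com", "com-amplitude-marketing.s3-us-west-2.amazonaws.com"])` from several locations would show whether a failure is specific to one host or one region, which is the pattern described above.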
Root Cause Analysis
We observed 50% packet loss once the DNS query hit the AWS network:
AWS confirmed that they were facing a DDoS attack. Their DDoS mitigations were handling the malicious traffic, but the mitigations also affected legitimate traffic, and the increase in traffic caused congestion and DNS failures. Traceroutes showed packets traversing a Neustar ASN; Neustar is probably the DDoS mitigation service used by AWS.
Catchpoint detected the first failure at 5:20 am PDT. One of our top customers reached out to AWS support, but the support team was unaware of the incident and unable to detect or diagnose the issue. AWS later confirmed the issue, and their status page was updated almost 5 hours after Catchpoint first notified customers.
According to the AWS status page, the issue was resolved at 6:30 pm PDT.
Mean Time to Detect (MTTD) Matters
The DDoS attack brings the spotlight back to MTTD and how it limits the performance impact on the end-user experience. The faster you detect an issue, the shorter the Mean Time to Resolve (MTTR) will be.
MTTD is accurate only when you monitor all four pillars of digital experience monitoring – Reachability, Availability, Performance and Reliability. Monitoring each of these pillars will eliminate blind spots in the application delivery chain. For example, the primary URL of Amplitude was available but other hosts were unavailable. The hosts experiencing the DNS issue would have been overlooked if it were not for the tests monitoring availability and reachability, in addition to measuring performance.
Building a monitoring strategy around these four pillars will ensure complete visibility. You will not be left in the dark when an issue strikes.
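To make the MTTD discussion concrete, here is a small sketch of the arithmetic. The incident times are the ones quoted in this post (first failure detected at 5:20 am PDT, resolution at 6:30 pm PDT per the AWS status page); the list of detection delays is hypothetical and included only to show how a mean is computed.

```python
from datetime import timedelta

def mean_duration(durations):
    """Average a non-empty list of timedelta values."""
    return sum(durations, timedelta()) / len(durations)

# Times of day from this incident, as offsets from midnight PDT.
first_failure = timedelta(hours=5, minutes=20)  # first failure detected
resolved = timedelta(hours=18, minutes=30)      # per the AWS status page
incident_duration = resolved - first_failure
print(incident_duration)  # 13:10:00

# Hypothetical detection delays for three past incidents (illustrative only).
detection_delays = [
    timedelta(minutes=4),
    timedelta(minutes=10),
    timedelta(minutes=1),
]
print(mean_duration(detection_delays))  # 0:05:00
```

The same helper works for MTTR if you pass it time-to-resolve values instead of detection delays.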
DDoS attacks, like the one suffered by AWS, can wreak havoc unless you have the right measures in place to mitigate the impact, improve DDoS resiliency and secure the entire application delivery chain. To begin with, follow these best practices:
- Resilient architecture: Most applications run in the cloud, so when picking a cloud provider, ensure the provider has measures in place to protect against DDoS attacks, as well as strategies to quickly detect and mitigate the impact on performance.
- Right alerting strategy: Keep track of the type of traffic you get, as this will make it easier to detect any malicious or unusual activity. Configuring alerts based on existing patterns will reduce false positives and help you identify legitimate issues quickly.
- Distributed infrastructure: Using different services to serve the application will limit an attack to a specific component in the delivery chain. For example, hosting the database and the application on different infrastructures will separate the critical components from those that can be publicly accessed. This also makes it easier to troubleshoot.
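The alerting practice above, configuring alerts based on existing patterns, can be sketched as a simple baseline comparison. This is one possible approach under stated assumptions, not a prescribed method: the function name `is_anomalous`, the multiplier `k`, and the sample failure rates are all hypothetical.

```python
from statistics import mean, stdev

def is_anomalous(baseline, current, k=3.0):
    """Flag `current` if it exceeds the baseline mean by more than k
    sample standard deviations. `baseline` needs at least two values."""
    threshold = mean(baseline) + k * stdev(baseline)
    return current > threshold

# Hypothetical DNS failure rates (%) observed during normal operation.
baseline = [0.5, 1.0, 0.8, 1.2, 0.7, 0.9]
print(is_anomalous(baseline, 1.3))   # False: within normal variation
print(is_anomalous(baseline, 50.0))  # True: far above the baseline
```

Deriving the threshold from observed traffic rather than a fixed number is what keeps false positives down; a larger `k` trades sensitivity for fewer alerts.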
The AWS attack is worrisome considering the big holiday sale season that’s just around the corner. It can be a cause for concern for major retailers prepping for the next month. Cloud providers cannot guarantee 100% uptime, so you need to ensure you have complete visibility into every critical component in the delivery chain.
For even more about the AWS DDoS attack as well as tips and best practices for mitigating the impact of an attack, check out our recent webinar Anatomy of a DDoS Attack today!