How can I determine whether my DNS queries to the Amazon-provided DNS server are failing due to VPC DNS throttling?

Last updated: 2022-03-31

My DNS queries to the Amazon-provided DNS server are failing or timing out. Are the DNS queries from my instance failing because of VPC DNS throttling?

Short description

Amazon-provided DNS servers enforce a limit of 1024 packets per second per elastic network interface. Amazon-provided DNS servers reject any traffic that exceeds this limit.

VPC Flow Logs don't capture the traffic that your application sends to Amazon-provided DNS servers. Instead, use packet captures or Traffic Mirroring to identify the cause of the DNS query failures.

Note: Amazon Route 53 query logging captures only the traffic that reaches the VPC+2 resolver (AmazonProvidedDNS). However, DNS queries are throttled at the elastic network interface level, so throttled queries don't appear in the query logs.

Resolution

First, use one of the following methods to identify the source of DNS query failures. Then, if you determine that the cause is DNS throttling, use one of the following recommended fixes.

Determine the source of the DNS query failures

Option 1: Use tcpdump (Linux only)

1.    Use the following command to take rotating packet captures on your EC2 instance. The command captures the first 350 bytes of each packet and keeps 20 files of 100 MB each, overwriting the oldest captures when all 20 files are full.

sudo tcpdump -i eth0 -s 350 -C 100 -W 20 -w /var/tmp/$(curl http://169.254.169.254/latest/meta-data/instance-id).$(date +%Y-%m-%d:%H:%M:%S).pcap

2.    Run the following Linux command to determine the number of DNS queries sent.

tcpdump  -r <file_name.pcap> -nn dst port 53 | awk -F " " '{ print $1 }' | cut -d"." -f1 | uniq -c

3.    Review the count for each one-second interval. If the number of DNS queries in any second reaches 1024, then any additional queries sent in that second are throttled.
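To avoid reading the per-second counts by eye, the check in step 3 can be scripted. The following is a minimal sketch, not part of the original procedure. The function name flag_throttled_seconds is hypothetical, and the sketch assumes the default tcpdump text output, where the first field is a timestamp such as 12:00:01.000001. Because that timestamp has no date, captures that span more than one day would merge counts for the same time of day.

```shell
# Hypothetical helper: reads `tcpdump -nn` text output on stdin and prints
# every one-second interval whose packet count reaches the limit.
# The optional first argument overrides the default limit of 1024.
#
# Usage:
#   tcpdump -r <file_name.pcap> -nn dst port 53 | flag_throttled_seconds
flag_throttled_seconds() {
  local limit="${1:-1024}"
  awk -v limit="$limit" '
    {
      sec = $1
      sub(/\..*$/, "", sec)      # HH:MM:SS.ffffff -> HH:MM:SS
      count[sec]++
    }
    END {
      for (sec in count)
        if (count[sec] >= limit)
          print sec, count[sec]  # interval, number of queries
    }
  ' | sort
}
```

If the function prints nothing, no one-second interval in the capture reached the limit.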

Option 2: Use Traffic Mirroring

If it's not feasible to run tcpdump in your use case, then you can use Traffic Mirroring to identify whether DNS queries are throttled.

Note: Traffic Mirroring is available for Nitro-based instances and some non-Nitro instance types. Traffic Mirroring charges apply.

First, capture traffic data:

1.    Complete the Traffic Mirroring prerequisites.

2.    Create a traffic mirror target. Confirm that the target elastic network interface or Network Load Balancer allows inbound traffic on UDP port 4789.

3.    Create a traffic mirror filter. For Filter settings, confirm that amazon-dns is enabled for Network services - optional.

4.    Create a traffic mirror session. After you configure Traffic Mirroring, mirrored traffic is sent to the traffic mirror target.

Note: Traffic Mirroring is a live stream of data. To save the mirrored packets that arrive at the target to a pcap file, capture the traffic on UDP port 4789.

Then, analyze the captured data using Wireshark:

1.    Open the captured traffic in Wireshark.

2.    Choose the Statistics menu.

3.    Select the I/O Graph and clear all options.

4.    (For Linux only) Under Display Filter, add a filter using the VXLAN Network Identifier and DNS query flag. For example, if the VXLAN Network Identifier is 53 and the DNS query flag is 0x0100, the display filter for the graph is (vxlan.vni == 53) && (dns.flags == 0x0100).

5.    Review the graph to check whether it flatlines around 1024 (the Amazon-provided DNS server's packet per second limit). If the graph flatlines around this value, then DNS throttling is happening on the mirrored source.
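As a command-line alternative to reading the I/O Graph, the same display filter can be applied with tshark and the packets counted per second. The following is a sketch under stated assumptions, not part of the original procedure. It assumes that tshark is installed, that frame.time_epoch prints as seconds since the epoch (for example, 1648700000.123456000), and that the VXLAN Network Identifier is 53 as in the example above. The function name peak_per_second is hypothetical.

```shell
# Hypothetical helper: reads one epoch timestamp per line on stdin and
# prints the busiest second followed by its packet count.
#
# Usage (assumes the example VNI of 53 from step 4):
#   tshark -r <file_name.pcap> \
#     -Y '(vxlan.vni == 53) && (dns.flags == 0x0100)' \
#     -T fields -e frame.time_epoch | peak_per_second
peak_per_second() {
  awk '
    {
      sec = $1
      sub(/\..*$/, "", sec)      # drop the fractional part
      n = ++count[sec]
      if (n > max) { max = n; peak = sec }
    }
    END { if (NR > 0) print peak, max }
  '
}
```

A peak count at or near 1024 is consistent with the flatline described in step 5.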

Option 3: Use the Elastic Network Adapter (ENA) driver network performance metric

If your EC2 instance is running one of the following ENA driver versions, then you can review real-time metrics for DNS throttling using the linklocal_allowance_exceeded metric:

  • Linux: 2.2.10 or later
  • Windows: 2.2.2.0 or later

The linklocal_allowance_exceeded metric indicates the number of packets shaped and dropped because traffic to local services exceeded the packets per second (PPS) allowance. Examples of local services are the Amazon VPC DNS service, the Instance Metadata Service (IMDS), and the Amazon Time Sync Service. This metric is cumulative since the last restart of the driver (usually caused by a stop and start or a reboot of the instance), so it's significant only if it's increasing. Check the metric at multiple intervals to see whether the count is rising.

To retrieve the value of the linklocal_allowance_exceeded metric, run the following command:

ethtool -S eth0
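Because only an increasing count is meaningful, it can help to sample the metric twice and compare the values. The following is a minimal sketch, assuming that the ethtool -S output lists the metric as a "name: value" line. The function name linklocal_exceeded and the 60-second interval are illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: reads `ethtool -S` output on stdin and prints the
# current value of linklocal_allowance_exceeded.
#
# Usage:
#   before=$(ethtool -S eth0 | linklocal_exceeded)
#   sleep 60
#   after=$(ethtool -S eth0 | linklocal_exceeded)
#   echo "Increase over 60 seconds: $((after - before))"
linklocal_exceeded() {
  awk -F': *' '$1 ~ /linklocal_allowance_exceeded$/ { print $2 }'
}
```

A nonzero increase means that traffic to local services exceeded the allowance during the interval. Because the metric also covers IMDS and the Amazon Time Sync Service, an increase alone doesn't prove that DNS queries were the packets dropped.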

Correct DNS throttling issues

If you find that the cause of your DNS failures is DNS throttling, you can: