How can I troubleshoot DNS resolution issues with my weighted routing policy in Route 53?

Last updated: 2020-06-24

I configured a weighted routing policy in Amazon Route 53. However, when I test the DNS resolution, I get unexpected results. How can I troubleshoot this?

Short description

Suppose that you created four weighted TXT records with the name "weighted.awsexampledomain.com". Each record has a Time to Live (TTL) of 300 seconds, and the weights are configured as follows:

Name                            Type  TTL  Value                    Weight  Health check status
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 0"   0       Health check associated
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 20"  20      Health check associated
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 50"  50      Health check associated
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 70"  70      Health check associated

This configuration is referenced in the following examples.

Resolution

Test your weighted routing policy to identify the issue

Send a large number of queries (more than 10,000) to test your weighted routing policy. Test the DNS resolution from multiple locations, or query the authoritative name servers directly, to see how the policy behaves without resolver caching. Use the following scripts to send repeated DNS queries for your domain name.

Send DNS queries using the recursive resolver:

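#!/bin/bash
# Example values from this article; replace them with your own name, type, and resolver IP.
domain_name="weighted.awsexampledomain.com."
record_type="TXT"
resolver_ip="172.16.173.64"
for i in {1..10000}
do
  # +short prints only the record value from the answer section
  answer=$(dig "$domain_name" "$record_type" "@$resolver_ip" +short)
  printf '%s\n' "$answer" >> RecursiveResolver_results.txt
done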

Send DNS queries directly to the authoritative name servers:

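#!/bin/bash
# Example values from this article; replace them with your own name, type, and name server IP.
domain_name="weighted.awsexampledomain.com."
record_type="TXT"
nameserver_ip="205.251.194.16"
for i in {1..10000}
do
  # Querying the authoritative name server bypasses resolver caches
  answer=$(dig "$domain_name" "$record_type" "@$nameserver_ip" +short)
  printf '%s\n' "$answer" >> AuthoritativeNameServer_results.txt
done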

Example output from a recursive resolver, with the responses counted per record value:

$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @172.16.173.64 +short); echo -e  "$domain" >> RecursiveResolver_results.txt; done

$ awk ' " " ' RecursiveResolver_results.txt | sort | uniq -c
1344 "Record with Weight 20"
3780 "Record with Weight 50"
4876 "Record with Weight 70"
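To judge whether a tally like the one above is close to the configured weights, it can help to compute the expected share of each record first. The following is a minimal sketch, assuming all records are healthy; the function name expected_counts is hypothetical and is not part of dig or any AWS tool.

```bash
#!/bin/bash
# expected_counts TOTAL_QUERIES WEIGHT...
# Prints the expected share of TOTAL_QUERIES for each weight,
# computed as weight / (sum of all weights).
expected_counts() {
  local total_queries=$1
  shift
  printf '%s\n' "$@" | awk -v n="$total_queries" '
    { w[NR] = $1; sum += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        share = (sum > 0) ? w[i] / sum : 0
        # Adding 0.5 before the %d truncation rounds to the nearest integer
        printf "Weight %d: %.1f%% (about %d of %d queries)\n", w[i], share * 100, share * n + 0.5, n
      }
    }'
}

# Weights from the example configuration in this article
expected_counts 10000 0 20 50 70
```

For the example weights, this reports roughly 0, 1429, 3571, and 5000 queries out of 10,000, which is close to the recursive resolver tally shown above.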

Use your test results to troubleshoot your specific issue

Issue: Endpoint resources of the weighted records aren't receiving the expected traffic ratio.

Route 53 sends traffic to a resource based on the weight that you assign to the record as a proportion of the total weight for all records. However, intermediate DNS resolvers cache each response for the duration of the record TTL. While a response is cached, every client that uses that resolver is directed to the same endpoint until the TTL expires, so the distribution that clients see can differ from the configured weights.

For example, if you query against the caching DNS resolver 192.168.1.2:

$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @192.168.1.2 +short); echo -e  "$domain" >> CachingResolver_results.txt; done

$ awk ' " " ' CachingResolver_results.txt | sort | uniq -c
3561 "Record with Weight 20"
1256 "Record with Weight 50"
5183 "Record with Weight 70"

Notice that these results don't match the configured 20:50:70 ratio, because the recursive DNS resolver answers from its cache.
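One way to check whether a resolver is answering from its cache is to watch the TTL in its responses. A cached answer typically shows a TTL that counts down between queries, while a fresh answer starts again at the record TTL (300 seconds in this example). The sketch below assumes the standard answer-section layout that dig prints with +noall +answer; the function names answer_ttl and watch_ttl are hypothetical.

```bash
#!/bin/bash
# answer_ttl: reads `dig +noall +answer` output on stdin and prints the
# TTL column (field 2) of each answer line, skipping comment lines.
answer_ttl() {
  awk 'NF >= 5 && $1 !~ /^;/ { print $2 }'
}

# watch_ttl NAME TYPE RESOLVER_IP [SAMPLES] [INTERVAL_SECONDS]
# Queries the resolver repeatedly and prints the TTL of each answer.
watch_ttl() {
  local name=$1 type=$2 resolver=$3 samples=${4:-5} interval=${5:-10}
  local i
  for ((i = 0; i < samples; i++)); do
    dig "$name" "$type" "@$resolver" +noall +answer | answer_ttl
    sleep "$interval"
  done
}
```

For example, `watch_ttl weighted.awsexampledomain.com. TXT 192.168.1.2` samples the caching resolver from this article five times, ten seconds apart.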

Issue: Some of my weighted records aren't being returned.

For example, when some health checks are failing:

Name                            Type  TTL  Value                    Weight  Health check status
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 0"   0       Health Check Success
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 20"  20      Health Check Success
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 50"  50      Health Check Fail
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 70"  70      Health Check Success

$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @192.168.1.2 +short); echo -e  "$domain" >> HealthCheck_results.txt; done

$ awk ' " " ' HealthCheck_results.txt | sort | uniq -c
3602 "Record with Weight 20"
6398 "Record with Weight 70"

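Notice that "Record with Weight 50" isn't returned by Route 53 because its health check is failing. "Record with Weight 0" isn't returned either, because Route 53 first considers only the healthy records with nonzero weights.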
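To spot records that never appear in a results file, the configured values can be compared against the collected responses. The following is a minimal sketch; the function name missing_values is hypothetical, and it assumes that each line of the results file holds one value exactly as dig +short prints it (including the double quotes for TXT records).

```bash
#!/bin/bash
# missing_values RESULTS_FILE VALUE...
# Prints each configured VALUE that never appears as a whole line
# in RESULTS_FILE.
missing_values() {
  local results=$1
  shift
  local value
  for value in "$@"; do
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$value" "$results" || printf '%s\n' "$value"
  done
}
```

For example, `missing_values HealthCheck_results.txt '"Record with Weight 0"' '"Record with Weight 20"' '"Record with Weight 50"' '"Record with Weight 70"'` lists the values that were never returned. For each one, check whether its weight is zero or its health check is failing.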

Issue: All of my weighted records are unhealthy.

Even if none of the records in a group of records are healthy, Route 53 must still provide a response to the DNS queries. However, there's no basis for choosing one record over another. In this case, Route 53 considers all of the records in the group to be healthy. One record is selected based on the routing policy and the values that you specify for each record.

For example:

Name                            Type  TTL  Value                    Weight  Health check status
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 0"   0       Health Check Fail
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 20"  20      Health Check Fail
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 50"  50      Health Check Fail
weighted.awsexampledomain.com.  TXT   300  "Record with Weight 70"  70      Health Check Fail

$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @205.251.194.16 +short); echo -e  "$domain" >> All_UnHealthy_results.txt; done

$ awk ' " " ' All_UnHealthy_results.txt | sort | uniq -c
1446 "Record with Weight 20"
3554 "Record with Weight 50"
5000 "Record with Weight 70"

Notice that Route 53 treated all of the records as healthy (it fails open) and responded to the DNS queries in proportion to the configured weights. "Record with Weight 0" isn't returned because its weight is zero.

Note: If you assign nonzero weights to some records and zero weights to others, health checks work the same as when all records have nonzero weights, with the following exceptions:

  • Route 53 initially considers only the healthy nonzero weighted records, if any.
  • If all nonzero weighted records are unhealthy, Route 53 considers the healthy zero weighted records.

If you set the same "Weight" for all of the records in a group, including zero for all of them, traffic is routed to all healthy resources with equal probability.
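The selection rules described in this article can be modeled in a few lines to predict which records should appear in your test results. This is only an illustrative reading of those rules, not Route 53's actual implementation. The function name eligible_records and the "WEIGHT STATUS" input format are hypothetical.

```bash
#!/bin/bash
# eligible_records: reads lines of the form "WEIGHT STATUS" on stdin,
# where STATUS is "healthy" or "unhealthy", and prints the records that
# remain candidates for a response under the rules in this article.
eligible_records() {
  awk '
    { weight[NR] = $1; healthy[NR] = ($2 == "healthy"); row[NR] = $0
      if (healthy[NR]) any_healthy = 1 }
    END {
      for (i = 1; i <= NR; i++) {
        # Fail open: if no record is healthy, treat every record as healthy
        if (!any_healthy) healthy[i] = 1
        if (healthy[i] && weight[i] > 0) nonzero = 1
      }
      # Healthy nonzero weighted records take precedence; healthy zero
      # weighted records are used only when no such record exists
      for (i = 1; i <= NR; i++)
        if (healthy[i] && (weight[i] > 0 || !nonzero)) print row[i]
    }'
}
```

For example, `printf '%s\n' '0 healthy' '20 healthy' '50 unhealthy' '70 healthy' | eligible_records` prints the records with weights 20 and 70, which matches the failing health check example earlier in this article.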