How can I troubleshoot DNS resolution issues with my weighted routing policy in Route 53?
Last updated: 2020-06-24
I configured a weighted routing policy in Amazon Route 53. However, when I test the DNS resolution, I get unexpected results. How can I troubleshoot this?
Short description
Suppose that you've created four weighted TXT records with the name "weighted.awsexampledomain.com". Each record has a Time to Live (TTL) of 300 seconds, and the weights are configured as follows:
Name | Type | TTL | Values | Weight | Health check status |
---|---|---|---|---|---|
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 0" | Weight=0 | Health check associated |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 20" | Weight=20 | Health check associated |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 50" | Weight=50 | Health check associated |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 70" | Weight=70 | Health check associated |
This configuration is referenced in the following examples.
Resolution
Test your weighted routing policy to identify the issue
Send multiple (over 10,000) queries to test your weighted routing policy. Test the DNS resolution from multiple locations or directly query the authoritative name servers to understand the policy. Use the following scripts to send multiple DNS queries for your domain name.
Send DNS queries using the recursive resolver:
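#!/bin/bash
# Replace <domain-name>, <type>, and RecursiveResolver_IP with your own values.
for i in {1..10000}
do
  # +short prints only the record data from the answer
  domain=$(dig <domain-name> <type> @RecursiveResolver_IP +short)
  # printf avoids the backslash interpretation that echo -e applies to the data
  printf '%s\n' "$domain" >> RecursiveResolver_results.txt
done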
Send DNS queries directly to the authoritative name servers:
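#!/bin/bash
# Replace <domain-name>, <type>, and AuthoritativeNameserver_IP with your own values.
for i in {1..10000}
do
  # Querying an authoritative name server bypasses resolver caches
  domain=$(dig <domain-name> <type> @AuthoritativeNameserver_IP +short)
  # printf avoids the backslash interpretation that echo -e applies to the data
  printf '%s\n' "$domain" >> AuthoritativeNameServer_results.txt
done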
Example run, with the responses tallied using awk, sort, and uniq:
$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @172.16.173.64 +short); echo -e "$domain" >> RecursiveResolver_results.txt; done
$ awk ' " " ' RecursiveResolver_results.txt | sort | uniq -c
1344 "Record with Weight 20"
3780 "Record with Weight 50"
4876 "Record with Weight 70"
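To judge whether a tally like the one above is close to the configured weights, it can help to compute the count you would expect for each record. The following is a minimal sketch and not part of the original procedure. The function name expected_counts is hypothetical, and the calculation assumes that every nonzero-weighted record is healthy and that no resolver cache skews the sample.

```shell
#!/bin/bash
# expected_counts QUERIES WEIGHT... (hypothetical helper)
# Prints the expected number of responses for each weight, assuming each
# record is returned in proportion to its weight / sum of all weights.
expected_counts() {
  local queries=$1
  shift
  printf '%s\n' "$@" | awk -v n="$queries" '
    { w[NR] = $1 + 0; total += $1 }
    END {
      # With a total weight of zero there is no ratio to compute.
      if (total == 0) { exit 1 }
      for (i = 1; i <= NR; i++)
        printf "Weight %d: %.0f\n", w[i], n * w[i] / total
    }'
}

# Weights from the example configuration, for 10,000 queries.
expected_counts 10000 0 20 50 70
# Weight 0: 0
# Weight 20: 1429
# Weight 50: 3571
# Weight 70: 5000
```

Small deviations from these figures, such as the sample tally above, are normal because each response is chosen at random.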
Use your test results to troubleshoot your specific issue
Issue: Endpoint resources of the weighted records aren't receiving the expected traffic ratio.
Route 53 sends traffic to a resource based on the weight that you assign to the record as a proportion of the total weight for all records. However, intermediate DNS resolvers cache each response for the duration of the record's TTL. Until the cached response expires, every client that uses that resolver receives the same answer and is directed to the same endpoint, so the traffic ratio that you observe can differ from the configured weights.
For example, if you query against the caching DNS resolver 192.168.1.2:
$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @192.168.1.2 +short); echo -e "$domain" >> CachingResolver_results.txt; done
$ awk ' " " ' CachingResolver_results.txt | sort | uniq -c
3561 "Record with Weight 20"
1256 "Record with Weight 50"
5183 "Record with Weight 70"
Notice that these results don't match the configured 20:50:70 ratio. This is because the recursive DNS resolver answers from its cache until the TTL expires.
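One way to check whether a resolver is answering from its cache is to look at the TTL in the answer section. An authoritative answer carries the full configured TTL (300 in this example), while a cached answer from a recursive resolver typically shows a lower value that counts down. The following is a minimal sketch under those assumptions. The helper name answer_ttl is hypothetical, and the sample answer line, including its TTL of 117, is made up for illustration.

```shell
#!/bin/bash
# answer_ttl (hypothetical helper): reads `dig +noall +answer` output on stdin
# and prints the TTL (second column) of each answer line.
answer_ttl() {
  awk 'NF >= 5 && $1 !~ /^;/ { print $2 }'
}

# Illustrative answer line in the column order that dig prints:
# name, TTL, class, type, data. The TTL value is invented for this sketch.
sample='weighted.awsexampledomain.com. 117 IN TXT "Record with Weight 70"'
ttl=$(printf '%s\n' "$sample" | answer_ttl)

if [ "$ttl" -lt 300 ]; then
  echo "TTL $ttl is below the configured 300: likely served from cache"
else
  echo "TTL $ttl matches the configured TTL"
fi
```

Against a live resolver, the same helper could be fed real output, for example: dig weighted.awsexampledomain.com. TXT @192.168.1.2 +noall +answer | answer_ttl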
Issue: Some of my weighted records aren't being returned.
- If you associate a health check with a resource record set, Route 53 responds with the record only if the associated health check is successful. Be sure that the health check associated with your weighted record is successful. For more information, see How Amazon Route 53 determines whether a health check is healthy.
- If a resource record set in a policy doesn't have an attached health check, it's always considered healthy and is included in the possible responses to DNS queries. Records that are failing health checks aren't returned. Check the health check configuration and be sure that it's being reported as Healthy.
- If you're using "Evaluate Target Health" with the resource record set, Route 53 relies on the health check reported by the end resource. For more information, see Why is my alias record pointing to an Application Load Balancer marked as unhealthy when I'm using "Evaluate Target Health"?
For example, when some health checks are failing:
Name | Type | TTL | Values | Weight | Health check status |
---|---|---|---|---|---|
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 0" | Weight=0 | Health Check Success |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 20" | Weight=20 | Health Check Success |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 50" | Weight=50 | Health Check Fail |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 70" | Weight=70 | Health Check Success |
$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @192.168.1.2 +short); echo -e "$domain" >> HealthCheck_results.txt; done
$ awk ' " " ' HealthCheck_results.txt | sort | uniq -c
3602 "Record with Weight 20"
6398 "Record with Weight 70"
Notice that Route 53 doesn't return "Record with Weight 50" because its health check is failing.
Issue: All of my weighted records are unhealthy.
Even if none of the records in a group of records are healthy, Route 53 must still provide a response to the DNS queries. However, there's no basis for choosing one record over another. In this case, Route 53 considers all of the records in the group to be healthy. One record is selected based on the routing policy and the values that you specify for each record.
For example:
Name | Type | TTL | Values | Weight | Health check status |
---|---|---|---|---|---|
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 0" | Weight=0 | Health Check Fail |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 20" | Weight=20 | Health Check Fail |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 50" | Weight=50 | Health Check Fail |
weighted.awsexampledomain.com. | TXT | 300 | "Record with Weight 70" | Weight=70 | Health Check Fail |
$ for i in {1..10000}; do domain=$(dig weighted.awsexampledomain.com. TXT @205.251.194.16 +short); echo -e "$domain" >> All_UnHealthy_results.txt; done
$ awk ' " " ' All_UnHealthy_results.txt | sort | uniq -c
1446 "Record with Weight 20"
3554 "Record with Weight 50"
5000 "Record with Weight 70"
Notice that Route 53 considered all records healthy (fail open) and responded to the DNS requests per the configured proportions. "Record with Weight 0" isn't returned because its weight is zero.
Note: If you assign nonzero weights to some records and zero weights to others, health checks work the same as when all records have nonzero weights. There are a few exceptions:
- Route 53 initially considers only the healthy nonzero weighted records, if any.
- If all nonzero records are unhealthy, Route 53 considers the healthy zero weighted records.
If you set the same "Weight" for all the records in a group, including a weight of zero for all of them, traffic is routed to all healthy resources with equal probability.
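The selection rules described in this section can be summarized in a short sketch. This is an illustrative model of the behavior stated above, not Route 53's actual implementation. The function name eligible_records and its input format (one "WEIGHT STATUS" pair per line, where STATUS is healthy or unhealthy) are assumptions made for this example.

```shell
#!/bin/bash
# eligible_records (hypothetical helper): reads "WEIGHT STATUS" lines on stdin
# and prints the weights of the records that can appear in responses,
# following the rules described in this article.
eligible_records() {
  awk '
    {
      w[NR] = $1 + 0
      ok[NR] = ($2 == "healthy")
      if (w[NR] > 0) nz++                 # any nonzero-weighted record
      if (ok[NR] && w[NR] > 0) nz_ok++    # healthy and nonzero
      if (ok[NR] && w[NR] == 0) z_ok++    # healthy and zero
    }
    END {
      for (i = 1; i <= NR; i++) {
        if (nz_ok)      pick = (ok[i] && w[i] > 0)   # healthy nonzero records first
        else if (z_ok)  pick = (ok[i] && w[i] == 0)  # then healthy zero-weighted records
        else if (nz)    pick = (w[i] > 0)            # fail open: treat all as healthy
        else            pick = 1                     # all weights zero: equal probability
        if (pick) print w[i]
      }
    }'
}

# One failing health check: only the healthy nonzero records remain.
printf '%s\n' '0 healthy' '20 healthy' '50 unhealthy' '70 healthy' | eligible_records
# 20
# 70
```

With all four records marked unhealthy, the same function prints 20, 50, and 70, which matches the fail-open example earlier in this section.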