How can I troubleshoot DNS resolution issues with my Route 53 private hosted zone?

Last updated: 2021-05-24

How can I troubleshoot DNS resolution issues with my Amazon Route 53 private hosted zone?

Short description

To troubleshoot DNS resolution issues with a Route 53 private hosted zone:

  1. Confirm that DNS support is enabled in the virtual private cloud (VPC).
  2. Confirm that the correct VPC ID is associated with the private hosted zone.
  3. (For custom DNS or Active Directory servers) Confirm that forwarding rules are configured for private hosted zone domains in the custom DNS servers towards the Amazon-provided DNS server (CIDR+2).
  4. Review custom settings in resolv.conf.
  5. Confirm that private hosted zones don't have overlapping namespaces.
  6. Confirm that there's no zone delegation configured in the private hosted zone.
  7. Confirm that the resource record's routing policy is supported in private hosted zones.
  8. Confirm that the Resolver rule and its inbound endpoint resolve to different VPCs.
  9. Confirm that the appropriate query type is configured.
  10. (For queries from on-premises to Route 53 Resolver) Confirm that the on-premises resolver sends a recursive request.
  11. Confirm that the correct rule priorities are configured for the Amazon-provided DNS.

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

Confirm that DNS support is enabled in the VPC

To allow private hosted zone record resolution, DNS support must be enabled in your VPC. Be sure that DNSSupport and DNSHostnames are set to True in your VPC.

Confirm that the correct VPC ID is associated with the private hosted zone

When you associate a private hosted zone with a VPC, Route 53 Resolver creates an auto-defined rule and associates it with the VPC. Resources in that VPC can resolve DNS records in the private hosted zone by querying the Resolver.

Confirm that the correct VPC ID is associated with your private hosted zone. Also, be sure that you're querying the resource records of domain from within the same VPC.

To get a list of VPCs associated with a hosted zone, use the following command in the AWS CLI:

aws route53  list-hosted-zones-by-vpc --vpc-id <VPC-ID> --vpc-region <region-ID>

To get a list of private hosted zones associated with specific VPCs, use the following command in the AWS CLI:

aws route53 get-hosted-zone --id <id>

Confirm that forwarding rules are configured for private hosted zone domains in custom DNS servers towards the Amazon-provided DNS server (CIDR+2)

If you configured custom DNS servers or Active Directory (AD) servers in the DHCP options for DNS in your VPC:

  • In the forwarding rule, confirm that you configured servers to forward DNS queries for the private domain to the IP address of the Amazon-provided DNS servers of your VPC. For example, if the CIDR range for your VPC is 172.31.0.0/16, then the IP address of the VPC DNS server is 172.31.0.2 (the VPC network range plus two).
  • Confirm that the domain configured in the custom servers is different from your private hosted zone. If the domain is the same as your private hosted zone, the server is authoritative for that domain. The server won't contact the Amazon-provided DNS server for private hosted zone domains.

Review custom settings in resolv.conf

If you experience intermittent DNS resolution or responses, then review the configuration settings of your source instance resolv.conf.

For example, let's say that you configured the Rotate option in resolv.conf to load balance DNS queries between the Amazon-provided DNS server and the public Google resolver server (8.8.8.8). The resolv.conf settings are:

options rotate
; generated by /usr/sbin/dhclient-script
nameserver 8.8.8.8
nameserver 172.31.0.2

In your first query to the public Google resolver (8.8.8.8), you receive the expected NXdomain response. You receive this response because the resolver is trying to find the response in the public hosted zone instead of the private hosted zone.

Private hosted Zone Record - resolvconf.local
[ec2-user@ip-172-31-253-89 etc]$ curl -vks http://resolvconf.local
* Rebuilt URL to: http://resolvconf.local/
* Could not resolve host: resolvconf.local

15:24:58.553320 IP ip-172-31-253-89.ap-southeast-2.compute.internal.40043 > dns.google.domain: 65053+ A? resolvconf.local. (34)
15:24:58.554814 IP dns.google.domain > ip-172-31-253-89.ap-southeast-2.compute.internal.40043: 65053 NXDomain 0/1/0 (109)

However, the second query resolves successfully. The second query is successful because it reaches the VPC DNS resolver associated to your private hosted zone.

[ec2-user@ip-172-31-253-89 etc]$ curl -vks http://resolvconf.local
* Rebuilt URL to: http://resolvconf.local/
*   Trying 1.1.1.1...
* TCP_NODELAY set
* Connected to resolvconf.local (1.1.1.1) port 80 (#0)

15:25:00.224761 IP ip-172-31-253-89.ap-southeast-2.compute.internal.51578 > 172.31.0.2.domain: 7806+ A? resolvconf.local. (34)
15:25:00.226527 IP 172.31.0.2.domain > ip-172-31-253-89.ap-southeast-2.compute.internal.51578: 7806 1/0/0 A 1.1.1.1 (50)

Confirm that private hosted zones don't have overlapping namespaces

If there are multiple zones with overlapping namespaces (such as example.com and test.example.com), then the Resolver routes traffic to the hosted zone based on the most specific match. If there's a matching zone but no record that matches the domain name and type in the request, then Resolver doesn't forward the request to another zone or a public DNS resolver. Instead, Resolver returns NXDOMAIN (non-existent domain) to the client.

Confirm that you have the correct record configured in the most specific private hosted zone for successful DNS resolution.

For example, let's say that you have two private hosted zones with records configured as follows:

Private hosted zone name Record name Value
local overlap.privatevpc.local 60.1.1.1
privatevpc.local overlap.privatevpc.local 50.1.1.1

The request gets the following answer from the most specific matched private hosted zone:

[ec2-user@IAD-BAS-INSTANCE ~]$ dig overlap.privatevpc.local +short
50.1.1.1

Confirm that there's no zone delegation configured in the private hosted zone

Private hosted zones don't support zone delegation. If delegation is configured, the client gets the "Servfail" response code from the VPC resolver.

Use the AWS CLI to confirm that zone delegation isn't configured in the private hosted zone. For example:

Private hosted zone:abc.com Delegation NS record for: kc.abc.com Resource record: test.kc.abc.com

[ec2-user@ip-172-31-0-8 ~]$ dig test.kc.abc.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 63414
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;test.kc.abc.com        IN      A
;; Query time: 15 msec
;; SERVER: 172.31.0.2#53(172.31.0.2)
;; WHEN: Fri Apr 16 15:57:37 2021
;; MSG SIZE  rcvd: 48

Confirm that the resource record's routing policy is supported in private hosted zones

Confirm that you configured a routing policy in your resource record that's supported by a private hosted zone. The supported routing policies are:

  • Simple routing
  • Multivalue answer routing
  • Failover routing
  • Weighted routing

Confirm that the Resolver rule and its inbound endpoint resolve to different VPCs

When the outbound endpoint in a Resolver rule points to an inbound endpoint that shares a VPC with the rule, the result is a loop. In this loop, the query is continually passed between the inbound and outbound endpoints.

The forwarding rule can still be associated with other VPCs that are shared with other accounts using AWS Resource Access Manager (AWS RAM). Private hosted zones associated with the hub, or a central VPC, resolve from queries to inbound endpoints. A forwarding resolver rule won't change this resolution. For example:

Hub VPC: VPC A - CIDR 172.31.0.0/16 Spoke VPC: VPC B - CIDR 172.32.0.0/16 Inbound IP address: 172.31.253.100 and 172.31.2.100 Target IP addresses in forwarding rule: 172.31.253.100 and 172.31.2.100 Rule associated with VPCs: VPC A and VPC B Client: 172-32-254-37

ubuntu@ip-172-32-254-37:~$ dig overlap.privatevpc.local
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 9007
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; QUESTION SECTION:
;overlap.privatevpc.local. IN A
;; Query time: 2941 msec
;; SERVER: 172.32.0.2#53(172.32.0.2)

In the previous output, the DNS request is continuously hopping between the outbound and inbound endpoints. The request checks the rule associated with VPC A and sends the query back to the outbound endpoint. After several attempts, the query times out and responds with a "Servfail" response code.

To fix this issue and break the loop, remove the hub VPC association (VPC A) with the rule. Then, you get a successful response from the private hosted zone:

ubuntu@ip-172-32-254-37:~$ dig overlap.privatevpc.local
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58606
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;overlap.privatevpc.local. IN A
;; ANSWER SECTION:
overlap.privatevpc.local. 0 IN A 50.1.1.1
;; Query time: 5 msec
;; SERVER: 172.32.0.2#53(172.32.0.2)

Confirm that the on-premises resolver sends a recursive request

If you use queries from on-premises to Route 53 Resolver, then with the Resolver inbound endpoint, you can forward DNS queries from resolvers on your network to a VPC resolver. This action lets you resolve domain names for AWS resources, such as records in a private hosted zone.

In some cases, you might find that the private hosted zone isn't resolving successfully from the on-premises resolver. This behavior occurs because the on-premises resolver is sending an iterative query instead of a recursive request. The inbound endpoint is supporting recursive queries for successful DNS resolutions.

You can verify the resolution type using a packet capture on the DNS resolver (on-premises). Then, review the DNS flags (recursion desired = 0). You can also test the resolution by sending an iterative request using +norecurse with the dig command, or set "norecurse" with nslookup as follows:

Inbound endpoint IP address: 172.31.253.150 On-premises Resolver IP address: 10.0.4.210

Failed iterative query to the inbound IP address:

[ec2-user@IAD-BAS-INSTANCE ~]$ dig @172.31.253.150 overlap.privatevpc.local +norecurse
; <<>> DiG 9.11.0rc1 <<>> @172.31.253.150 overlap.privatevpc.local +norecurse
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Successful recursive query:

[ec2-user@IAD-BAS-INSTANCE ~]$ dig @172.31.253.150 overlap.privatevpc.local
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19051
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;overlap.privatevpc.local.      IN      A
;; ANSWER SECTION:
overlap.privatevpc.local. 0     IN      A       50.1.1.1
;; Query time: 200 msec
;; SERVER: 172.31.253.150#53(172.31.253.150)

Confirm that the correct rule priorities are configured for the Amazon-provided DNS

When the client instance sends a query to the resolver (AWS-provided DNS server), the resolver verifies the rules associated with the instance to determine where to route the request.

In general, the most specific rule takes priority. If there's a "test.example.com" Resolver rule and a "longest.test.example.com" private hosted zone, then lookup for the "longest.test.example.com" domain matches the private hosted zone.

If the rules are at the same domain level, the priority is:

  1. Resolver rule
  2. Private hosted zone rule
  3. Internal rule

For example, if there's a "test.example.com" Resolver rule and a "test.example.com" private hosted zone, then the Resolver rule takes priority. The query is forwarded to the servers or target IP addresses configured in the rule.