AWS Partner Network (APN) Blog
Reviewing DNS Mechanisms for Routing Traffic and Enabling Failover for AWS PrivateLink Deployments
By Anuj Dewangan, Solutions Architect at AWS
Customers looking to consume AWS PrivateLink-enabled services from service providers need a mechanism to route traffic from their Virtual Private Clouds (VPCs) and on-premises networks to PrivateLink VPC endpoints. The service providers, in turn, want to make it easy to consume their PrivateLink-enabled services by managing such routing mechanisms.
Many of our service provider partners operate public endpoints for their services and have additionally deployed PrivateLink-enabled services to offer high-performance, private connections to their customers.
These service providers require their services to continue to be highly available and resilient to any customer network interruptions, especially interruptions between customers’ on-premises networks and the PrivateLink VPC endpoints. In cases of such failures, they require a failover to the service’s public endpoints.
In this post, we describe four DNS mechanisms to route traffic from customer networks to the PrivateLink VPC endpoints. Additionally, for service providers who operate public endpoints, this post describes techniques for failover to the service’s public endpoints if there are network interruptions connecting to the private endpoints.
These failover mechanisms require enhancements to the client application installed on end user devices, so as a service provider, you need to own the client application functionality.
Web Applications and Public Endpoints
To establish a foundation for the mechanisms described later in this post, let’s start by reviewing an example of a client-server web application and its use of Internet DNS servers to discover and connect to public endpoints.
In Figure 1, Example Corp is a service provider that operates a public endpoint (shown as 203.0.113.12) for its services, owns the front-end client application installed on end user devices, and also manages the Internet DNS infrastructure to support the application.
The client application resolves the service’s public DNS hostname (example-api.com) to discover the IP address of the public endpoint. Example Corp’s Internet DNS servers resolve the DNS request for example-api.com to the public IP address 203.0.113.12 through an A record.
Figure 1 – Example Corp service’s public endpoint discovery.
This scenario is the normal flow of traffic between Example Corp’s client application and the public endpoints—in the absence of any private endpoints. Next, let’s look at our first scenario where Example Corp has deployed a PrivateLink-based VPC endpoint service to connect privately to their customers.
Routing Traffic with Split-Horizon DNS
As shown in Figure 2, in order to increase their service’s performance, Example Corp has additionally deployed a PrivateLink-based VPC endpoint service (vpce-svc-1234). One of Example Corp’s enterprise customers—AnyCompany—connects to Example Corp’s service privately using a PrivateLink VPC Endpoint.
AnyCompany has deployed two VPC endpoint Elastic Network Interfaces (ENIs) with IP addresses 10.24.34.10 and 10.24.35.11 in their VPC in two Availability Zones (AZs) to enable redundancy and higher throughput.
The Example Corp client applications are deployed in Amazon Elastic Compute Cloud (Amazon EC2) instances in AnyCompany’s VPC, and also in on-premises end user devices. To leverage the private connections, the client applications need to discover and send traffic to the PrivateLink VPC endpoint, instead of sending the traffic to the Internet endpoint at 203.0.113.12.
Figure 2 – Split-horizon DNS for AWS PrivateLink endpoints.
One of the conventional mechanisms to accomplish this is using split-horizon DNS, where DNS requests for the service’s public hostname from the customer VPC and on-premises networks resolve to the private IP addresses of the VPC endpoint ENIs, but requests from outside these networks still resolve to the public endpoints.
In our Figure 2 example, private DNS records are added at AnyCompany’s VPC DNS server and their on-premises DNS servers to resolve domain requests for the Example Corp service’s public hostname (example-api.com). These private DNS records are CNAME records for the service’s public hostname, targeting either the endpoint-specific zonal or regional DNS hostname of the VPC endpoint.
In Figure 2, a CNAME record for example-api.com targeting the regional DNS hostname anycompany-vpce-1234.amazonaws.com is added to the private DNS servers. The regional DNS hostname of anycompany-vpce-1234.amazonaws.com is an AWS-managed public DNS record specific to AnyCompany’s VPC endpoint, and it resolves to the IP addresses of the VPC endpoint ENIs, 10.24.34.10 and 10.24.35.11.
When Example Corp’s client applications send a DNS request to the service’s public hostname (example-api.com), the private DNS servers resolve the request to the IP addresses of the VPC endpoint ENIs, 10.24.34.10 and 10.24.35.11. Example Corp client applications thus discover and can now send traffic to the PrivateLink VPC endpoint.
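The split-horizon behavior described above can be modeled in a few lines of Python. This is a minimal sketch rather than a DNS server: the two lookup tables, the `resolve` helper, and the `inside_customer_network` flag are hypothetical names introduced for illustration, and the record values are taken from the Figure 2 example.

```python
# Toy model of split-horizon DNS using the records from the Figure 2 example.
# Each table maps a hostname to a (record_type, values) pair.

# Records visible to everyone: Example Corp's Internet DNS record and the
# AWS-managed public record for AnyCompany's VPC endpoint.
PUBLIC_VIEW = {
    "example-api.com": ("A", ["203.0.113.12"]),
    "anycompany-vpce-1234.amazonaws.com": ("A", ["10.24.34.10", "10.24.35.11"]),
}

# Private records AnyCompany adds on its VPC and on-premises DNS servers.
PRIVATE_OVERRIDES = {
    "example-api.com": ("CNAME", ["anycompany-vpce-1234.amazonaws.com"]),
}


def resolve(hostname, inside_customer_network, max_depth=5):
    """Return the IP addresses a client in the given network would receive."""
    for _ in range(max_depth):
        record = None
        if inside_customer_network:
            record = PRIVATE_OVERRIDES.get(hostname)
        if record is None:
            record = PUBLIC_VIEW.get(hostname)
        if record is None:
            return []  # the name does not exist in this toy model
        record_type, values = record
        if record_type == "A":
            return list(values)
        hostname = values[0]  # follow the CNAME target
    return []  # give up on an overly long CNAME chain


print(resolve("example-api.com", inside_customer_network=True))
print(resolve("example-api.com", inside_customer_network=False))
```

The first call follows the private CNAME to the VPC endpoint ENI addresses, while the second call, made from outside AnyCompany’s networks, still receives the public endpoint address.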
Even though split-horizon DNS provides the simplest solution for traffic routing to the private endpoints and does not require any changes to the client application, it introduces challenges in fault recovery and administrative overhead for the customer.
Service Availability and DNS Administration Challenges
Let’s first consider a scenario of an infrastructure failure. In Figure 2, AnyCompany uses AWS Direct Connect to establish a connection between their Amazon VPC and on-premises network. If there’s a physical failure (connection or equipment) in this network, the on-premises network loses its connection to the VPC. Consequently, the client applications in the on-premises networks will not be able to reach the VPC endpoint, causing service unavailability.
A service provider like Example Corp, which continues to operate a public endpoint, can build resilience against infrastructure failures like this by providing a mechanism to fail over to the public endpoints when the client application cannot connect to the private endpoints.
Similarly, client applications running on instances in public subnets in AnyCompany’s VPC may also need to fail over to public endpoints if connectivity to the VPC endpoint fails. Note, however, that PrivateLink VPC endpoints are themselves highly fault tolerant. Each PrivateLink VPC endpoint ENI is backed by Hyperplane nodes, which are deployed redundantly within AZs. This makes each VPC endpoint ENI in an AZ highly fault tolerant.
Additionally, each PrivateLink VPC endpoint can have multiple ENIs deployed across different AZs, which makes the VPC endpoint highly available within a region. This lessens the need for a failover mechanism to public endpoints for access to the VPC endpoints from within the customer VPC.
Another consideration with split-horizon DNS is that the customer needs to manage private DNS records at DNS servers on-premises and in the VPC DNS servers, which creates administrative overhead and adds risk of configuration errors.
To reduce customer overhead, when a PrivateLink VPC endpoint is deployed for either AWS services or software-as-a-service (SaaS) services from AWS Marketplace, AWS automatically creates private DNS records for the service in the VPC DNS server of the customer.
This mechanism works well to route traffic to the VPC endpoint from within the customer VPC, but split-horizon private DNS records are still required at the on-premises DNS servers to ensure routing from on-premises client applications.
Just as AWS manages private DNS records in the VPC DNS server for AWS services and AWS Marketplace services, many of our service provider partners want to manage the DNS records for their customers. Doing so enables traffic routing from both on-premises networks and the customer VPC, and makes it easy to deploy and consume their PrivateLink-enabled services.
The next three DNS mechanisms provide techniques that alleviate, to varying degrees, the service availability and administration challenges identified here.
Private DNS Hostname for PrivateLink Endpoints
The example in Figure 3 describes how the private DNS hostname for PrivateLink endpoints mechanism enables traffic routing to the private endpoints and adds a failover mechanism to increase the availability of Example Corp’s service.
Figure 3 – Private DNS hostname for AWS PrivateLink Endpoints.
With Private DNS hostname for PrivateLink endpoints, the client application resolves one DNS hostname to discover IP address(es) of the public endpoint, and a second DNS hostname to discover the IP address(es) of the VPC endpoint ENIs. With knowledge of both public and private IP addresses, the client application can route traffic preferably to the private IP addresses and have a failover mechanism to the public IP addresses.
In our Figure 3 example, as a first enhancement, Example Corp’s client application resolves two DNS hostnames: the public DNS hostname (example-api.com) to discover the public IP address of the service, and a second DNS hostname (example-api-pl.com) to discover the presence of PrivateLink and the associated private IP addresses in the environment.
As a second enhancement to Example Corp’s client application, unreachability to the VPC endpoint private IP addresses triggers a failover to the public IP address.
Let’s walk through our example to understand how this works.
As shown in Figure 3, AnyCompany creates CNAME records in their private DNS servers for DNS requests to example-api-pl.com only. This CNAME record targets the endpoint regional DNS hostname (anycompany-vpce-1234.amazonaws.com).
AnyCompany does not create private DNS records for the public hostname (example-api.com), as was done in the split-horizon DNS mechanism. So, the DNS request for example-api.com is resolved by Example Corp’s Internet DNS infrastructure to the public IP address of 203.0.113.12.
Hence, Example Corp’s client application resolves example-api-pl.com to the private IP addresses of 10.24.34.10 and 10.24.35.11, and resolves example-api.com to the public IP address of 203.0.113.12.
If there’s a connectivity failure between AnyCompany’s on-premises network and their VPC, the on-premises client application cannot reach the private IP addresses of the VPC endpoint. This triggers a failover to the public endpoint at 203.0.113.12. Similarly, instances in AnyCompany’s VPC resolve both hostnames (using the VPC DNS server). If connectivity to the VPC endpoint fails, client applications on instances in public subnets also fail over to the public endpoints.
Additionally, if there is no PrivateLink in the environment, Example Corp’s client application cannot resolve the PrivateLink-specific DNS hostname (example-api-pl.com), and thus only connects to the public service endpoint IP addresses. This ensures the enhancements to the client application apply to deployments with and without PrivateLink.
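The two client enhancements in this walkthrough can be sketched as follows. This is an illustrative sketch under stated assumptions, not Example Corp’s actual client: the function names, the port, and the timeout are hypothetical, and the resolver and connector are passed in as parameters so the logic can be exercised without a real network.

```python
import socket

PUBLIC_HOSTNAME = "example-api.com"
PRIVATELINK_HOSTNAME = "example-api-pl.com"


def resolve_ipv4(hostname):
    """Resolve a hostname to IPv4 addresses; return [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Preserve order while dropping duplicate addresses.
    return list(dict.fromkeys(info[4][0] for info in infos))


def tcp_connect(address, port, timeout=2.0):
    """Open a TCP connection, or return None if the address is unreachable."""
    try:
        return socket.create_connection((address, port), timeout=timeout)
    except OSError:
        return None


def connect_with_failover(port=443, resolve=resolve_ipv4, connect=tcp_connect):
    """Try the PrivateLink addresses first, then fall back to the public ones.

    Returns (address, connection), or (None, None) if nothing is reachable.
    """
    private_ips = resolve(PRIVATELINK_HOSTNAME)  # empty when PrivateLink is absent
    public_ips = resolve(PUBLIC_HOSTNAME)
    for address in private_ips + public_ips:
        connection = connect(address, port)
        if connection is not None:
            return address, connection
    return None, None
```

Because an unresolvable PrivateLink hostname simply yields an empty list, the same code path serves environments with and without PrivateLink. A production client would likely also cache results and retry the private addresses periodically after a failover.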
Private DNS hostname for PrivateLink endpoints provides the routing to the PrivateLink VPC endpoints, as well as a fallback mechanism from private endpoints to public endpoints. However, the burden of DNS record management for the PrivateLink-specific DNS hostname (example-api-pl.com) still resides with the customer. The next DNS mechanism allows service providers to manage the DNS records to support the customer’s PrivateLink deployment.
CIDR/ASN-Based DNS Routing
Figure 4 gives an example of how CIDR/ASN-based public DNS records are used for traffic routing and failover.
Figure 4 – CIDR-based DNS routing for AWS PrivateLink endpoints.
With CIDR/ASN-based DNS routing for PrivateLink endpoints, the client application resolves only the service’s public DNS hostname to discover both the IP addresses of the public endpoint and the VPC endpoint ENIs. The DNS request is resolved through CIDR/ASN-based public DNS records in Internet DNS servers.
Let’s walk through our example to understand the details.
In Figure 4, to support the deployment of PrivateLink VPC endpoints by AnyCompany, Example Corp creates a public DNS record in its Internet DNS server. This DNS record provides DNS resolution based on the source IP CIDR of the DNS requester (CIDR-based DNS records). In our example, AnyCompany’s on-premises network has an Internet routable IP CIDR of 192.0.2.0/24.
Let’s follow the resolution of the DNS request for example-api.com starting from the on-premises network.
When Example Corp’s client application sends a DNS request for example-api.com, because this request originated from 192.0.2.0/24, it matches the IP CIDR rule in the DNS record. The DNS record resolves to multiple IP addresses—the public IP address of the service at 203.0.113.12 and the customer-specific PrivateLink endpoint IP addresses of 10.24.34.10 and 10.24.35.11.
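The CIDR match described above can be illustrated with a small Python model. This is a sketch under assumptions, not the API of any particular DNS service: `CIDR_RULES` and `answer_for` are hypothetical names, `source_ip` stands for whatever requester address the DNS server sees, and ASN-based matching is not modeled.

```python
import ipaddress

PUBLIC_ENDPOINT = ["203.0.113.12"]

# Hypothetical per-customer rules: requester CIDR -> PrivateLink ENI addresses.
CIDR_RULES = {
    ipaddress.ip_network("192.0.2.0/24"): ["10.24.34.10", "10.24.35.11"],
}


def answer_for(source_ip):
    """Return the addresses served for example-api.com to a given requester."""
    requester = ipaddress.ip_address(source_ip)
    for network, private_ips in CIDR_RULES.items():
        if requester in network:
            # Matching requesters get the public address plus the private ones.
            return PUBLIC_ENDPOINT + private_ips
    return list(PUBLIC_ENDPOINT)


print(answer_for("192.0.2.25"))    # request from AnyCompany's on-premises CIDR
print(answer_for("198.51.100.7"))  # request from any other network
```

Requesters outside 192.0.2.0/24 never see AnyCompany’s private addresses and receive only the public endpoint.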
As a first enhancement, Example Corp’s client application needs to identify the private IP addresses in the DNS response and attempt to connect to them first. As a second enhancement, the client application falls back to the public IP address if the private addresses are unreachable, providing fault tolerance for Example Corp’s service.
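One way a client might separate private from public addresses in a mixed DNS response is sketched below. It assumes the VPC endpoint ENIs use RFC 1918 addresses, as they do in this example; the helper names are hypothetical, and the reachability check is passed in as a parameter.

```python
import ipaddress

# RFC 1918 ranges. ipaddress's is_private attribute is deliberately not used:
# it also flags documentation ranges such as 203.0.113.0/24, which this post
# uses for the public endpoint.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address falls inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def order_private_first(addresses):
    """Put RFC 1918 addresses first, keeping the original relative order."""
    # sorted() is stable, and False sorts before True.
    return sorted(addresses, key=lambda address: not is_rfc1918(address))


def first_reachable(addresses, is_reachable):
    """Return the first reachable address, preferring private ones."""
    for address in order_private_first(addresses):
        if is_reachable(address):
            return address
    return None


answer = ["203.0.113.12", "10.24.34.10", "10.24.35.11"]
print(order_private_first(answer))
```

If neither private address is reachable, `first_reachable` moves on to the public address, which is the fallback behavior described above.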
Note that AnyCompany does not need to manage on-premises private DNS servers for this mechanism. However, for DNS requests for example-api.com from instances in AnyCompany’s VPC, private DNS records similar to the ones created in the split-horizon DNS mechanism are needed at the VPC DNS server. These requests cannot be resolved to the private IP addresses of the PrivateLink endpoint ENIs through public CIDR/ASN DNS records because DNS requests from instances in AnyCompany’s VPC will originate from AWS-owned public IP addresses, which are shared by our services and customers and are not dedicated to AnyCompany.
However, because DNS records for AWS services or AWS Marketplace services are automatically created by AWS in the customer’s VPC DNS servers, the CIDR/ASN-based DNS routing mechanism can completely eliminate DNS record management for the customer when using AWS services or AWS Marketplace services.
Let’s look at our final DNS mechanism, which provides traffic routing to the PrivateLink VPC endpoints as well as a failover mechanism to public endpoints from both on-premises networks and the customer VPC. It also allows either the service provider or the customer to manage all DNS records as public records, thus eliminating the need to manage private DNS records.
Customer-Specific DNS Records for AWS PrivateLink Endpoints
Figure 5 provides an example of how the customer-specific DNS records for PrivateLink endpoints mechanism works.
With customer-specific DNS records for PrivateLink endpoints, the client application resolves one DNS hostname to discover IP address(es) of the public endpoint and a second customer-specific DNS hostname to discover the IP address(es) of the VPC endpoint ENIs.
Both the DNS requests can be resolved through public records in Internet DNS servers, which can be managed by the service provider. Alternately, the customer-specific DNS hostname record can be managed as a public DNS record (or even as a private DNS record in private DNS servers) by the customer.
The example in Figure 5 describes a scenario where the service provider manages the record in their Internet DNS servers.
Figure 5 – Customer-specific DNS records for AWS PrivateLink endpoints.
Let’s walk through our example to understand the details.
As a first step, Example Corp’s client application is enhanced to resolve two DNS hostnames: the public DNS hostname (example-api.com), and a second, customer-specific DNS hostname (example-api-anycompany.com) used to discover the presence of PrivateLink in the environment.
Example Corp’s client application derives the customer-specific DNS hostname using the login domain of users at AnyCompany.
Let’s assume an enterprise user at AnyCompany has a corporate account with Example Corp’s service, and uses a username of the form user@anycompany.com. Based on the email domain of anycompany.com, the client application derives and resolves a globally unique customer-specific hostname (example-api-anycompany.com).
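A derivation like this might be sketched as follows. The mapping rule is an assumption made for illustration (the post gives only the anycompany.com to example-api-anycompany.com example), and `customer_hostname`, its parameters, and the sample username are hypothetical.

```python
def customer_hostname(username, service_prefix="example-api", suffix="com"):
    """Derive a customer-specific hostname from the login domain of a username."""
    local_part, separator, domain = username.strip().lower().rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError(f"not an email-style username: {username!r}")
    # Assumption: the customer label is the first label of the login domain,
    # so anycompany.com becomes "anycompany".
    customer_label = domain.split(".")[0]
    if not customer_label:
        raise ValueError(f"cannot derive a customer label from: {domain!r}")
    return f"{service_prefix}-{customer_label}.{suffix}"


print(customer_hostname("user@anycompany.com"))
```

Note that taking only the first label would map two different login domains that share it to the same hostname, so a real scheme would need a rule that keeps the derived hostname globally unique.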
The DNS record for the DNS hostname example-api-anycompany.com is managed by Example Corp in its Internet DNS servers. This record resolves to the customer-specific VPC endpoint ENI IP addresses of 10.24.34.10 and 10.24.35.11. Alternately, this record can also be managed by AnyCompany.
Once the client application resolves both these DNS hostnames, similar to what we saw in previous mechanisms, the client first connects to the private endpoints and then fails over to the public endpoint if there is reachability failure.
In this scenario, if Example Corp manages the customer-specific DNS hostname, it eliminates the need for AnyCompany to manage any DNS records, either at their VPC or on-premises, to support the PrivateLink connection.
In this example, the client application discovers the customer domain from the login domain in the username. The client application can also discover the customer domain through several other mechanisms, such as configuration, a lookup of the local domain at the host, or reverse DNS lookups.
The core idea behind this mechanism is the derivation of a globally unique customer-specific DNS hostname for resolving PrivateLink endpoint addresses.
The table in Figure 6 summarizes the four mechanisms described in this post, and can be used as a guide to choosing a suitable solution.
Figure 6 – Comparison table for the DNS mechanisms.
A Note on Automation for DNS Record Creation
In order to scale operations and minimize errors, both customers and service providers can leverage automation hooks available as part of AWS PrivateLink to deploy DNS records in support of PrivateLink VPC endpoint deployments.
For example, as a service provider, you can configure your VPC endpoint service to send a notification through Amazon Simple Notification Service (Amazon SNS) whenever a customer creates a connection request to the service.
You can then trigger an AWS Lambda function to verify if this is a valid request, accept/reject the request, and also perform infrastructure configuration like adding customer-specific CIDR-based DNS records to your Internet DNS servers in support of the PrivateLink deployment.
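A Lambda handler for this flow might look roughly like the sketch below. Several parts are assumptions and should be checked against the actual notification payload: the message field names (`serviceId`, `vpcEndpointId`, `vpcEndpointOwner`), the `ALLOWED_OWNERS` allow list, and the `add_dns_record` callback are all hypothetical. The `accept_vpc_endpoint_connections` and `reject_vpc_endpoint_connections` calls are boto3 EC2 client methods; the client is passed in as a parameter so the logic can be tested with a stand-in.

```python
import json

# Hypothetical allow list of AWS account IDs permitted to connect.
ALLOWED_OWNERS = {"111122223333"}


def handle_notification(event, ec2_client, add_dns_record):
    """Accept or reject endpoint connection requests delivered through SNS.

    ec2_client is expected to behave like a boto3 EC2 client. add_dns_record
    is a hypothetical callback that updates the provider's DNS records.
    """
    decisions = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        # Assumed field names; verify them against a real notification.
        service_id = message["serviceId"]
        endpoint_id = message["vpcEndpointId"]
        owner = message["vpcEndpointOwner"]
        if owner in ALLOWED_OWNERS:
            ec2_client.accept_vpc_endpoint_connections(
                ServiceId=service_id, VpcEndpointIds=[endpoint_id]
            )
            add_dns_record(owner, endpoint_id)
            decisions.append((endpoint_id, "accepted"))
        else:
            ec2_client.reject_vpc_endpoint_connections(
                ServiceId=service_id, VpcEndpointIds=[endpoint_id]
            )
            decisions.append((endpoint_id, "rejected"))
    return decisions
```

In a deployed function, the Lambda entry point would create the client with `boto3.client("ec2")` and pass it to `handle_notification` along with a callback that writes the customer-specific or CIDR-based DNS record.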
Each of the solutions described in this post, from split-horizon DNS to customer-specific DNS records for AWS PrivateLink endpoints, enables varying degrees of traffic rerouting and high availability, with different ownership and management of the DNS infrastructure. Any implementation must consider a trade-off between service availability, changes to the client application, and management of DNS infrastructure.
We hope this has equipped you with ideas for experimentation and planning deployments with this powerful service.