Connecting Networks with Overlapping IP Ranges

A common situation we see in customer networks is when there are resources with overlapping IP address ranges that must communicate with each other. Frequently this occurs when companies are acquired and have used the same private (RFC1918) address ranges. However, it can also occur when a service provider with a unique IP range must provide access to two different customers that each have the same IP range.

Network overlaps can also occur unintentionally. Some AWS services, such as Amazon SageMaker and AWS Cloud9, automatically reserve particular IP ranges. Furthermore, some third-party products, such as Docker, do the same thing. Make sure that you check the documentation of services and applications when building your VPCs in order to avoid conflicts with predefined IP addresses.
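
As a quick illustration of checking for conflicts before you build a VPC, the following minimal Python sketch uses the standard-library ipaddress module to report which pairs of ranges overlap. The network names and CIDR values are hypothetical. 172.17.0.0/16 is included on the assumption that Docker is using its usual default bridge range, so confirm the actual ranges in the documentation for the products you run.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges):
    """Return sorted pairs of names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    )


# Hypothetical ranges, for illustration only.
ranges = {
    "company-a-vpc": "10.0.0.0/16",
    "company-b-vpc": "10.0.128.0/20",
    "docker-bridge": "172.17.0.0/16",  # assumed Docker default bridge range
    "planned-vpc": "172.16.0.0/12",
}

for a, b in find_overlaps(ranges):
    print(f"{a} overlaps {b}")
```

With these sample values the sketch reports two conflicts: the two company VPCs, and the planned VPC against the Docker bridge range.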

This post discusses some ways in which you can overcome this particular obstacle for IPv4-based networks. Customers that are using IPv6 aren’t expected to experience this problem given the size of the address space.

Note that the solution you choose will depend on how your applications communicate with each other. You may require full two-way connectivity between applications (that is, network sessions can be established by either side). In other situations, you may only need “outbound” connectivity – where sessions are established from one network to the other and not the other way around. These patterns will influence how you design your network to deal with the overlapping IP ranges.

Option 1: Renumber IP networks

This is always the first suggestion we make to customers. It won’t work in the service provider scenario above. However, if there’s an opportunity to renumber the networks, then it’s the best option. Although changing a network configuration isn’t easy, it avoids long term pains such as:

Increased network management costs: Most of the other solutions presented below require appliances or services which will have a charge attached to them. Renumbering a network isn’t free (after all, time and people cost money, too). But in the long-term it removes the ongoing cost of running the components required to connect overlapping networks together.
Increased complexity: Generally, connecting two or more overlapping networks is difficult! In the long-term it may prove to be increasingly complex as the application landscape grows and changes or as additional networks are added.
Complex troubleshooting: When things go wrong, figuring out what’s happening, where it’s happening, and what to do about it is complex enough without having to deal with overlapping IP addresses. Overlaps add confusion and mean that troubleshooting takes much longer than it otherwise would.
Compatibility issues: All of the following solutions utilize Network Address Translation (NAT) in some way. Some applications won’t work with NAT, and others will have limitations in how they can be used. You may not have applications today that don’t work with NAT but they could be deployed in your environment in the future. Renumbering completely avoids this problem.
Utilizing NAT also means additional management overhead: Because applications use overlapping IP addresses, firewall rules will be complex as you keep track of and update the original and NAT IP addresses that applications use.

In general, we strongly recommend renumbering overlapping networks where possible as it is cheaper and easier in the long-term.
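
If you do renumber, one early step is finding a replacement range that doesn't collide with anything already in use. The following is a minimal sketch of that search, assuming you keep an inventory of allocated ranges. The function name first_free_block, the pool, and the inventory values are all hypothetical.

```python
import ipaddress


def first_free_block(pool, in_use, new_prefix):
    """Return the first block of size new_prefix in pool that avoids every range in in_use."""
    used = [ipaddress.ip_network(cidr) for cidr in in_use]
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=new_prefix):
        if not any(candidate.overlaps(u) for u in used):
            return candidate
    return None  # the pool has no free block of the requested size


# Hypothetical inventory of ranges already used across both companies.
in_use = ["10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/20"]
print(first_free_block("10.0.0.0/8", in_use, 16))
```

With this sample inventory the first free /16 is 10.3.0.0/16, because 10.2.0.0/16 is ruled out by the smaller 10.2.0.0/20 range inside it.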

Option 2: AWS PrivateLink

In 2017 AWS launched PrivateLink. This is a Hyperplane-based service that makes it easy to publish an API or application endpoint between VPCs, including those that have overlapping IP address ranges. It’s also ideal for service providers who must deliver connectivity to multiple customers, and thus have no control over the remote IP address range. Furthermore, it provides the same benefit to customers with complex networks where IP addresses overlap. This is by far the simplest option presented here, as it requires no change to the underlying network address scheme.

In the following diagram, you can see an application that resides in the “Provider” VPC. It has a Network Load Balancer (NLB) attached to it, and by using PrivateLink we can share the NLB with multiple “Consumer” VPCs. Here, the consumer VPCs overlap with each other and with the provider – the worst-case scenario.

PrivateLink diagram showing overlapping IP address ranges

In each consumer VPC the PrivateLink endpoint appears as an Elastic Network Interface with a local IP address. In the provider VPC, connections from the consumer VPC appear to come from a local IP address within the provider VPC. The underlying Hyperplane service is performing a double-sided NAT operation in order to make PrivateLink work.
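
To make the idea of double-sided NAT concrete, here is a toy Python model of how one packet's addresses might look on each side. It is only a sketch for intuition and not a description of how Hyperplane is implemented. The Packet type, the double_sided_nat function, and every IP address below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


def double_sided_nat(packet, endpoint_ip, provider_local_ip, nlb_ip):
    """Rewrite both source and destination as a packet crosses from consumer to provider.

    Toy model only: it ignores source ports, connection tracking, and
    everything else a real NAT implementation has to handle.
    """
    if packet.dst != endpoint_ip:
        raise ValueError("packet is not addressed to the endpoint")
    return Packet(src=provider_local_ip, dst=nlb_ip, dst_port=packet.dst_port)


# Hypothetical addresses: both VPCs use 10.0.0.0/16.
consumer_view = Packet(src="10.0.1.25", dst="10.0.2.10", dst_port=443)
provider_view = double_sided_nat(
    consumer_view,
    endpoint_ip="10.0.2.10",        # endpoint network interface in the consumer VPC
    provider_local_ip="10.0.5.77",  # source address as seen inside the provider VPC
    nlb_ip="10.0.9.4",              # NLB address in the provider VPC
)
print(provider_view)
```

Because neither side ever sees the other's real addresses, it doesn't matter that both VPCs use the same range.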

There are added security benefits:

When establishing the PrivateLink connection the provider must send the owner of the consumer VPC a request. Then, the owner must approve it – exactly the same way that VPC peering works. There’s no way for a provider to create a consumer-facing PrivateLink without approval.
Only configured TCP ports are allowed between the consumer and provider. This makes sure that the consumer only has access to specific resources in the provider VPC and nothing else.
There’s no way for the application in the provider VPC to establish a connection to the consumer VPC.

Finally, there is a scalability benefit – an application can be published by a provider to hundreds of consumer VPCs.

Redundancy comes built into PrivateLink in two places: in the NLB, which delivers traffic to the back-end servers, and in the consumer VPC configuration, where you choose which subnets to place endpoints in. The following diagram shows a multi-subnet environment set up across multiple availability zones.

PrivateLink showing configuration for multiple availability zones

One common question from customers is how to achieve this connectivity with on-premises networks. In the following example, we have a provider VPC that’s connected to multiple independent consumers, who are in turn connected to AWS via VPN. Note that the consumers all have overlapping IP addresses, even with the provider VPC. The only challenge is to find an IP range that will be allocated to the VPC where the VPN service is attached that doesn’t overlap with the on-premises range. In this example, the on-premises clients will connect to an IP address allocated to the PrivateLink endpoint in the VPN VPC.

This solution also works with AWS Direct Connect, as seen for Customer C in the diagram. Customer C also has a different IP range in the VPN VPC, perhaps because 172.16.0.0/16 was already in use in their network, so the intermediate network must be different for them. This isn’t an issue, as the IP address range in that VPC only needs to avoid conflicting with anything in the networks that Customer C uses. Therefore, there’s a great deal of flexibility in what can be chosen.

PrivateLink with on-premises connectivity via VPN and Direct Connect

This option is straightforward to set up, needs no additional maintenance, and is both highly redundant and highly scalable. Furthermore, it provides separation between the customer networks. If you’re creating applications in a service provider environment, then consider architecting your solution so that PrivateLink can deliver this level of network flexibility for you.

Note that there’s a cost for PrivateLink as per the pricing page. Some applications may not work with this solution as applications must present as a single TCP port. If you have an application that uses UDP or has multiple TCP ports and the clients must maintain back-end server affinity then PrivateLink isn’t appropriate for you.

Option 3: Use multiple IP address ranges in VPCs

You may have an application that’s broken into different tiers – a front-end that responds to users or other application requests; and then one or more “back-end” tiers comprising middleware, databases, caches, and so on. In this environment, you can choose to have a set of front-end subnets that have non-overlapping IP addresses while the back-end subnets do overlap with other applications.

The following diagram shows three application VPCs connected to AWS Transit Gateway. Note that the VPCs have overlapping IP address ranges but different front-end subnets are advertised to Transit Gateway so that they can each be reached by end users. This requires that automatic route propagation to Transit Gateway be disabled as not all of the subnets in each VPC should be advertised.

In this environment you would create each of the VPCs with an overlapping IP address range (10.0.0.0/20 in the diagram) and then add a second, non-overlapping IP address range to each VPC. In the front-end subnets you can add routes to the other front-end subnets (or just use a default route) that have Transit Gateway as the target.
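
The routing plan described here can be sketched as data. The following minimal Python example assumes three VPCs that share the 10.0.0.0/20 primary range and each have a unique secondary range for their front-end subnets. It builds the list of static routes that would be needed, since automatic propagation is disabled, and refuses front-end ranges that collide. The VPC names, the front-end CIDR values, and the function name are hypothetical.

```python
import ipaddress
from itertools import combinations

# Hypothetical VPCs: identical primary range, unique secondary (front-end) range.
vpcs = {
    "app-a": {"primary": "10.0.0.0/20", "front_end": "10.100.0.0/26"},
    "app-b": {"primary": "10.0.0.0/20", "front_end": "10.100.0.64/26"},
    "app-c": {"primary": "10.0.0.0/20", "front_end": "10.100.0.128/26"},
}


def transit_gateway_routes(vpcs):
    """Return {front-end CIDR: VPC name}, refusing front-end ranges that overlap."""
    front_ends = {name: ipaddress.ip_network(v["front_end"]) for name, v in vpcs.items()}
    for a, b in combinations(front_ends, 2):
        if front_ends[a].overlaps(front_ends[b]):
            raise ValueError(f"front-end ranges of {a} and {b} overlap")
    # Only the front-end ranges are advertised; the shared primary range never is.
    return {str(net): name for name, net in front_ends.items()}


for cidr, target in transit_gateway_routes(vpcs).items():
    print(f"{cidr} -> attachment for {target}")
```

Keeping the plan in a structure like this also makes it easier to feed into whatever automation tooling you use to deploy the routes.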

This doesn’t solve the challenge of how to administer servers that reside in the back-end subnets. One way of doing this is to place a bastion host in the front-end subnet of each VPC. This will let administrators reach the back-end subnets by using SSH or RDP to that intermediary host. You might also use AWS Systems Manager to run commands remotely on hosts or to create SSH tunnels to back-end hosts.

You will still want the back-end servers to download code from repositories, updates from appropriate servers, send application logs, and provide performance metrics. For this, you might use a combination of private endpoints for AWS services (such as Amazon CloudWatch and Amazon Simple Storage Service (Amazon S3)). If your servers need outbound access to non-AWS endpoints then a NAT or proxy service hosted in the front-end subnets will be required.

This option means that if you had to renumber just some of the overlapping networks, then you can do less work (by only changing the front-end subnets) while mitigating most of the risk (by not having to run complex NAT solutions to have applications and users communicate). However, there are additional costs – bastion hosts, NAT or proxy instances and private endpoints for AWS services. We also strongly encourage that this infrastructure be deployed and managed using automation in order to keep administration costs as low as possible.

Although this diagram shows the web server (or any other front-end component of the application) in the front-end subnet, you could easily deploy load balancers to that subnet and keep the Amazon Elastic Compute Cloud (Amazon EC2) components in another subnet using a non-reachable IP address range.

This option lets you deploy back-end workload subnets that have thousands of IP addresses without worrying about whether those overlap with other applications. Furthermore, you can use only the minimum number of IP addresses for front-end subnets while still making sure that the application is reachable from networks external to the VPC.
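
To put rough numbers on this trade-off, the short sketch below counts usable addresses for a large back-end subnet and a small front-end subnet. It relies on the fact that AWS reserves five addresses in every subnet (the first four and the last one). The specific CIDR values are hypothetical examples.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def usable_addresses(cidr):
    """Return how many addresses in a subnet are left for workloads."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(usable_addresses("10.0.0.0/20"))    # overlapping back-end subnet
print(usable_addresses("10.100.0.0/28"))  # small, unique front-end subnet
```

In this example the back-end subnet offers 4,091 usable addresses while the front-end subnet consumes only 16 addresses of unique, routable space.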

Finally, consider using IPv6 instead of IPv4 for the back-end subnets. When using IPv4 in this scenario the back-end subnets aren’t reachable (except as described above). Using IPv6 removes the necessity for the subnets to overlap at all and as you migrate to IPv6 the resources in those subnets are reachable without any other workarounds.

Option 4: Hide subnets using Private NAT Gateway

In 2021 we launched Private NAT Gateway. In the same way that NAT Gateway lets you “hide” an entire VPC network range from the Internet (making it appear to come from a single Elastic IP address), Private NAT Gateway lets you do that when connecting from a VPC to other private networks. Instead of using an Elastic IP address and an Internet Gateway, Private NAT Gateway uses a private IP address allocated from within your VPC as the address that the VPC is “hidden” behind.

This is useful in an environment where you want to connect from a VPC to your on-premises networks or other VPCs, but don’t want those networks to connect directly to resources in the VPC. This is very similar to Option 3 presented above except that you don’t have to run a NAT or proxy instance to provide outbound connectivity from the VPC.

The following diagram illustrates how Private NAT Gateways work:

Private NAT Gateway usage diagram

Note that the VPC IP address range is 10.0.0.0/16 but two extra subnets have been added (10.31.10.0/24 and 10.31.11.0/24) which are outside of the original VPC IP address range. A Private NAT Gateway has been added in each availability zone, one in each of the subnets with the secondary IP address ranges (as with Internet-facing NAT Gateways, only one is required but two are recommended for redundancy). The NAT Gateways will use an IP address from that subnet to translate IP addresses of the workloads from the back-end subnets.

In Transit Gateway, a route to the front-end subnets has been added so that return traffic can be sent back to the Private NAT Gateways. Within the VPCs, traffic from the back-end subnets will be routed to the Private NAT Gateways in much the same way that Internet-facing NAT Gateway route tables operate.
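
The routing behavior described above can be illustrated with a small longest-prefix-match lookup. This is a simplified sketch, not a representation of how AWS evaluates routes internally. The route tables, target names, and the 192.168.0.0/16 on-premises range are hypothetical, while the VPC ranges follow the addressing given earlier.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific route matching destination, or None."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical, simplified route tables.
backend_subnet_routes = {
    "10.0.0.0/16": "local",
    "10.31.10.0/24": "local",
    "10.31.11.0/24": "local",
    "0.0.0.0/0": "private-nat-gateway",
}
transit_gateway_routes = {
    "10.31.10.0/24": "vpc-attachment",
    "10.31.11.0/24": "vpc-attachment",
    "192.168.0.0/16": "on-premises-attachment",
}

# Outbound: a back-end workload reaches an on-premises host via the NAT gateway.
print(lookup(backend_subnet_routes, "192.168.1.10"))
# Return: the reply is addressed to the NAT gateway's IP in the routable subnet.
print(lookup(transit_gateway_routes, "10.31.10.5"))
# The hidden 10.0.0.0/16 range has no Transit Gateway route, so nothing outside can reach it.
print(lookup(transit_gateway_routes, "10.0.4.20"))
```

The last lookup returns None, which is the point of the design: only the small routable subnets are known to the rest of the network.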

In this case, managing instances in the back-end subnets would need to be done using Systems Manager or bastion hosts in the front-end subnets. If application deployment were automated then there would be no need for human management of those hosts, which is a far more desirable outcome.

As with the previous option this is a great way to conserve IP addresses while making sure that relevant and critical parts of the workload are still routable and thus available. You can find a detailed walkthrough on how to create this type of environment in a recent post.

Note that there’s a charge for using Private NAT Gateway as shown on the pricing page.

Conclusion

In this post we’ve shown several ways of dealing with overlapping IP networks. The following table shows a comparison between the options:

Option	Service Cost	Redundancy	Full Network Reachability	Solution Complexity	Maintenance Complexity	Ideal For
1: Renumber	Low	N/A	Yes	Low	Low	Everyone – recommended
2: PrivateLink	Medium	Yes	No	Low	Low	Service providers
3: Use multiple IP ranges	Medium	Yes	No	Medium	Medium	Container/large workloads
4: Private NAT Gateway	Medium	Possible	No	Medium	Medium	Container/large workloads

Remember that renumbering the networks that conflict is by far the best option (in terms of cost, complexity and visibility) in the long-term. For service or application providers that have no control over the networks to which they connect, PrivateLink is designed specifically to deal with that problem.

Brett Looney

Brett Looney is a Principal Solutions Architect based in Perth, Australia. He helps customers in Asia Pacific Oceania and globally adopt best practices in cloud networking.
