VPC sharing: key considerations and best practices
Introduction
It has been over two years since we launched VPC sharing at re:Invent 2018. I previously wrote about this capability in the blog post “VPC sharing: A new approach to multiple accounts and VPC management.” That post covers where to start, the benefits of VPC sharing, and why we decided to build this feature. At re:Invent 2019, I co-presented a session on VPC sharing with Atlassian that covered a number of production use cases and lessons learned from adopting VPC sharing at scale. In this blog post, I share key considerations, architecture lessons, and best practices drawn from AWS customers using VPC sharing. Before going into these lessons, I’d like to share a few stories.
VPC sharing in action
“VPC sharing, along with AWS Transit Gateway, has been pivotal to the micro AWS account approach at FactSet. Without these, we would need to continuously manage IP space provisioned for each AWS account, configure interconnectivity between 100s of VPCs, and determine routing to/from our Data Center for each VPC. Furthermore, this would need to be repeated for every AWS Region that FactSet wants to operate in. At times, there are some gaps in the shared VPC approach, but we have found that AWS is committed to improving the experience and treating shared VPCs as first class citizens.” – Gaurav Jain, VP, Director, Cloud Platform at FactSet.
“With the VPC sharing concept, we could create VPCs in one AWS account and share them across multiple accounts. Other accounts are able to build resources on these shared VPCs, but these resources are not visible outside that specific child account. The rate limit of their host AWS account applies to these resources as well. This solved our earlier issue of constantly hitting AWS rate limits due to having all our resources in one AWS account. This approach seemed really attractive to our Cloud Engineering team, as we could manage the IP space, build VPCs, and share them with our child account owners. Then, without having to worry about managing any of the overhead of setting up VPCs, route tables, or network access control lists, teams were able to use these VPCs and build their resources on top of them.” – Achintha Gunasekara, Senior Software Engineer at Slack.
You can find more details about the architecture and the migration approach Slack chose in the blog post “Building the Next Evolution of Cloud Networks at Slack.”
Shared subnets: one VPC participant per subnet, or many VPC participants per subnet
When you choose how to lay out your shared VPC, there are a few things to consider, including how to align subnets with VPC participants. Many customers create a dedicated set of subnets per VPC participant, usually one subnet per Availability Zone (AZ), so a VPC participant has at least three subnets in a three-AZ AWS Region. Dedicated subnets per VPC participant reduce the blast radius: because participants do not share subnets, one VPC participant cannot affect the others by exhausting all the available IP addresses in a subnet. You can also allocate a separate set of dedicated subnets for VPC infrastructure needs. This approach is the best practice and allows the most flexibility, as depicted in figure 1.
Figure 1. Dedicated subnets per VPC participant
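To make the layout in figure 1 concrete, here is a minimal sketch that carves a VPC CIDR into one subnet per AZ for each tenant, including a set kept for owner infrastructure. The VPC CIDR, AZ labels, participant names, and /20 subnet size are all hypothetical placeholders; a real plan would also leave room for growth.

```python
import ipaddress

# Hypothetical inputs: the VPC CIDR, AZ labels, and tenant names are placeholders.
VPC_CIDR = "10.0.0.0/16"
AZS = ["az-a", "az-b", "az-c"]
OWNER_INFRA = "owner-infrastructure"  # subnets kept in the VPC owner account
PARTICIPANTS = ["team-payments", "team-search", "team-analytics"]


def plan_dedicated_subnets(vpc_cidr, azs, tenants, new_prefix=20):
    """Give each tenant one subnet per AZ, carved in order from the VPC CIDR."""
    pool = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for tenant in tenants:
        try:
            plan[tenant] = {az: next(pool) for az in azs}
        except StopIteration:
            raise ValueError(f"{vpc_cidr} has no room left for {tenant}") from None
    return plan


plan = plan_dedicated_subnets(VPC_CIDR, AZS, [OWNER_INFRA] + PARTICIPANTS)
for tenant, subnets in plan.items():
    print(tenant, {az: str(net) for az, net in subnets.items()})
```

Because no two tenants receive the same subnet, one participant exhausting its addresses cannot consume another participant's IP space.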
If you are constrained by small IP allocations, you may opt to share subnets with many VPC participants at once. Customers who take this approach accept the larger blast radius and focus on risk mitigation instead. One example is monitoring for IP exhaustion by periodically checking the number of free IP addresses in each subnet. This is a good practice to follow whether or not you use a shared VPC, as IP exhaustion affects non-shared VPCs as well. In shared VPCs, subnet IP exhaustion and other governance checks can be run centrally from the VPC owner account.
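A periodic IP exhaustion check could look like the sketch below. It assumes input shaped like the `Subnets` list returned by the EC2 `DescribeSubnets` API (fields `SubnetId`, `CidrBlock`, `AvailableIpAddressCount`), called from the VPC owner account; the sample data and the 20% threshold are hypothetical. AWS reserves five addresses in every subnet, so those are excluded from the usable total.

```python
import ipaddress

# AWS reserves five addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5


def free_ip_ratio(cidr, available_ips):
    """Fraction of usable addresses in the subnet that are still free."""
    usable = ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
    return available_ips / usable


def subnets_near_exhaustion(subnets, threshold=0.2):
    """Yield (subnet_id, ratio) for subnets whose free share is below threshold."""
    for subnet in subnets:
        ratio = free_ip_ratio(subnet["CidrBlock"], subnet["AvailableIpAddressCount"])
        if ratio < threshold:
            yield subnet["SubnetId"], round(ratio, 3)


# Hypothetical sample data shaped like DescribeSubnets output.
sample = [
    {"SubnetId": "subnet-aaa", "CidrBlock": "10.0.0.0/24", "AvailableIpAddressCount": 12},
    {"SubnetId": "subnet-bbb", "CidrBlock": "10.0.1.0/24", "AvailableIpAddressCount": 200},
]
print(list(subnets_near_exhaustion(sample)))
```

In practice you would send these results to your alerting system rather than printing them.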
With non-production environments, sharing subnets with many VPC participants is also popular. An example of this is a sandbox environment where teams can experiment. While you typically want your non-production environments as close as possible to your production environment, the IP allocation efficiency makes this option appealing.
Figure 2. Sharing subnets with many VPC participants
Whether you share subnets with one participant or many, I recommend having dedicated subnets for shared AWS infrastructure components such as VPC interface endpoints, firewall endpoints, and NAT gateways. These subnets are not shared and are used only by the VPC owner AWS account.
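One way to enforce this is a governance check the VPC owner runs before creating or updating AWS RAM resource shares. The sketch below uses a hypothetical in-house share plan (not RAM API output) with placeholder subnet and account IDs, and flags any share that would include an owner-only infrastructure subnet.

```python
# Hypothetical owner-only infrastructure subnets (placeholder IDs).
OWNER_ONLY_SUBNETS = {"subnet-infra-a", "subnet-infra-b", "subnet-infra-c"}

# Hypothetical share plan: share name -> subnets to share and participant accounts.
share_plan = {
    "share-payments": {
        "subnets": {"subnet-pay-a", "subnet-pay-b"},
        "principals": {"111111111111"},
    },
    "share-sandbox": {
        "subnets": {"subnet-sbx-a", "subnet-infra-a"},
        "principals": {"222222222222", "333333333333"},
    },
}


def find_leaked_infra_subnets(plan, owner_only):
    """Return {share_name: sorted subnet IDs} for shares that include owner-only subnets."""
    leaks = {}
    for name, share in plan.items():
        overlap = share["subnets"] & owner_only
        if overlap:
            leaks[name] = sorted(overlap)
    return leaks


print(find_leaked_infra_subnets(share_plan, OWNER_ONLY_SUBNETS))
# -> {'share-sandbox': ['subnet-infra-a']}
```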
Network segmentation and zoning in shared VPCs
As a best practice, you should have multiple environments to facilitate your application lifecycle management, such as production, staging, development, and platform development. You typically want to isolate access between environments and have a separate shared VPC per environment. The VPC owner uses platform development environments to test network and infrastructure-level configuration changes. The platform team that provides networking services often treats all other environments as production. If you allow workloads to communicate between environments, services like AWS PrivateLink and AWS Transit Gateway come into play. For smaller-scale environments, VPC peering is also suitable. For large environments, AWS Transit Gateway provides flexible routing policies and the ability to insert a network appliance as a centralized segmentation enforcement point. (See figure 3 for an example of different shared VPCs per environment connected together.)
Figure 3. Shared VPC per environment connected with AWS Transit Gateway
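Environment isolation on AWS Transit Gateway comes from route table associations (each attachment uses one route table for lookups) and propagations (which attachments' routes appear in a table). The sketch below models that logic offline. Attachment and route table names, and the shared-services VPC, are hypothetical.

```python
# Hypothetical layout: each attachment is associated with exactly one route table.
ASSOCIATIONS = {
    "vpc-prod": "rt-prod",
    "vpc-staging": "rt-nonprod",
    "vpc-dev": "rt-nonprod",
    "vpc-shared-services": "rt-shared",
}

# Attachments whose routes are propagated into each route table.
PROPAGATIONS = {
    "rt-prod": {"vpc-shared-services"},
    "rt-nonprod": {"vpc-shared-services"},
    "rt-shared": {"vpc-prod", "vpc-staging", "vpc-dev"},
}


def has_route(src, dst):
    """True if the route table associated with src contains a route to dst."""
    return dst in PROPAGATIONS[ASSOCIATIONS[src]]


def can_communicate(a, b):
    """Two-way traffic needs a route in each direction."""
    return has_route(a, b) and has_route(b, a)


print(can_communicate("vpc-prod", "vpc-shared-services"))  # True
print(can_communicate("vpc-prod", "vpc-dev"))              # False
print(can_communicate("vpc-staging", "vpc-dev"))           # False
```

Staging and development share a route table here yet remain isolated, because neither environment's routes are propagated into it.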
Many of the customers I work with operate a security zoning model in which each security zone has specific requirements. For example, an “external zone” might have routing to the internet and contain applications accessed by external users. A “customer zone” might host customer data and have much tighter network controls, whereas an “internal zone” is trusted and may have coarser-grained networking controls in place. I won’t go into every possible design option here. Instead, I focus on the two I see most frequently:
- One of the most common models is to have multiple security zones within the same shared VPC. Because the VPC owner centrally controls routing and network access control lists (NACLs), it is possible to enforce strict security segmentation even within the same VPC. Keep in mind that the number of rules per NACL is limited, so rules should be coarse-grained. Customers who use this approach also frequently configure secondary VPC CIDRs, with each CIDR representing a different security zone. This allows effective use of NACLs and security group configuration when traffic goes over AWS Transit Gateway to other VPCs or on-premises networks. Finally, for traffic flowing toward the internet, VPC owners can use AWS Network Firewall to inspect and filter traffic from VPC participants, as shown in figure 4. In the same figure, the NACL configuration protects the “customer data zone” from the “external zone” and from any other internal network unless explicitly allowed.
Figure 4. Multiple security zones within a shared VPC
- Another approach is to have a dedicated shared VPC per security zone, per environment, or both. Larger-scale AWS customers often use this model because it reduces the blast radius further. It also works well with centrally deployed network security appliances: customers connect the shared VPCs with AWS Transit Gateway and insert an inspection appliance as a central enforcement point for traffic that has to go from one security zone to another.
Figure 5. Single security zone per shared VPC
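To illustrate the first model, the sketch below evaluates an inbound NACL for customer data zone subnets the way NACLs work: rules are checked in ascending rule-number order, the first match decides, and an implicit catch-all rule denies anything left. The zone CIDRs (modeled as secondary VPC CIDRs) and the allowed internal subnet are hypothetical, and ports and protocols are omitted for brevity.

```python
import ipaddress

# Hypothetical zone CIDRs, each modeled as a secondary VPC CIDR.
CUSTOMER_DATA_ZONE = "10.2.0.0/16"
INTERNAL_APP_SUBNET = "10.3.4.0/24"  # hypothetical internal source that is explicitly allowed

# Inbound NACL for customer data zone subnets: (rule_number, source_cidr, action).
CUSTOMER_ZONE_INBOUND = [
    (100, CUSTOMER_DATA_ZONE, "allow"),   # traffic within the zone
    (110, INTERNAL_APP_SUBNET, "allow"),  # explicitly allowed internal source
    (200, "10.0.0.0/8", "deny"),          # all other internal ranges, incl. the external zone (10.1.0.0/16)
]


def evaluate(rules, source_ip):
    """Return (rule_number, action) of the first matching rule, lowest number first."""
    addr = ipaddress.ip_address(source_ip)
    for number, cidr, action in sorted(rules):
        if addr in ipaddress.ip_network(cidr):
            return number, action
    return "*", "deny"  # the implicit catch-all rule at the end of every NACL


for src in ["10.2.8.9", "10.3.4.20", "10.1.0.5", "192.168.1.1"]:
    print(src, evaluate(CUSTOMER_ZONE_INBOUND, src))
```

Because NACLs are stateless, a real configuration also needs matching outbound rules for return traffic.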
Sharing and permissions
VPC sharing uses AWS Resource Access Manager (RAM) to control which subnets are shared with which AWS accounts. When you enable sharing with AWS Organizations in RAM, sharing is enabled for the entire organization. You may have use cases where you want to disable RAM for specific AWS accounts or organizational units (OUs). Service control policies (SCPs) are a type of AWS Organizations policy that you can use to manage permissions in your organization. For example, you can create an SCP that denies all RAM actions and attach it to AWS accounts that are not supposed to share anything:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllSharing",
      "Action": "ram:*",
      "Effect": "Deny",
      "Resource": "*"
    }
  ]
}
For more complex policies and other examples of SCPs applicable to RAM, refer to the AWS documentation. The blog post “Control VPC sharing in an AWS multi-account setup with service control policies” discusses the use of SCPs with VPC sharing in more detail.
There are two further items to consider for VPC participant accounts: deleting the default VPC, and denying VPC participants the ability to create new VPCs. I discuss both in the blog post “VPC sharing: A new approach to multiple accounts and VPC management.”
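As a sketch of the second item, the snippet below builds a deny-only SCP for VPC participant accounts that blocks the EC2 `CreateVpc` and `CreateDefaultVpc` actions. The statement ID is a hypothetical choice. Generating the document in code makes it easy to keep in version control, and the output can be attached to a participant OU as it is or after review.

```python
import json

# EC2 actions that VPC participant accounts should not call.
PARTICIPANT_DENIED_ACTIONS = ["ec2:CreateVpc", "ec2:CreateDefaultVpc"]


def deny_scp(sid, actions):
    """Build a minimal deny-only service control policy document."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": sid, "Effect": "Deny", "Action": sorted(actions), "Resource": "*"}
        ],
    }


policy = deny_scp("DenyVpcCreation", PARTICIPANT_DENIED_ACTIONS)
print(json.dumps(policy, indent=2))
```

This SCP only prevents new VPCs from being created. Deleting the default VPC that already exists in each participant account and Region is a separate one-time step.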
Logging and Amazon GuardDuty
A VPC owner is responsible for enabling and monitoring most of the VPC-related logs. For example, a VPC owner can enable VPC Flow Logs at the VPC or subnet level. (Note: VPC participants can only enable VPC Flow Logs on ENIs they own.) There are also DNS query logs (Route 53 Resolver query logging), which let VPC owners log all DNS queries made by resources within the VPC. As a best practice, enable both.
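In the default VPC Flow Logs record format, the `account-id` field identifies the owner of the network interface. This lets a VPC owner attribute shared-VPC traffic to individual participants. The sketch below parses default-format (version 2) records and totals bytes and rejected records per account. The sample records are hypothetical, and ENIs created by AWS services may show `unknown` in that field.

```python
from collections import defaultdict

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]


def summarize_by_account(lines):
    """Sum bytes and count REJECT records per ENI-owner account ID."""
    totals = defaultdict(lambda: {"bytes": 0, "rejects": 0})
    for line in lines:
        record = dict(zip(FIELDS, line.split()))
        if record.get("log_status") != "OK":
            continue  # NODATA / SKIPDATA records carry no traffic fields
        acct = totals[record["account_id"]]
        acct["bytes"] += int(record["bytes"])
        if record["action"] == "REJECT":
            acct["rejects"] += 1
    return dict(totals)


# Hypothetical records from a shared VPC with two participant accounts.
sample = [
    "2 111111111111 eni-0a1 10.0.48.10 10.0.0.5 49152 443 6 10 8400 1620000000 1620000060 ACCEPT OK",
    "2 111111111111 eni-0a1 10.0.48.10 10.2.0.7 49153 5432 6 3 180 1620000000 1620000060 REJECT OK",
    "2 222222222222 eni-0b2 10.0.64.11 10.0.0.5 50000 443 6 5 2100 1620000000 1620000060 ACCEPT OK",
    "2 222222222222 eni-0b2 - - - - - - - 1620000000 1620000060 - NODATA",
]
print(summarize_by_account(sample))
```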
By default, Amazon GuardDuty findings are available only to the AWS account that owns the resource where the malicious activity was detected. For example, findings against an EC2 instance owned by a VPC participant are visible only to that participant’s AWS account. The VPC owner does not have access to findings related to participant resources. If findings need to be shared across AWS accounts, you can enable this with GuardDuty’s standard multi-account (administrator and member account) setup. If a VPC participant hasn’t turned on GuardDuty but the VPC owner has it running, no findings are generated against that participant’s resources. Findings against VPC owner resources are still generated as usual and sent to the VPC owner account.
DNS
DNS resolution is controlled by the VPC owner. The VPC owner can set up Route 53 Resolver endpoints and forwarding rules, and VPC participants use this configuration without any further setup. If the VPC owner configures custom DNS resolvers in the VPC’s DHCP options set, VPC participants use the DHCP options specified by the VPC owner.
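When several Route 53 Resolver forwarding rules match a query, Resolver uses the rule with the most specific domain name. The sketch below reproduces that selection offline. It helps when you review which rule a participant's query will hit under the owner's configuration. The rule domains and target IPs are hypothetical placeholders.

```python
# Hypothetical forwarding rules set up by the VPC owner: domain -> target resolver IPs.
FORWARDING_RULES = {
    "corp.example.com": ["10.10.0.2", "10.10.1.2"],
    "eu.corp.example.com": ["10.20.0.2"],
}


def normalize(name):
    """Lowercase and drop any trailing dot from a DNS name."""
    return name.lower().rstrip(".")


def pick_rule(query_name, rules):
    """Return the most specific rule domain that matches query_name, or None."""
    labels = normalize(query_name).split(".")
    # Walk from the full name toward the root so the longest matching suffix wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            return candidate
    return None  # no forwarding rule: Resolver answers the query as usual


for q in ["db.eu.corp.example.com.", "app.corp.example.com", "aws.amazon.com"]:
    rule = pick_rule(q, FORWARDING_RULES)
    print(q, "->", rule, FORWARDING_RULES.get(rule))
```

Matching on whole labels means `notcorp.example.com` does not match the `corp.example.com` rule.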
Route 53 private hosted zones (PHZs) work the same way in a shared VPC as in any other VPC: a PHZ associated with a shared VPC is resolvable by every VPC participant. VPC participants can also create a PHZ in their own AWS account if they have permission to do so, but creating one requires that the participant own a VPC. If a VPC participant is not allowed to create a PHZ or doesn’t have their own VPC, they can ask the VPC owner to help with this process. Once the PHZ exists in the VPC participant’s AWS account, the participant must authorize the VPC owner account to associate the shared VPC with it. This uses Route 53 cross-account VPC association authorization rather than AWS RAM; refer to the Route 53 documentation on associating a VPC with a private hosted zone created by a different account. The VPC owner can then associate the PHZ with the shared VPC, enabling DNS name resolution for every VPC participant.
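The cross-account association can be scripted. The sketch below assumes two boto3 Route 53 clients, one with credentials for the participant (PHZ owner) account and one for the VPC owner account. It calls `create_vpc_association_authorization`, `associate_vpc_with_hosted_zone`, and then `delete_vpc_association_authorization` to remove the authorization once it is no longer needed. The hosted zone ID, VPC ID, and the `RecordingClient` stand-in used for a dry run are hypothetical.

```python
def authorize_and_associate(participant_r53, owner_r53, hosted_zone_id, vpc_id, vpc_region):
    """Associate a participant-owned PHZ with the owner's shared VPC (sketch)."""
    vpc = {"VPCRegion": vpc_region, "VPCId": vpc_id}
    # Step 1, PHZ owner (VPC participant) account: authorize the association.
    participant_r53.create_vpc_association_authorization(HostedZoneId=hosted_zone_id, VPC=vpc)
    # Step 2, VPC owner account: associate the shared VPC with the PHZ.
    owner_r53.associate_vpc_with_hosted_zone(HostedZoneId=hosted_zone_id, VPC=vpc)
    # Step 3, participant account: remove the authorization; the association remains.
    participant_r53.delete_vpc_association_authorization(HostedZoneId=hosted_zone_id, VPC=vpc)


class RecordingClient:
    """Hypothetical stand-in for a boto3 Route 53 client that only records calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getattr__(self, method):
        def call(**kwargs):
            self.log.append((self.name, method, kwargs["HostedZoneId"]))
            return {}
        return call


log = []
authorize_and_associate(RecordingClient("participant", log), RecordingClient("owner", log),
                        "Z0000000EXAMPLE", "vpc-0123example", "us-east-1")
print([(who, method) for who, method, _ in log])
```

With real credentials, you would pass `boto3.client("route53")` objects created for each account in place of `RecordingClient`.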
Hyperplane ENIs
AWS Hyperplane, the Network Function Virtualization platform used by Network Load Balancer, interface VPC endpoints, Amazon EFS, AWS Transit Gateway, VPC-enabled Lambda functions, and NAT gateways, creates special Hyperplane elastic network interfaces (ENIs) in your VPC. Each of these services has a quota on the number of Hyperplane ENIs it can create per VPC. These quotas apply to all VPCs, shared or not. However, you are more likely to hit them in a shared VPC, because every VPC participant can create its own resources that consume Hyperplane ENIs in the same VPC.
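The sketch below shows why shared VPCs reach these quotas sooner. No single account is close to the limit, but the combined usage of all accounts in the one VPC is. The service labels, quota values, and ENI inventory are all hypothetical placeholders; look up the real quotas in Service Quotas for your account and Region.

```python
from collections import Counter

# Hypothetical per-VPC quota values (placeholders, not real AWS defaults).
HYPOTHETICAL_QUOTAS = {"nlb": 100, "interface-endpoint": 100, "lambda": 100}

# Hypothetical inventory of Hyperplane ENIs in one shared VPC: (service, owner account).
enis = (
    [("lambda", "111111111111")] * 40
    + [("lambda", "222222222222")] * 45
    + [("nlb", "111111111111")] * 12
    + [("interface-endpoint", "999999999999")] * 30  # VPC owner account
)


def headroom(inventory, quotas, warn_at=0.8):
    """Per service: ENIs used, the quota, and whether usage reached warn_at of it."""
    used = Counter(service for service, _ in inventory)
    return {
        svc: {"used": used[svc], "quota": q, "warn": used[svc] >= warn_at * q}
        for svc, q in quotas.items()
    }


def consumers(inventory, service):
    """Break down one service's ENI usage by owner account."""
    return Counter(acct for svc, acct in inventory if svc == service)


print(headroom(enis, HYPOTHETICAL_QUOTAS))
print(consumers(enis, "lambda"))
```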
The VPC owner account can request an increase to the number of Hyperplane ENIs available. Provide the VPC ID to AWS Support when making the request. In some cases, you may need additional VPCs if your workloads consume a high number of Hyperplane ENIs. Connect these VPCs with AWS Transit Gateway rather than VPC peering, because with VPC peering the Hyperplane ENI quotas are shared across all peered VPCs. For more details on this and other topics, see my re:Invent session, “Shared VPCs, lessons learned, and best practices.”
Other considerations
AWS Client VPN is a service that a VPC participant could deploy into a shared VPC. Because networking teams usually prefer to control and deploy this centrally, you may want to disable Client VPN with SCPs in your VPC participant AWS accounts (specific member accounts in your organization).
VPC endpoints and VPC endpoint services require additional consideration. The VPC owner creates and manages VPC endpoints: the owner creates gateway and interface endpoints for all participants to consume, and configures DNS and resource-level VPC endpoint policies. VPC participant AWS accounts cannot create VPC endpoints, so they can’t create an endpoint to consume a service available outside of the VPC. However, participants can create a Network Load Balancer and an endpoint service through AWS PrivateLink, which lets them expose services running within the VPC to other AWS accounts. If you don’t have use cases for offering services through AWS PrivateLink, disable endpoint service creation with SCPs.
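The two SCP recommendations above can be combined into one policy for participant accounts. It denies the EC2 `CreateClientVpnEndpoint` and `CreateVpcEndpointServiceConfiguration` actions. The sketch below also shows an optional exemption that uses the `aws:PrincipalArn` global condition key with `ArnNotLike`, for cases where a central automation role in participant accounts must keep these rights. The role name and statement IDs are hypothetical.

```python
import json

# Hypothetical role that stays exempt from the denies (placeholder name).
EXEMPT_ROLE_ARN = "arn:aws:iam::*:role/NetworkPlatformAdmin"

# Statement ID -> EC2 actions to deny in VPC participant accounts.
DENIED = {
    "DenyClientVpn": ["ec2:CreateClientVpnEndpoint"],
    "DenyEndpointServices": ["ec2:CreateVpcEndpointServiceConfiguration"],
}


def participant_scp(denied, exempt_role_arn=None):
    """Build one deny statement per concern, optionally exempting a role."""
    statements = []
    for sid, actions in denied.items():
        statement = {"Sid": sid, "Effect": "Deny", "Action": actions, "Resource": "*"}
        if exempt_role_arn:
            # The deny applies only to principals that do NOT match the exempt role ARN.
            statement["Condition"] = {"ArnNotLike": {"aws:PrincipalArn": exempt_role_arn}}
        statements.append(statement)
    return {"Version": "2012-10-17", "Statement": statements}


print(json.dumps(participant_scp(DENIED, EXEMPT_ROLE_ARN), indent=2))
```

If nothing in participant accounts needs these actions, call `participant_scp(DENIED)` without an exemption.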
Conclusion
Many customers have successfully adopted VPC sharing and used it to simplify their AWS environment. Using VPC sharing involves a number of considerations related to security, DNS, and Amazon VPC structure. Following the best practices and lessons outlined in this post makes it easier to design and operate shared VPCs and to realize the benefits of VPC sharing. All in all, VPC sharing is a valuable tool in your AWS Cloud networking toolbox.