Leveraging AWS PrivateLink for volumetric data processing
AWS PrivateLink provides private, secure connectivity between VPCs, AWS services, and your on-premises networks, without exposing your traffic to the public internet. AWS PrivateLink offers three primary benefits to customers. First, it provides a way for two parties to establish private connectivity without requiring an Internet Gateway (IGW), thereby helping both parties to deploy airtight VPCs that are insulated from threat vectors on the internet. Second, customers can establish private connectivity between VPCs with overlapping CIDR blocks. Third, you can connect to AWS services and SaaS applications from your VPC in a private, secure, and scalable manner. Since connections on PrivateLink can only be initiated by the service consumer, third-party applications cannot reach into your VPC.
In this blog, I will highlight how you can leverage AWS PrivateLink for data-intensive use cases.
What’s New: Tiered pricing for AWS PrivateLink
We recently released a new pricing structure for AWS PrivateLink that applies to data-intensive, petabyte-scale use cases. As PrivateLink has grown in popularity, our customers have expressed a desire to use it for many use cases that involve processing large volumes of data. Examples include access to services such as Amazon Kinesis, Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), and Amazon Simple Notification Service (Amazon SNS), as well as Kafka workloads and connectivity between a SaaS provider and its customers. The new pricing uses a tiered model with a 40% price reduction in PrivateLink data processing charges within an AWS region for usage between 1 and 5 PB/month, and a 60% price reduction in these charges above 5 PB/month. For example, if you use 15 PB/month across different interface endpoints, you would incur $0.01/GB for the first 1 PB, $0.006/GB for the next 4 PB (1 to 5 PB), and $0.004/GB for the remaining 10 PB (5 to 15 PB). If your enterprise uses PrivateLink to access multiple AWS services and endpoint services, your AWS bill is based on the aggregated data processed across all interface endpoints within a region.
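To make the tier arithmetic concrete, here is a minimal Python sketch that applies the three per-GB rates quoted above to a monthly usage figure. The function name and the conversion of 1 PB to 1,000,000 GB are assumptions made for illustration; check the AWS PrivateLink pricing page for the exact unit convention and current rates.

```python
from decimal import Decimal

# Assumption: 1 PB = 1,000,000 GB (decimal units). The billing convention may differ.
GB_PER_PB = 1_000_000

# (upper bound of the tier in PB, price per GB in USD); None means no upper bound.
# Rates are the ones quoted in the text above.
TIERS = [
    (1, Decimal("0.01")),
    (5, Decimal("0.006")),
    (None, Decimal("0.004")),
]


def monthly_data_processing_charge(usage_pb):
    """Return the tiered data processing charge in USD for usage_pb petabytes."""
    remaining = Decimal(str(usage_pb))
    lower = Decimal(0)
    total = Decimal(0)
    for upper, price_per_gb in TIERS:
        if remaining <= 0:
            break
        # Portion of the usage that falls inside this tier.
        if upper is None:
            tier_size = remaining
        else:
            tier_size = min(remaining, Decimal(upper) - lower)
            lower = Decimal(upper)
        total += tier_size * GB_PER_PB * price_per_gb
        remaining -= tier_size
    return total


# The 15 PB/month example: 1 PB + 4 PB + 10 PB at the three rates.
print(monthly_data_processing_charge(15))
```

Under these assumptions, the 15 PB/month example works out to $10,000 + $24,000 + $40,000 = $74,000 in data processing charges. Because usage is aggregated across all interface endpoints in a region, the input would be the regional total rather than a per-endpoint figure.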
Data-intensive use cases
There are at least four categories of use cases where large data volumes need to be securely transferred over a private connection:
- Data transfer between a SaaS service and a customer VPC (cloud monitoring as a service, data analytics etc.)
- S3 access from on-premises networks (data backups, data warehousing, data lakes etc.)
- High-volume AWS services (Amazon Kinesis, Amazon EBS, Amazon MSK etc.)
- Network traffic across VPC boundaries (internal enterprise apps, centralized traffic monitoring etc.)
I will elaborate on each of these use cases in the rest of this blog.
Data transfer between a SaaS service and a customer VPC
Many independent software vendors (ISVs) have adopted a “SaaS first” strategy. As a customer, you benefit from the simplicity of deploying a SaaS service without incurring the overhead of software maintenance. AWS PrivateLink offers a way to securely exchange data between the SaaS provider’s VPC and your Amazon VPC over the AWS network without the need to deploy an internet gateway, even if the CIDR blocks of the two VPCs overlap (see Figure 1 below). This private data exchange can also be extended to on-premises networks when PrivateLink is combined with AWS Direct Connect. Many SaaS providers also use multiple types of agents and collectors to provide an expanded set of services to their customers. With the new pricing model, you can use the full spectrum of a SaaS provider’s capabilities over PrivateLink.
Exchanging data streams with managed Kafka service providers is another example where you would send high volumes of data over PrivateLink if a private medium to exchange data is desired.
Figure 1: Data exchange between a SaaS provider and a SaaS customer over AWS PrivateLink
S3 access from on-premises networks
There are three ways to access Amazon S3 buckets: a public IP address, a gateway endpoint, and an interface endpoint (powered by AWS PrivateLink). While gateway endpoints are convenient for applications accessing S3 from within a VPC, interface endpoints are preferred when accessing S3 from on-premises networks. Reasons for this preference could include an infosec policy that prohibits the use of a public virtual interface (VIF), or the overhead of managing proxies (when used with gateway endpoints). Many storage, backup, and analytics ISVs have integrated their offerings with Amazon S3. As a customer of such a service, you can securely upload your files and objects from your on-premises infrastructure into S3 using AWS PrivateLink without sending that data over a public network or subnet (see Figure 2 below). The ISV service then provides storage, backup, analytics, and similar capabilities by accessing this data in S3. The simplicity and data privacy of AWS PrivateLink enable you to use S3 to adopt such services and to build a lake house architecture on AWS.
Figure 2: Accessing S3 over an interface endpoint (powered by AWS PrivateLink)
High-volume AWS services
AWS PrivateLink today supports over 116 AWS services, with that list growing every month. Some of these services are data-intensive. Examples include Amazon Kinesis, Amazon Redshift, Amazon Rekognition, Amazon SQS, and Amazon SNS. In our discussions with customers, we have seen access to these services involve petabytes of data. Until now, cost considerations sometimes compelled customers to access these services through an IGW.
With this new announcement, cost becomes a lesser concern and you can consider the best practice of deploying airtight VPCs without any public subnet. Rather than using the public endpoint to access these services through an internet gateway, you can set up an interface endpoint in a private subnet to establish private connectivity to such supported AWS services. Further, by enabling private DNS for that interface endpoint, you can make requests to the service using the service’s default DNS hostname, which will then resolve to the IP address of the interface endpoint. This results in no changes to the application and seamlessly switches any traffic to the AWS service over PrivateLink. You could also associate endpoint policies with interface endpoints to implement both access controls to the endpoint and data exfiltration controls.
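As an illustration of the setup described above, the following Python sketch assembles the parameters for an interface endpoint with private DNS enabled and an endpoint policy attached, and then passes them to the boto3 EC2 `create_vpc_endpoint` call. The helper names, resource IDs, stream ARN, and the choice of Amazon Kinesis Data Streams as the target service are all hypothetical placeholders, and the policy is only one possible shape; treat it as a starting point rather than a recommended configuration.

```python
import json


def build_interface_endpoint_request(vpc_id, subnet_ids, security_group_ids,
                                     service_name, allowed_actions, resource_arn):
    """Assemble keyword arguments for EC2 create_vpc_endpoint (hypothetical helper)."""
    # Endpoint policy: only the listed actions on the listed resource may pass
    # through this endpoint, which helps limit data exfiltration paths.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": allowed_actions,
            "Resource": resource_arn,
        }],
    }
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": subnet_ids,            # private subnets, one per AZ
        "SecurityGroupIds": security_group_ids,
        "PrivateDnsEnabled": True,          # default service hostname resolves to the endpoint
        "PolicyDocument": json.dumps(policy),
    }


def create_interface_endpoint(request, region):
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported here so the request builder runs without boto3 installed
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(**request)


# All identifiers below are placeholders, not real resources.
request = build_interface_endpoint_request(
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    service_name="com.amazonaws.us-east-1.kinesis-streams",
    allowed_actions=["kinesis:PutRecord", "kinesis:PutRecords"],
    resource_arn="arn:aws:kinesis:us-east-1:111122223333:stream/example-stream",
)
print(json.dumps(request, indent=2))
```

With private DNS enabled, an application that already calls the service through its default hostname should not need code changes. The security group attached to the endpoint still has to allow inbound HTTPS from the clients in the VPC.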
As more of your network traffic is carried over interface endpoints, you should also review your overall architecture to check if a public subnet, public/elastic IP addresses for EC2 instances or NAT instances/NAT gateways in a VPC are still required. Reducing the traffic flowing through internet gateways would also decrease the volume of network traffic processed by associated perimeter security controls (e.g. network firewalls) in place. As such, moving to private subnets and using private IP addresses provides cost savings that could partially or fully offset the cost of interface endpoints used to access the same service over PrivateLink.
Network traffic across VPC boundaries
As your enterprise’s cloud footprint grows, the number of VPCs used across different accounts also grows. Internal application owners may wish to offer an application to users in other VPCs. You may also have situations where data needs to be shared across VPC boundaries. AWS offers several choices to solve this problem today: VPC peering at small scale, and Transit Gateway as the preferred way to interconnect VPCs at scale. Both of these networking services require the CIDR blocks in the participating VPCs to be non-overlapping. If your use case involves connecting VPCs with overlapping CIDR blocks, you have two options to consider: a private NAT gateway and PrivateLink. A private NAT gateway is recommended when the data flows between the participating VPCs are symmetric in nature. If your data flows between the participating VPCs are TCP-based and asymmetric (i.e. follow a “producer” and “consumer” relationship), AWS PrivateLink can enforce that asymmetric association, support overlapping CIDR blocks across the VPCs, and also offer a more economical option. PrivateLink allows connections to be initiated only from the consumer VPC. This reduces the trust the consumer must place in the producer, because the producer cannot access resources in the consumer VPC.
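The guidance in the preceding paragraph can be sketched as a small Python helper that checks two VPC CIDR blocks for overlap using the standard library `ipaddress` module and returns the option the paragraph suggests. The function name and return strings are hypothetical, and the logic is a deliberate simplification: it mirrors only the rules stated here and ignores other factors such as cost, scale, and protocol support.

```python
import ipaddress


def suggest_connectivity(cidr_a, cidr_b, asymmetric_tcp):
    """Return the connectivity option suggested by the text for two VPC CIDR blocks.

    asymmetric_tcp: True when traffic is TCP-based and follows a
    producer/consumer pattern (connections initiated from the consumer only).
    """
    network_a = ipaddress.ip_network(cidr_a)
    network_b = ipaddress.ip_network(cidr_b)

    if not network_a.overlaps(network_b):
        # Non-overlapping CIDR blocks: routing-based options are available.
        return "VPC peering or Transit Gateway"
    if asymmetric_tcp:
        return "AWS PrivateLink"
    return "private NAT gateway"


# Overlapping ranges with a producer/consumer traffic pattern.
print(suggest_connectivity("10.0.0.0/16", "10.0.128.0/20", asymmetric_tcp=True))
# Distinct ranges.
print(suggest_connectivity("10.0.0.0/16", "10.1.0.0/16", asymmetric_tcp=True))
```

PrivateLink also works when the CIDR blocks do not overlap, so the first branch should be read as "routing-based options are possible", not as a rule against PrivateLink.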
Scalability and Current Limits of PrivateLink
Each interface endpoint currently supports a bandwidth of up to 10 Gbps per Availability Zone (AZ), with bursts of up to 40 Gbps per AZ. For some data-intensive use cases, you may need throughput higher than 10 Gbps per AZ. You can accomplish this by horizontally scaling interface endpoints.
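As a rough planning aid, the Python sketch below estimates how many interface endpoints you might need in an AZ for a target sustained throughput. It rests on two assumptions not stated above: that you size against the 10 Gbps sustained figure rather than the 40 Gbps burst figure, and that your clients spread traffic evenly across the endpoints. The function and constant names are hypothetical.

```python
import math

# Sustained bandwidth per interface endpoint per AZ, as quoted above.
SUSTAINED_GBPS_PER_ENDPOINT_PER_AZ = 10


def endpoints_needed(required_gbps_per_az):
    """Estimate the interface endpoint count for a target sustained throughput in one AZ."""
    if required_gbps_per_az < 0:
        raise ValueError("required throughput must not be negative")
    # Round up, and always keep at least one endpoint.
    return max(1, math.ceil(required_gbps_per_az / SUSTAINED_GBPS_PER_ENDPOINT_PER_AZ))


print(endpoints_needed(25))  # 25 Gbps needs three endpoints at 10 Gbps each
```

Keep the per-VPC endpoint quota in mind when scaling out this way, and validate the estimate with load testing, since real traffic is rarely distributed perfectly evenly.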
For each interface endpoint, you can choose only one subnet per Availability Zone. There is also a soft limit on the number of interface endpoints and gateway load balancer endpoints per VPC (default: 50). For a full list of interface endpoint properties and limits, refer to the documentation.
Voice of our Partners
“Since Elastic Cloud introduced support for PrivateLink in August 2020, we have seen rapid adoption of customers using PrivateLink to connect securely to their deployments. Our customers care most about security, but cost is an important consideration as well. The new PrivateLink pricing model enables more of our customers to meet their security and compliance requirements while running their critical search, security and observability workloads on Elastic Cloud.”
Shubha Anjur Tupil, Senior Product Manager, Elastic
“MongoDB Atlas, the most innovative cloud database service on the market, deploys MongoDB in AWS regions around the world. Many of our customers use AWS PrivateLink to guarantee private connectivity between Atlas and all their AWS applications, services, and PrivateLink-enabled services. We are delighted to see this change by AWS, as it makes it easier for our customers to confidently connect their AWS environments with MongoDB Atlas, using AWS and MongoDB together for the most data-intensive workloads.”
Andrew Davidson, Vice President Cloud Products, MongoDB
“Snowflake Data Cloud eliminates data silos, allowing you to seamlessly unify, analyze, share, and even monetize your data. AWS PrivateLink for Amazon S3 greatly simplifies and enhances security for our customers by providing private connectivity to Snowflake environments while reducing operational overhead. We are excited about the new PrivateLink pricing change that encourages our customers to use PrivateLink as the de facto way to access both S3 and our service.”
Vikas Jain, Senior Product Manager, Snowflake
In this blog, I reviewed four different data-intensive use cases where you can use AWS PrivateLink to provide private, secure connectivity between VPCs, AWS services, and your on-premises networks, without exposing your traffic to the public internet. The new pricing structure for AWS PrivateLink is intended to encourage adoption of PrivateLink as a best practice even in use cases involving high data volumes. In doing so, you can deploy airtight VPCs to minimize the surface area of attacks.