Optimizing Amazon S3 data transfers over Direct Connect
In today’s data-driven landscape, the efficient transfer of large datasets to and from Amazon Simple Storage Service (Amazon S3) is a critical piece of an enterprise’s cloud strategy. Common use cases that require frequent transfers of large datasets include cloud-based data lakes, which depend on receiving data from various sources that often reside on-premises. Amazon S3 can also serve as the starting point for your generative AI journey. Generative AI applications need large datasets, and by transferring this data into Amazon S3, organizations can use the full suite of Amazon Web Services (AWS) artificial intelligence/machine learning (AI/ML) tools. Once a model is trained in AWS using this data, the model artifacts can also be stored in Amazon S3. Other use cases include backup and restore, archive, Internet of Things (IoT) data ingestion, and big data analytics.
There are three general patterns when transferring data to and from Amazon S3:
- For small to moderate amounts of data (< 100 GB) that need infrequent transfers, an AWS Site-to-Site VPN connection can be sufficient.
- For large amounts of data (< 10 TB) that need frequent transfers and a consistent connection with low latency, the AWS Direct Connect service is the best choice. Direct Connect bypasses the public internet and provides a secure and dedicated connection to AWS.
- For very large amounts of data (> 10s of TBs) and infrequent transfers, the AWS Snow Family is the most cost-effective and efficient method. Snow Family devices are physically shipped to you, and you load your data onto the device before shipping it back to AWS.
This post details three network architectures for setting up connectivity for the pattern using AWS Direct Connect. These architectures differ in terms of the services used, the associated costs, and the level of complexity. Understanding these network design options and their tradeoffs is crucial for organizations to optimize their cloud storage operations.
AWS services
The architectures covered in this post use the following services. A brief description of each one follows.
- Direct Connect is a secure and dedicated networking service that is used to connect an on-premises environment to AWS through a Direct Connect Location. There are two types of Direct Connect connections: dedicated and hosted. Dedicated connections can support multiple Virtual Interfaces (VIFs). Refer to the AWS Direct Connect quotas page for currently supported values. A VIF is an IEEE 802.1Q VLAN provisioned on the Direct Connect circuit between the user’s on-premises router and the Direct Connect router. It is a logical interface built on top of a physical connection. Hosted connections support one VIF and are provided through an AWS Partner Network (APN) Partner.
- The Direct Connect Gateway allows Direct Connect users to connect multiple Virtual Private Clouds (VPCs) in the same or different AWS Regions to their Direct Connect connection. It can be associated directly with multiple VPC Virtual Private Gateways (VGWs) or Transit Gateways that are attached to VPCs.
- The Transit Gateway is a network transit hub that you can use to interconnect your VPCs and on-premises networks through a single gateway. It simplifies the network topology and configuration by allowing you to build a hub and spoke topology between your VPCs, data centers, and branch offices.
- The Virtual Private Gateway provides edge routing for a VPC through either a VPN or a Direct Connect connection.
- Interface endpoints allow you to establish a private connection between your VPC and other supported AWS services using the AWS network instead of the internet.
Network architectures
All three of the architectures that are covered use Direct Connect. If you have a dedicated connection, then you can configure a new VIF on the existing connection. If you have a hosted connection that only supports one VIF, then you must order an additional hosted connection to support an additional VIF.
If your plan is to have a landing zone with many VPCs and to provide access to AWS services and applications inside and outside of those VPCs, then a dedicated connection is recommended because it provides more flexibility with the network design. It is also recommended to have at least two connections for resiliency as covered in the AWS Direct Connect Resiliency Toolkit.
The Direct Connect charges include a port hours charge that is based on the type of connection and the capacity of the connection. There is also a charge based on how much data is transferred outbound from AWS to on-premises. Data that is transferred inbound from on-premises to AWS is free. For more details on pricing, refer to the Direct Connect pricing page.
Each architecture description includes a pricing estimate based on the following example scenario.
You have two 10 Gbps dedicated Direct Connect connections and you want to set up connectivity to Amazon S3 for data transfer from on-premises. You transfer an estimated 4 TB of data per month into Amazon S3. You estimate that you must retrieve 2 TB of data per month from Amazon S3 and transfer it back to on-premises.
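As a rough sanity check on this scenario, the following sketch estimates how long the monthly volumes would take to move over a single 10 Gbps connection. The `utilization` parameter and the function name are assumptions made for illustration and are not part of the original scenario. Real throughput depends on protocol overhead, parallelism, and the on-premises network, and 1 TB is taken as 1,024 GB to match the pricing figures in this post.

```python
def transfer_time_seconds(data_tb, link_gbps, utilization=1.0):
    """Estimate transfer time in seconds for a given data volume and link speed.

    utilization is a hypothetical fraction (0 < utilization <= 1) of the
    line rate that the transfer actually achieves.
    """
    data_bits = data_tb * 1024**4 * 8          # TB -> bytes -> bits
    usable_bps = link_gbps * 1e9 * utilization  # Gbps -> bits per second
    return data_bits / usable_bps


# 4 TB inbound at full line rate, then at an assumed 50% utilization.
ideal = transfer_time_seconds(4, 10)
half = transfer_time_seconds(4, 10, utilization=0.5)
print(f"4 TB at 10 Gbps (ideal):    {ideal / 60:.1f} minutes")
print(f"4 TB at 10 Gbps (50% used): {half / 60:.1f} minutes")
```

Even under conservative assumptions, the monthly volume in this scenario occupies the link for only a few hours, so capacity is not the deciding factor between the architectures that follow.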
The following calculations are based on the AWS Pricing Calculator that can be used to run your own calculations based on your architecture and specific use case. While all pricing shown in the following architecture examples is based on AWS Regions in the United States, pricing in other AWS Regions might differ. The AWS Pricing Calculator can be used to show pricing information for these other AWS Regions.
Architecture 1: Using a public VIF
This first architecture uses a Direct Connect public VIF. A public VIF can access all AWS public services using public IP addresses. When the Border Gateway Protocol (BGP) session is established, Amazon public prefixes are advertised over the public VIF to your devices. Connecting to the AWS public network in this way introduces some complexity that needs to be considered. When connecting your network to other networks, it is best practice to use a firewall to inspect and block unwanted traffic, just as you would with an internet connection. You can configure routing policies for prefixes that are advertised over both the public VIF and the internet, and use BGP communities to control how far your prefixes are propagated into the AWS network.
The following figure shows this architecture using two Direct Connect connections, each with a public VIF connecting to the AWS network.
This architecture is a good choice if you want to minimize the data transfer costs related to transferring data into Amazon S3. However, because a public VIF exposes the on-premises network to the AWS public network, it needs the additional filtering and routing configuration outlined previously.
By using our example scenario, the costs associated with this architecture include the Direct Connect charges as shown in the following table.
Direct Connect charges | |
Number of Direct Connect locations: | 2 |
Ports in use per location: | 2 |
Port type: | Dedicated |
Port capacity: | 10 Gbps |
Port hour rate: | $2.25 USD per hour |
Hours connected: | 730 hours* |
Total port hour charges: | $6,570.00 USD per month |
Data transfer out (DTO) charges | |
AWS Region sending data | US-EAST-1 (N. Virginia) |
Data transferred out | 2 TB per month |
Data transfer out rate | $0.02 USD per GB |
Total Data Transfer out charges | $40.96 USD per month |
The total estimated charges would be $6,610.96 USD per month.
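The arithmetic behind the table can be reproduced with a short sketch. It uses the table's own inputs (2 locations with 2 ports each, 730 hours, $2.25 USD per port hour, and $0.02 USD per GB transferred out) and assumes 1 TB = 1,024 GB, which is what the $40.96 USD figure implies. The function name is made up for illustration, and the rates are the example's values rather than current pricing.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # hours connected, from the table
GB_PER_TB = 1024       # implied by 2 TB * $0.02 per GB = $40.96


def direct_connect_monthly_cost(locations, ports_per_location, port_hour_rate,
                                dto_tb, dto_rate_per_gb):
    """Return (port_hour_charges, dto_charges, total) in USD per month."""
    ports = locations * ports_per_location
    port_hours = ports * HOURS_PER_MONTH * Decimal(port_hour_rate)
    dto = dto_tb * GB_PER_TB * Decimal(dto_rate_per_gb)
    return port_hours, dto, port_hours + dto


port_hours, dto, total = direct_connect_monthly_cost(
    locations=2, ports_per_location=2, port_hour_rate="2.25",
    dto_tb=2, dto_rate_per_gb="0.02",
)
print(f"Port hour charges: ${port_hours:,.2f}")
print(f"DTO charges:       ${dto:,.2f}")
print(f"Total:             ${total:,.2f}")
```

Inbound data does not appear in the calculation because data transferred from on-premises to AWS is free. Port hours dominate the bill at these volumes.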
Architecture 2: Using a Private VIF
This architecture uses a Private VIF and an interface endpoint to send data through a VPC to Amazon S3. A Private VIF is used to access a VPC using private addressing. An interface endpoint is accessible from on-premises over the Private VIF. The interface endpoint makes use of elastic network interfaces (ENIs) that are assigned private IP addresses from subnets in the VPC. When you create an interface endpoint it generates an endpoint-specific Amazon S3 DNS name that can be used to send traffic to the endpoint. You can also configure your on-premises DNS server to resolve the Amazon S3 DNS names to the private IP addresses of the Interface VPC endpoint for Amazon S3. Then, the traffic is sent to Amazon S3 over the AWS network.
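If you configure on-premises DNS to resolve Amazon S3 names to the interface endpoint, it can be useful to confirm that a name actually resolves to private addresses before sending data. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, the sample IP addresses are made up for illustration, and `is_private` checks for private address ranges in general rather than for your specific VPC subnets.

```python
import ipaddress
import socket


def resolve_ips(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """Return True if the list is non-empty and every address is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


# Illustrative addresses only: two private ENI-style addresses, then a mix
# that includes a public address.
print(all_private(["10.0.1.25", "10.0.2.25"]))   # True
print(all_private(["10.0.1.25", "52.216.0.1"]))  # False

# From an on-premises host you could run, for example:
#   all_private(resolve_ips("<your S3 DNS name>"))
```

A `False` result suggests that the name still resolves to public Amazon S3 addresses, in which case traffic would not use the interface endpoint.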
The following figure shows this architecture using two Direct Connect connections, each with a Private VIF connecting through the Direct Connect Gateway to a VPC with an interface endpoint.
This architecture is a good choice if you do not want to create a public VIF. You may not want to add the additional configuration needed to filter the advertised Amazon prefixes. You may have security concerns with connecting to the AWS public network, especially if appropriate security services are not in-line with the Direct Connect connection. You may already have a hosted connection with a Private VIF that you prefer to use rather than ordering another hosted connection for another VIF.
The costs associated with this architecture include the Direct Connect charges covered in Architecture 1 in addition to the interface endpoint charges shown in the following table.
Direct Connect charges | $6,610.96 USD per month |
Interface endpoint charges | |
Number of VPC Endpoints | 2 |
Number of Availability Zones (AZs) | 2 |
Data processed | 4 TB inbound + 2 TB outbound |
Data processing charges (1 VPC + 4 ENIs) | $61.44 USD (6,144 GB * $0.01 USD) |
Endpoint hourly charges | $29.20 USD (4 ENIs * 730 hours * $0.01 USD) |
Total interface endpoint charges | $90.64 USD per month |
The total estimated charges would be $6,610.96 USD + $90.64 USD = $6,701.60 USD per month.
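The interface endpoint portion of this estimate has two components: an hourly charge for each ENI and a charge per GB processed in either direction. The sketch below reproduces it using the table's inputs (4 ENIs, 730 hours, 6 TB processed, $0.01 USD for both rates) and again assumes 1 TB = 1,024 GB. The function name is hypothetical, and the rates are the example's values rather than current pricing.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730
GB_PER_TB = 1024


def interface_endpoint_monthly_cost(enis, data_tb, hourly_rate, per_gb_rate):
    """Return (hourly_charges, processing_charges, total) in USD per month."""
    hourly = enis * HOURS_PER_MONTH * Decimal(hourly_rate)
    processing = data_tb * GB_PER_TB * Decimal(per_gb_rate)
    return hourly, processing, hourly + processing


hourly, processing, endpoint_total = interface_endpoint_monthly_cost(
    enis=4,               # 2 endpoints * 2 Availability Zones
    data_tb=4 + 2,        # inbound + outbound
    hourly_rate="0.01",
    per_gb_rate="0.01",
)
direct_connect_total = Decimal("6610.96")  # from Architecture 1
print(f"Endpoint hourly charges: ${hourly:,.2f}")
print(f"Data processing charges: ${processing:,.2f}")
print(f"Architecture 2 total:    ${direct_connect_total + endpoint_total:,.2f}")
```

Unlike Direct Connect data transfer, endpoint data processing is charged on inbound traffic too, which is why the full 6 TB is counted.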
Architecture 3: Using a Transit VIF
This architecture uses a Transit VIF and the Transit Gateway. A single Transit VIF can be used to connect to all VPCs through the Transit Gateway. This avoids having to configure a separate Private VIF for each VPC, which was common practice before the Transit Gateway service became available. In this architecture a VPC with an interface endpoint is attached to the Transit Gateway, which is connected back to on-premises using a Transit VIF. Access to Amazon S3 through the interface endpoint would work as described in Architecture 2, with the exception being that traffic would flow over the Transit VIF to the interface endpoint.
The following figure shows this architecture using two Direct Connect connections, each with a Transit VIF connecting through the Direct Connect Gateway to the same Transit Gateway that connects to a VPC with an interface endpoint.
If you already have a Transit Gateway in place, then this architecture is the least complex from an implementation and operational standpoint and the most scalable. It is a good choice if you do not want to create additional VIFs or already have a hosted connection with a VIF and do not want to order another hosted connection for another VIF. It allows you to use a single Transit VIF to send data to VPCs and to Amazon S3 without configuring additional VIFs. It requires configuring the Transit Gateway and all of the attachments along with Transit Gateway routing tables.
The costs associated with this architecture include the Direct Connect charges and interface endpoint charges covered in Architectures 1 and 2, in addition to the Transit Gateway charges as shown in the following table.
Direct Connect charges | $6,610.96 USD per month |
VPC interface endpoint charges | $90.64 USD per month |
Transit Gateway charges | |
Number of Transit Gateway attachments | 2 (1 Direct Connect Gateway, 1 VPC) |
Data processed per attachment | 4 TB inbound + 2 TB outbound |
Data processing charges | $122.88 USD (6,144 GB * $0.02 USD) |
Transit Gateway attachment hourly charge | $36.50 USD ($0.05 USD per hour * 730 hours) |
Total Transit Gateway charges | $318.76 USD per month ($159.38 USD * 2) |
The total estimated charges would be $6,610.96 USD + $90.64 USD + $318.76 USD = $7,020.36 USD per month.
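The Transit Gateway portion can be checked the same way. Each attachment incurs an hourly charge plus a per-GB data processing charge, and this example counts the full 6 TB against each of the two attachments, as the table does. The sketch below uses the table's rates, the Direct Connect figure of $6,610.96 USD from Architecture 1, and the interface endpoint figure of $90.64 USD from Architecture 2. It assumes 1 TB = 1,024 GB, and the function name is hypothetical.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730
GB_PER_TB = 1024


def transit_gateway_monthly_cost(attachments, data_tb_per_attachment,
                                 hourly_rate, per_gb_rate):
    """Return (per_attachment_charge, total_charge) in USD per month."""
    hourly = HOURS_PER_MONTH * Decimal(hourly_rate)
    processing = data_tb_per_attachment * GB_PER_TB * Decimal(per_gb_rate)
    per_attachment = hourly + processing
    return per_attachment, attachments * per_attachment


per_attachment, tgw_total = transit_gateway_monthly_cost(
    attachments=2,               # 1 Direct Connect Gateway, 1 VPC
    data_tb_per_attachment=4 + 2,
    hourly_rate="0.05",
    per_gb_rate="0.02",
)
direct_connect_total = Decimal("6610.96")   # from Architecture 1
endpoint_total = Decimal("90.64")           # from Architecture 2
grand_total = direct_connect_total + endpoint_total + tgw_total
print(f"Per attachment:       ${per_attachment:,.2f}")
print(f"Transit Gateway:      ${tgw_total:,.2f}")
print(f"Architecture 3 total: ${grand_total:,.2f}")
```

Across the three architectures, the Direct Connect port hours remain the largest cost by far. The endpoint and Transit Gateway charges add roughly 1% and 5% respectively in this scenario.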
Conclusion
This post discussed different network architectures for transferring large datasets between on-premises environments and Amazon S3 using AWS Direct Connect. By understanding these architecture options, such as public VIFs, Private VIFs through VPCs, and Transit VIFs through the Transit Gateway, in addition to the implications of using each service from a cost and configuration complexity standpoint, you can choose the right design for your organization. The choice is influenced by a number of things, such as AWS resources currently in use, security requirements, desire to minimize costs, and expected future expansion.
If you are a new user that is building a landing zone for the first time, then you can reference the Hybrid Network Connectivity Whitepaper to learn more about the considerations for choosing the right connectivity type and connectivity design.
An update was made on September 18, 2024: An earlier version of this post incorrectly stated that AWS IP address ranges from ip-ranges.json can be used for BGP prefix filtering. Architecture 3 has been updated to include multiple interface endpoints for a more resilient architecture. The amount of outbound traffic has been adjusted to reflect a more realistic example.
About the authors