AWS Site-to-Site VPN, choosing the right options to optimize performance

AWS Site-to-Site VPN is a fully managed, performant, scalable, secure, and highly available way to connect your on-premises users and workloads to AWS. With Site-to-Site VPN, you can connect to both Amazon Virtual Private Clouds (Amazon VPCs) and AWS Transit Gateway, with two tunnels per connection for increased redundancy. For even greater performance with sites farther from your AWS Region(s), you can enable AWS Global Accelerator on your VPNs to route traffic through the nearest AWS point of presence. Furthermore, you can use Site-to-Site VPN to connect from on-premises over AWS Direct Connect to your AWS Transit Gateway using Private IP VPN.

Site-to-Site VPN uses the Internet Protocol Security (IPsec) protocol to create encrypted tunnels. AWS has continued to innovate on behalf of our customers to make improvements in security, performance and functionality since Site-to-Site VPN first launched in August 2009. Some of these changes are unseen improvements in the underlying infrastructure and service. Yet others are in the direct control of our customers. Here, we’ll discuss a few configuration options that you can use to improve your performance with Site-to-Site VPN.

Baseline performance

If you’ve read the Site-to-Site VPN FAQ or quotas pages, then you may have seen a maximum performance of up to 1.25 gigabits per second (Gbps) and 140,000 packets per second (PPS) per tunnel. These are estimated maximums based on our experience, but the maximums are dependent on several factors. Certain cryptographic features on Site-to-Site VPN perform better than others. Additionally, certain network optimizations both inside and outside of the tunnel can also improve performance. This is also dependent on the specific application that is using the IPsec tunnel to communicate. Given this performance estimate is variable based on different factors, we highly recommend that you test baseline performance with your specific environment and workloads to understand what your maximum throughput will be.
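
As a rough illustration of why packet size matters for these two limits, the following Python sketch estimates a per-tunnel ceiling for a given average packet size. The function name and the simple "whichever limit is hit first" model are assumptions made for illustration, not an AWS formula; only the 1.25 Gbps and 140,000 PPS figures come from the text above.

```python
# Illustrative model only: throughput is capped by whichever limit is hit first.
TUNNEL_MAX_BPS = 1.25e9   # up to 1.25 Gbps per tunnel (from the text)
TUNNEL_MAX_PPS = 140_000  # up to 140,000 packets per second per tunnel (from the text)


def estimated_ceiling_bps(avg_packet_bytes: float) -> float:
    """Return the lower of the bandwidth cap and the PPS cap for a packet size."""
    pps_limited_bps = TUNNEL_MAX_PPS * avg_packet_bytes * 8
    return min(TUNNEL_MAX_BPS, pps_limited_bps)


# Average packet size at which both limits are reached together.
break_even_bytes = TUNNEL_MAX_BPS / 8 / TUNNEL_MAX_PPS

if __name__ == "__main__":
    print(f"break-even packet size: {break_even_bytes:.0f} bytes")
    for size in (200, 600, 1400):
        print(f"{size:>5} B packets -> {estimated_ceiling_bps(size) / 1e6:.0f} Mbps")
```

Under this simplified model, workloads dominated by small packets would reach the packet-per-second limit long before the bandwidth limit, which is one reason to test with your own traffic mix.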

Cryptographic optimizations

Modern cryptography provides three functions: authentication, data privacy, and the ability to verify data integrity. To establish an IPsec tunnel, the Internet Key Exchange (IKE) protocol is used. IKE has two versions: IKEv1 and IKEv2. We recommend using IKEv2, as it gives some key performance optimizations over IKEv1. IKEv2 offers several advantages over IKEv1, such as:

With IKEv1, traffic must belong to the same encryption domain in both directions (send and return) whereas in IKEv2 they can be encrypted separately.
IKEv2 offers faster speeds than IKEv1. IKEv2’s built-in support for NAT traversal makes going through firewalls and establishing a connection much faster.
IKEv2 reduces the number of Security Associations required per tunnel, thus reducing the required bandwidth as VPNs grow to include more and more tunnels between multiple nodes or gateways.
IKEv2 also allows asymmetric keys for authentication, increasing the security posture.
To establish the tunnel, IKEv2 uses four messages while IKEv1 uses six in main mode and three in aggressive mode.

IKE is broken down into two phases. IKE phase 1 establishes an Internet Security Association and Key Management Protocol (ISAKMP) session, which is used to negotiate administrative parameters called Security Associations (SAs), exchange keys, and pass keep-alive messages for the lifetime of the IPsec tunnel.

Hashing

During IKE phase 1, we negotiate which hashing algorithm is used to check message integrity and make sure that data isn’t manipulated in transit. AWS supports Secure Hash Algorithm 1 (SHA-1) and SHA-2, and within SHA-2 we support 256-, 384-, and 512-bit digest sizes in commercial regions. Note that AWS GovCloud regions don’t support SHA-1. Although there are performance differences between SHA versions as well as digest sizes, we suggest that you select your SHA algorithm based on your customer gateway device support and security requirements. If you don’t have specific requirements, then we recommend using SHA-384 due to its performance and security characteristics.

Authentication

For authentication during ISAKMP negotiation, either a pre-shared key (PSK) or a certificate from AWS Certificate Manager (ACM) is used. This choice won’t affect ongoing tunnel performance. However, we suggest using certificate-based authentication over PSK where possible to raise the security bar. Certificate-based authentication can occur from any customer gateway (CGW) IP address, while PSK adds the requirement that traffic be sourced from the IP address specified in the CGW configuration.

Encryption

A Diffie-Hellman (DH) group determines how key material is generated for encryption. As with SHA, we recommend that you pick DH groups based on compatibility with your customer gateway device and your security requirements. If you don’t have specific requirements, then we recommend using DH group 20 due to its security characteristics.

Once the DH key exchange occurs, we use those keys to encrypt with the Advanced Encryption Standard (AES). AES is a symmetric block cipher, which means it encrypts 128-bit blocks of data through a series of transformations. The performance difference between AES-128 and AES-256 is negligible for optimized network loads, while AES-256 is substantially stronger from a security standpoint. AES-256 uses a 256-bit key, meaning there are 2²⁵⁶ possible keys as compared to 2¹²⁸ for AES-128. Additionally, AES-256 uses 14 rounds of encryption as compared to 10 with AES-128. You should choose based on your security requirements and compatibility with your customer gateway device, but we recommend AES-256.

The main difference in AES performance is between Cipher Block Chaining (CBC) and Galois/Counter Mode (GCM). CBC and GCM are modes of operation that define how AES is applied across successive blocks. In CBC, each block depends on the ciphertext of the previous block, so encryption must be calculated serially and can’t utilize multiple cores. On the other hand, AES-GCM can encrypt and decrypt in parallel, allowing for higher throughput and thus greater performance. We support both AES128-GCM-16 and AES256-GCM-16. Once again, we suggest choosing between cipher suites based on your security requirements and compatibility, but we recommend AES256-GCM-16 where supported and within requirements.

Once IKE phase 1 negotiations complete, IKE phase 2 begins, which sets up the tunnel for data transfer. The parameters for IKE phase 2 can be configured separately, but we generally have the same advice as those for phase 1. Note that the IKE phase 2 key exchange happens inside the ISAKMP session and thus is encrypted by the keys that were exchanged in phase 1. The DH keys exchanged in phase 2 are used to support perfect forward secrecy (PFS). PFS is a way of protecting encrypted data from the compromise of keys by creating a new DH key for each session. This reduces the impact of a single key exposure.
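
To make the recommendations in this section concrete, here is a minimal Python sketch that collects them into one dictionary. The key names are assumed to match the `TunnelOptions` structure of the Amazon EC2 API, and the function name is hypothetical; check the current API reference before relying on either.

```python
# Sketch: gather the tunnel options recommended above into a plain dictionary.
# Key names are assumed to match the EC2 API TunnelOptions structure; verify
# them against the current API reference before use.


def recommended_tunnel_options() -> dict:
    """Return IKEv2 / SHA2-384 / DH group 20 / AES256-GCM-16 for both phases."""
    options = {"IKEVersions": [{"Value": "ikev2"}]}
    for phase in ("Phase1", "Phase2"):
        options[f"{phase}EncryptionAlgorithms"] = [{"Value": "AES256-GCM-16"}]
        options[f"{phase}IntegrityAlgorithms"] = [{"Value": "SHA2-384"}]
        options[f"{phase}DHGroupNumbers"] = [{"Value": 20}]
    return options


if __name__ == "__main__":
    for key, value in recommended_tunnel_options().items():
        print(f"{key}: {value}")
```

A dictionary like this could, for example, be passed as the `TunnelOptions` argument when modifying an existing tunnel with an AWS SDK. No AWS call is made here, so the sketch runs without credentials. Restricting each list to a single value means negotiation fails if the customer gateway device doesn't support it, so confirm device compatibility first.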

Network optimizations

When sending data across multiple networking paths, there are several variables that can be adjusted to increase performance. Maximum Transmission Unit (MTU), TCP congestion control, and ECMP are the variables covered here.

TCP traffic uses MTU and Maximum Segment Size (MSS) to define the size of the packet that can be sent efficiently across a network path between peers. MTU is the largest total packet size that can be sent, while MSS is the largest amount of TCP data (excluding the IP and TCP headers) carried in a single packet. The TCP peers exchange their MSS values during TCP connection establishment. With IPsec connections, traffic is encrypted and encapsulated before it is sent between the two peers. For this process to work within the confines of IP, additional header information is added to the original packet, increasing its size (see Figure 1).

Figure 1. Image that displays how the original packet MTU and MSS fit into the encapsulated packet MTU and MSS.

In general, most network paths are configured by default for an MTU size of 1500 bytes. With IP and TCP headers using 40 bytes of the packet, this makes the MSS 1460 bytes for the actual data packet. Sending the same traffic through an IPsec tunnel reduces the MSS size even further due to the encapsulation and encryption of the data to stay within the MTU. With various encryption algorithms available for IPsec, the optimal MTU and MSS sizes differ as shown in our documentation here.
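
The arithmetic behind these numbers can be sketched in a few lines of Python. The per-field byte counts below are typical values for ESP in tunnel mode over IPv4 without NAT traversal and are assumptions of this sketch, as are the function and constant names. The documentation remains the authoritative source for the values to configure.

```python
# Assumed ESP tunnel-mode byte counts (typical values, not taken from the post).
OUTER_IPV4 = 20      # outer IPv4 header added by the tunnel
ESP_HEADER = 8       # SPI + sequence number
ESP_TRAILER = 2      # pad-length + next-header bytes
TCP_IP_HEADERS = 40  # inner IPv4 (20) + TCP (20), as in the text


def max_inner_mtu(outer_mtu: int, iv: int, icv: int, align: int) -> int:
    """Largest inner packet that fits in outer_mtu after ESP encapsulation."""
    room = outer_mtu - OUTER_IPV4 - ESP_HEADER - iv - icv
    room -= room % align  # encrypted payload must be a multiple of `align`
    return room - ESP_TRAILER


# (IV bytes, ICV bytes, alignment) per cipher suite; assumed typical sizes.
SUITES = {
    "AES-GCM-16": (8, 16, 4),
    "AES-CBC + SHA2-256": (16, 16, 16),
    "AES-CBC + SHA2-384": (16, 24, 16),
}

if __name__ == "__main__":
    print("no tunnel: MTU 1500, TCP MSS", 1500 - TCP_IP_HEADERS)
    for name, (iv, icv, align) in SUITES.items():
        mtu = max_inner_mtu(1500, iv, icv, align)
        print(f"{name}: inner MTU {mtu}, TCP MSS {mtu - TCP_IP_HEADERS}")
```

Under these assumptions, AES-GCM leaves the most room for data because it has a shorter IV and a looser alignment requirement than AES-CBC with a separate integrity hash.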

When these values aren’t considered, fragmentation or packet drops can occur for the traffic flows utilizing the IPsec tunnel. Fragmentation will occur when an IPsec device must split or fragment the original data packet to fit within the MTU (1500) once the additional IPsec packet information is added. Packet drops can occur when IPsec devices aren’t configured to clear the “Don’t Fragment (DF)” flag on packets.

When adjusting packet sizes, you must also use optimal sizes for the entire path. Having packet sizes that are too small will increase data transfer times, as processing is done on each packet at each hop of the network path. To find optimal sizes, protocols such as Path MTU Discovery (PMTUD), TCP MTU Probing, or a simple ping command can be used.

Note that AWS VPNs currently don’t support PMTUD or jumbo frames.

TCP MTU Probing is a form of Packetization Layer Path MTU Discovery as outlined in IETF RFC 4821. It can be enabled in Linux by setting the net.ipv4.tcp_mtu_probing variable with sysctl (it is exposed as the /proc/sys/net/ipv4/tcp_mtu_probing file). It’s used in conjunction with the tcp_base_mss variable: probing starts with the MSS set to that value and increases the MSS until packet loss occurs, at which point it backs off. You can also set tcp_mtu_probe_floor, which defaults to 48 bytes, to define the minimum MSS allowed when TCP MTU probing is enabled. You should evaluate the performance impact of TCP probing in your test environment prior to making any changes to production workloads.
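
Before changing anything, it can help to record the current values. The following Python sketch reads the three settings named above from the Linux /proc interface. The function name and the `sysctl_root` parameter are hypothetical, and the mode descriptions paraphrase the Linux kernel documentation.

```python
from pathlib import Path

# Meaning of net.ipv4.tcp_mtu_probing values, per the Linux kernel documentation.
PROBING_MODES = {
    0: "disabled",
    1: "enabled only after an ICMP black hole is detected",
    2: "always enabled, starting from tcp_base_mss",
}


def read_mtu_probing(sysctl_root: str = "/proc/sys") -> dict:
    """Read the three probing-related settings; missing files map to None."""
    base = Path(sysctl_root) / "net" / "ipv4"
    settings = {}
    for name in ("tcp_mtu_probing", "tcp_base_mss", "tcp_mtu_probe_floor"):
        path = base / name
        settings[name] = int(path.read_text().strip()) if path.exists() else None
    return settings


if __name__ == "__main__":
    current = read_mtu_probing()
    mode = current["tcp_mtu_probing"]
    print(current, "->", PROBING_MODES.get(mode, "unknown or not Linux"))
```

On systems without these files (for example, non-Linux hosts or older kernels that lack tcp_mtu_probe_floor), the corresponding entries are reported as None rather than raising an error.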

When using the ping command, make sure to use the flags for Don’t Fragment (Windows -f | Linux -M do | Mac -D) and Send Buffer Size (Windows -l <bytes> | Linux/Mac -s <bytes>). This makes sure that the ping packet won’t be fragmented, so the packet size can be scaled up or down to find the optimal packet size.

Failed ping test:

C:\>ping www.example.com -f -l 1500

Pinging www.example.com [192.0.2.1] with 1500 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Successful ping test:

C:\>ping www.example.com -f -l 1472

Pinging www.example.com [192.0.2.1] with 1472 bytes of data:
Reply from 192.0.2.1: bytes=1472 time=1ms TTL=54
Reply from 192.0.2.1: bytes=1472 time=1ms TTL=54
Reply from 192.0.2.1: bytes=1472 time=2ms TTL=54
Reply from 192.0.2.1: bytes=1472 time=2ms TTL=54
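
The 1472-byte payload succeeds on a 1500-byte path because ping adds a 20-byte IPv4 header and an 8-byte ICMP header (1472 + 28 = 1500), while a 1500-byte payload produces a 1528-byte packet. Scaling the size up and down by hand can be replaced by a binary search, sketched here in Python. The `probe` callable is hypothetical: it stands in for "send one ping with DF set and report whether a reply came back", and the demo uses a fake probe so that it runs without network access.

```python
from typing import Callable

ICMP_AND_IP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP header


def largest_unfragmented_payload(probe: Callable[[int], bool],
                                 low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest payload for which probe(payload) succeeds.

    `probe` is a hypothetical callable that sends one DF ping of the given
    payload size and returns True on a reply. Assumes probe(low) succeeds.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid fits, so search larger sizes
        else:
            high = mid - 1  # mid is too large, so search smaller sizes
    return low


if __name__ == "__main__":
    # Stand-in for a real ping: pretend the path MTU is 1446 bytes.
    def fake_probe(size: int) -> bool:
        return size + ICMP_AND_IP_HEADERS <= 1446

    best = largest_unfragmented_payload(fake_probe)
    print(f"payload {best} bytes -> path MTU {best + ICMP_AND_IP_HEADERS} bytes")
```

A binary search needs about 11 probes to cover the 0 to 1472 byte range, compared with potentially hundreds of manual attempts.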

TCP congestion control

When network path optimizations are exhausted, customers may still run into issues related to network performance. Some of these issues may be due to the round-trip time (RTT) of the network path that the VPN connection must take. TCP congestion control is a property of TCP that prevents a single TCP session from overwhelming a path. It does this by adjusting the TCP congestion window, which is the amount of data that can be sent and still be awaiting a TCP ACK message from the receiver at any given time.
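
The link between window size, RTT, and throughput is the bandwidth-delay product. The Python sketch below shows how much data must be in flight to fill a path and, conversely, how a fixed window caps throughput. The function names are illustrative, and the 1.25 Gbps figure is the approximate per-tunnel maximum quoted earlier.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a path of this speed and RTT full."""
    return bandwidth_bps * rtt_seconds / 8


def window_limited_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most window_bytes are unacknowledged."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    for rtt_ms in (5, 30, 100):
        bdp = bandwidth_delay_product_bytes(1.25e9, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>3} ms: {bdp / 1e6:.2f} MB in flight to fill 1.25 Gbps")
    # A 64 KiB window at 100 ms RTT caps a single flow at roughly 5.2 Mbps.
    print(f"{window_limited_bps(65536, 0.1) / 1e6:.1f} Mbps")
```

The longer the RTT, the larger the window a single flow needs, which is why a congestion control algorithm that shrinks the window unnecessarily hurts most on long-distance paths.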

Most modern operating systems now use CUBIC as the default TCP congestion control algorithm. CUBIC grows the congestion window as a cubic function of the time since the last congestion event rather than linearly (hence the name), which provides faster convergence of the TCP traffic. Although this generally performs well in most circumstances, when RTTs are higher than approximately 30 ms (on high-bandwidth network paths) it starts to trigger duplicate ACKs, which in turn decrease the window maximum (Wmax) value in the CUBIC algorithm. Since the RTT doesn’t decrease due to the physical distance to the destination, the CUBIC algorithm continues to reduce Wmax, and the overall throughput of the traffic falls well below what is expected.
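
For readers who want to see the shape of the curve, this Python sketch evaluates the window growth function from RFC 8312 using the constants given there. It is a simplified illustration: it ignores the RFC's TCP-friendly region and per-ACK update details, and the starting Wmax of 100 segments is an arbitrary assumption.

```python
C = 0.4     # CUBIC scaling constant from RFC 8312
BETA = 0.7  # multiplicative decrease factor from RFC 8312


def cubic_window(t: float, w_max: float) -> float:
    """Congestion window (in segments) t seconds after a loss event."""
    k = (w_max * (1 - BETA) / C) ** (1 / 3)  # time needed to climb back to w_max
    return C * (t - k) ** 3 + w_max


if __name__ == "__main__":
    w_max = 100.0  # hypothetical window (in segments) when the loss occurred
    for t in (0, 2, 4, 6):
        print(f"t={t}s window={cubic_window(t, w_max):.1f} segments")
```

Immediately after a loss (t = 0) the window restarts at 70% of Wmax, then flattens as it approaches the previous maximum before probing beyond it. Each further loss signal lowers the point the curve recovers toward.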

This can be further verified with a simple ping test across the network path. If the network path shows very little to no loss (approximately 1 × 10⁻⁶), then changing the TCP congestion control algorithm should greatly increase the traffic bandwidth performance.

One TCP congestion control algorithm that can greatly increase throughput is Bottleneck Bandwidth and Round-trip propagation time (BBR). BBR performs better than CUBIC when the RTT between the source and the destination of traffic is high. This is because BBR doesn’t use packet loss as a congestion trigger. Rather, it uses measurements of bottleneck bandwidth and RTT to maintain network performance. You should evaluate and benchmark performance in your test environment. The following commands change the TCP congestion control algorithm in Linux.

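# Use the fq (fair queueing) packet scheduler as the default queueing discipline
sysctl -w net.core.default_qdisc=fq

# Switch the TCP congestion control algorithm to BBR
sysctl -w net.ipv4.tcp_congestion_control=bbr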

To revert to CUBIC, use the following command:

sysctl -w net.ipv4.tcp_congestion_control=cubic

Note that Windows OS currently doesn’t support using BBR as a TCP congestion control algorithm. TCP congestion control algorithms are local to servers. Source and destination don’t have to match for traffic improvements.

Note that TCP data transmission is dependent on the congestion window that is maintained by the source server of the traffic. BBR should be used sparingly on high value or temporary workloads, as it consumes all available bandwidth.
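
One way to limit BBR to specific workloads on Linux, rather than changing the system-wide default, is to request it per socket. The Python sketch below uses the standard library's `socket.TCP_CONGESTION` option, which is only defined on Linux. The helper name is hypothetical, and whether a given algorithm is accepted depends on the kernel modules loaded and the permissions of the process.

```python
import socket


def try_set_congestion_control(sock: socket.socket, algorithm: str) -> bool:
    """Ask the kernel to use `algorithm` for this one socket; True on success."""
    option = getattr(socket, "TCP_CONGESTION", None)  # Linux-only constant
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, algorithm.encode())
        return True
    except OSError:
        # Algorithm not loaded, not permitted, or unknown to this kernel.
        return False


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        for name in ("bbr", "cubic"):
            print(name, "->", try_set_congestion_control(s, name))
```

Because the setting applies only to the socket it is called on, other connections on the same server keep using the system default.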

TGW performance

Transit Gateway launched in November 2018 and is now one of the most widely adopted networking services in AWS. Transit Gateway allows customers to connect networking resources at scale, creating architectures that provide not only communication, but also segmentation and easier management of thousands of networking resources within the AWS infrastructure. This includes connections over Site-to-Site VPN, where Transit Gateway allows tunnels to utilize equal-cost multipath (ECMP).

ECMP allows customers to use Site-to-Site VPN tunnels that are associated with a Transit Gateway or Cloud WAN core network in an active/active configuration. This allows customers to increase the overall aggregate bandwidth for multiple flows of traffic across multiple VPN tunnels. Therefore, with just a single VPN connection that has two tunnels, customers can push up to approximately 2.5 Gbps of total bandwidth. This can be increased by adding VPN connections. Note that on a per-flow basis, traffic will still be limited to approximately 1.25 Gbps. You can read more about scaling bandwidth using Transit Gateway in this post.
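
The difference between aggregate and per-flow limits can be illustrated with a small Python sketch. The hash used to place flows on tunnels is purely illustrative and is not how AWS or any particular customer gateway device distributes traffic; the flow tuples and function names are likewise hypothetical.

```python
import hashlib

PER_TUNNEL_GBPS = 1.25  # approximate per-tunnel ceiling from the text


def pick_tunnel(flow: tuple, tunnel_count: int) -> int:
    """Map a 5-tuple to a tunnel index. The hash here is purely illustrative."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % tunnel_count


def aggregate_ceiling_gbps(vpn_connections: int, tunnels_per_connection: int = 2) -> float:
    """Approximate aggregate ceiling across all active tunnels."""
    return vpn_connections * tunnels_per_connection * PER_TUNNEL_GBPS


if __name__ == "__main__":
    flows = [("10.0.0.5", 40000 + i, "172.31.0.9", 443, "tcp") for i in range(6)]
    for flow in flows:
        print(flow, "-> tunnel", pick_tunnel(flow, tunnel_count=2))
    print("1 connection :", aggregate_ceiling_gbps(1), "Gbps aggregate")
    print("4 connections:", aggregate_ceiling_gbps(4), "Gbps aggregate")
```

Every packet of a given flow maps to the same tunnel, so a single flow never exceeds one tunnel's limit. The aggregate figure is only reachable when traffic consists of many flows that spread across the tunnels.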

Note that ECMP isn’t supported on VPN connections to Virtual Private Gateways (VGWs). To use ECMP, the VPN connection must be configured for dynamic routing with ECMP enabled on Transit Gateway or Cloud WAN.

Conclusion

In this post, we’ve covered cryptographic performance improvements that you can make by choosing IKEv2 with the suggested SHA algorithm, DH group, and cipher suite, while also improving your security posture. We’ve also covered improvements that can be made by optimizing MTU and MSS configurations, changing the TCP congestion control algorithm, and using ECMP. Get started today by logging in to your test environment and trying these settings out yourself, benchmarking the performance with the different configurations to find which ones work best for you.

Scott Morrison

Scott is a Senior Specialist Solutions Architect for Networking at AWS, where he helps customers design resilient and cost-effective networks. Scott loves to code in his spare working hours to solve unique problems. When not working, Scott is often found either in the desert outside of Las Vegas off-roading or occasionally playing in poker tournaments.

Shawji Varkey

Shawji is a Senior Specialist Technical Account Manager for Networking at AWS. He helps enterprise customers solve architectural and operational issues in their global cloud environments. In his spare time he enjoys hanging out and traveling with friends and family.
