Introducing NLB TCP configurable idle timeout
This post guides you through configuring AWS Network Load Balancer (NLB) idle timeouts for Transmission Control Protocol (TCP) flows.
NLB is part of the Amazon Web Services (AWS) Elastic Load Balancing family, operating at Layer 4 of the Open Systems Interconnection (OSI) model. It manages client connections over TCP or User Datagram Protocol (UDP), distributing them across a set of load balancer targets.
NLB tracks a connection from its establishment until it’s closed or times out due to inactivity (idle timeout). By default, the idle timeout for TCP connections is 350 seconds, while UDP connections have a 120-second timeout.
With the new configurable idle timeout for TCP, you can now modify this attribute for existing and new NLBs, and determine how long NLB should wait before terminating an inactive connection.
Understanding TCP connection setup
Before diving in, we briefly review how TCP operates. For a deeper understanding, refer to the TCP RFC.
Figure 1. Stages of a TCP connection establishment
TCP connections go through several stages, such as establishment, data transfer, and graceful closure.
- Half open: The client sends a SYN, and the server responds, but the client doesn’t complete the handshake.
- Established: The three-way handshake is completed.
- Data transferred: After the handshake, data can be exchanged between the client and server. Note that this section of the diagram is simplified to make it easier to read.
- Closed: The client initiates the closure with a FIN packet, leading to a graceful shutdown.
NLB TCP connection handling
The NLB acts as a Layer 4 proxy, keeping track of each established connection in a flow table. Connections that are half-open, gracefully closed, or reset by the client or server are not tracked.
A single connection is defined by a 5-tuple, which includes the protocol (TCP), source IP address, source port, destination IP address, and destination port.
Figure 2. Sample architecture for NLB deployment
By default, if there’s no traffic between the client and the target for 350 seconds, then the connection is removed from the NLB flow table. If a client attempts to send traffic after the connection is no longer tracked, then NLB responds with a TCP RST, signaling that a new connection needs to be established.
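The tracking behavior described above can be sketched as a toy model. This is not how NLB is implemented; it is a minimal illustration, assuming a hypothetical FlowTable class that keys each flow by its 5-tuple and treats a flow as expired once it has been idle for the timeout or longer.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple that identifies a single connection."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class FlowTable:
    """Toy model of idle-timeout tracking (hypothetical, for illustration only)."""

    def __init__(self, idle_timeout: float = 350.0) -> None:
        self.idle_timeout = idle_timeout
        self._last_seen: Dict[FlowKey, float] = {}

    def establish(self, key: FlowKey, now: float) -> None:
        # Only established connections are tracked.
        self._last_seen[key] = now

    def on_packet(self, key: FlowKey, now: float) -> str:
        last = self._last_seen.get(key)
        if last is None or now - last >= self.idle_timeout:
            # The flow is unknown or has idled out: drop it and answer with a reset.
            self._last_seen.pop(key, None)
            return "RST"
        # Any traffic on a tracked flow resets its idle timer.
        self._last_seen[key] = now
        return "FORWARD"
```

In this model, a packet at 349 seconds after establishment is forwarded and restarts the timer, while a packet that arrives after more than 350 seconds of silence is answered with a reset.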
For many applications, a connection timing out might be fine, but in some cases it can cause problems. For example, Internet-of-Things (IoT) devices that send data regularly may transfer only small amounts each time. Reopening a connection every time data is sent, especially an encrypted one, can be resource-intensive and costly.
To prevent connections from timing out, you can set up TCP keepalives, which send a probe over an established connection at a predefined interval. Although this probe contains no data, it is enough to reset the idle timer on intermediary systems, such as the NLB. To learn more about setting up TCP keepalives, refer to our previous post.
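As a rough illustration of the keepalive approach, the following Python sketch enables TCP keepalives on a client socket. The enable_keepalive helper and its default values are assumptions chosen for this example; the first probe is set to fire after 300 seconds so that it lands before the default 350-second idle timeout. The per-socket tuning options are platform-dependent, so the sketch applies only those that the local Python build exposes.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 300,
                     interval_s: int = 30, probes: int = 3) -> None:
    """Turn on TCP keepalives and, where supported, tune their timing."""
    # Enable keepalive probes on this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tuning constants are not available on every platform, so look them up
    # by name and skip any that are missing.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

Call enable_keepalive on the socket before or after connecting. Without the per-socket tuning, the operating system defaults apply, and those may be much longer than the NLB idle timeout.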
If your application needs long-lasting, persistent TCP connections and you can’t use TCP keepalives, then you can modify the TCP idle timeout on the NLB.
Considerations when updating TCP idle timeout
You can adjust the TCP idle timeout for each NLB listener to any value between 60 and 6000 seconds. This change only affects new TCP connections, not the ones already in progress.
Before setting the idle timeout value, make sure that you understand your application’s needs and consider whether TCP keepalive could be an alternative. It’s best to set the NLB TCP idle timeout higher than your application’s TCP idle timeout. This means that your application handles connection management and timeouts, instead of the NLB.
Setting the idle timeout too high increases the risk of filling up the flow table. If the table fills up, NLB silently rejects new connections. Monitor rejected connections using the new Amazon CloudWatch metrics covered in the Monitoring section. Rejected connections indicate that you should decrease the TCP idle timeout.
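To get a feel for why a higher timeout adds pressure on the flow table, consider a back-of-the-envelope estimate. It is a simplification, not an NLB sizing formula: it assumes a steady rate of connections that go silent without being closed or reset, each of which stays tracked for the full idle timeout. The estimated_idle_flows function is hypothetical.

```python
def estimated_idle_flows(abandoned_flows_per_second: float,
                         idle_timeout_s: int) -> float:
    """Steady-state count of silent flows still held in the table.

    Assumes every such flow is tracked for exactly idle_timeout_s seconds
    before it is removed (arrival rate multiplied by time in the table).
    """
    return abandoned_flows_per_second * idle_timeout_s


# Same workload, default timeout versus the maximum timeout.
at_default = estimated_idle_flows(100, 350)
at_maximum = estimated_idle_flows(100, 6000)
```

Under these assumptions, 100 abandoned connections per second hold 35,000 entries at the default timeout and 600,000 at the maximum, which is why the rejected-flow metrics deserve attention after you raise the value.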
Steps to configure TCP idle timeout using AWS APIs/CLI
AWS is introducing new APIs with the launch of TCP idle timeout for NLB. The following examples show the APIs in action.
Describe the NLB listener to find out the current value for TCP idle timeout
Input:
Output:
Modify the value of the TCP idle timeout
Input:
Output:
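The same two operations can also be scripted with the AWS SDK for Python (Boto3). The following is a minimal sketch, assuming the elbv2 client operations describe_listener_attributes and modify_listener_attributes and the listener attribute key tcp.idle_timeout.seconds. The helper function names are hypothetical, and the client is passed in as an argument so that the logic can be exercised without AWS credentials.

```python
from typing import Any

# Assumed listener attribute key for the TCP idle timeout.
TCP_IDLE_TIMEOUT_KEY = "tcp.idle_timeout.seconds"
# Valid range stated in this post.
MIN_TIMEOUT_S, MAX_TIMEOUT_S = 60, 6000


def _extract_timeout(response: Any) -> int:
    """Pull the idle timeout out of a listener-attributes response."""
    for attribute in response["Attributes"]:
        if attribute["Key"] == TCP_IDLE_TIMEOUT_KEY:
            return int(attribute["Value"])
    raise KeyError(TCP_IDLE_TIMEOUT_KEY)


def get_tcp_idle_timeout(elbv2_client: Any, listener_arn: str) -> int:
    """Return the listener's current TCP idle timeout in seconds."""
    response = elbv2_client.describe_listener_attributes(ListenerArn=listener_arn)
    return _extract_timeout(response)


def set_tcp_idle_timeout(elbv2_client: Any, listener_arn: str, seconds: int) -> int:
    """Set the TCP idle timeout and return the value reported back."""
    if not MIN_TIMEOUT_S <= seconds <= MAX_TIMEOUT_S:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT_S} and {MAX_TIMEOUT_S} seconds"
        )
    response = elbv2_client.modify_listener_attributes(
        ListenerArn=listener_arn,
        # Attribute values are passed as strings.
        Attributes=[{"Key": TCP_IDLE_TIMEOUT_KEY, "Value": str(seconds)}],
    )
    return _extract_timeout(response)
```

To run this against a real listener, create the client with boto3.client("elbv2") and pass your own listener ARN. Remember that the new value applies only to connections established after the change.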
Steps to configure TCP idle timeout using the AWS Management Console
The following steps show how to change the timeout value using the AWS Management Console.
1. Locate the NLB TCP listener.
Figure 3. NLB TCP listener
2. View the current TCP idle timeout value in the Attributes section.
Figure 4. NLB listener attributes
3. Enter the new TCP idle timeout value in the Edit listener attributes section.
Figure 5. Idle timeout setting
Monitoring
The launch of NLB TCP idle timeout introduces two new metrics: RejectedFlowCount (total flows rejected due to the flow table being full) and RejectedFlowCount_TCP (TCP flows rejected for the same reason). These metrics help you monitor the impact of your idle timeout settings.
We recommend setting up CloudWatch alarms to notify you when NLB starts rejecting flows. An increase in RejectedFlowCount indicates the need to decrease the timeout, allowing NLB to clear flows sooner and prevent the flow table from filling up.
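One way to define such an alarm is sketched below. It builds the keyword arguments for the CloudWatch put_metric_alarm operation in Boto3. The rejected_flow_alarm_params helper, the alarm name, the one-minute period, and the zero threshold are assumptions for illustration; the AWS/NetworkELB namespace and LoadBalancer dimension are the ones used by existing NLB metrics and are assumed to apply to the new metric as well.

```python
from typing import Any, Dict


def rejected_flow_alarm_params(load_balancer_dimension: str,
                               alarm_name: str = "nlb-rejected-flows") -> Dict[str, Any]:
    """Build put_metric_alarm arguments that fire on any rejected flow."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/NetworkELB",
        "MetricName": "RejectedFlowCount",
        # The dimension value is the load balancer identifier, a placeholder here.
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "Statistic": "Sum",
        "Period": 60,                # evaluate one-minute sums
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # No data points means no rejections, so do not alarm on missing data.
        "TreatMissingData": "notBreaching",
    }
```

Pass the result to boto3.client("cloudwatch").put_metric_alarm(**params). To receive a notification, also add an AlarmActions entry that points at your own notification target, such as an Amazon SNS topic.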
Existing NLB metrics, such as NewFlowCount, NewFlowCount_TCP, ActiveFlowCount, and ActiveFlowCount_TCP, remain unchanged.
Conclusion
Configuring TCP idle timeouts in NLB offers greater control over connection management, especially for applications with long-lasting connections. By adjusting the idle timeout and monitoring the relevant metrics, you can optimize your NLB performance and prevent potential connection issues.
About the Authors