AWS Compute Blog

Optimizing NGINX load balancing on Amazon EC2 A1 instances

This post is contributed by Geoff Blake | Sr System Development Engineer

In a previous post, Optimizing Network Intensive Workloads on Amazon EC2 A1 Instances, I provided general guidance on tuning network-intensive workloads on A1 instances using Memcached as the example use case.

NGINX is another network-intensive application that, like Memcached, is a good fit for A1 instances. This post describes how to configure NGINX as a load balancer on A1 instances for optimal performance using Amazon Linux 2, highlighting the important tuning parameters. These tunings can deliver up to a 30% performance improvement over the default configuration on A1. Depending on your data rates, the processing required per request, instance size, and chosen AMI, the exact values in this post may differ for your scenario, but the methodology described here still applies.

IRQ affinity and receive packet steering

Turning off irqbalance, pinning IRQs, and distributing network processing to specific cores helps the performance of NGINX when it runs on A1 instances.

Unlike Memcached in the previous post, NGINX does not benefit from using receive packet steering (RPS) to spread network processing among all cores or isolating processing to a subset of cores. It is better to allow NGINX access to all cores, while keeping IRQs and network processing isolated to a subset of cores.

Using a modified version of the script from the previous post, you can set IRQs, RPS, and NGINX workers to the mappings in the following table; a sketch of such a script appears after the table. Finding the optimal balance of IRQ, RPS, and worker mappings can improve performance by up to 10% on a1.4xlarge instances.

Instance type   IRQ settings    RPS settings    NGINX workers
a1.2xlarge      Cores 0, 4      Cores 0, 4      Run on cores 0-7
a1.4xlarge      Cores 0, 8      Cores 0, 8      Run on cores 0-15
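
The following is a minimal sketch of such a script for an a1.4xlarge instance. It assumes a single network interface named eth0 whose IRQs are labeled with the interface name in /proc/interrupts; adjust the interface name and target cores for other instance sizes.

    #!/bin/bash
    # Sketch: pin eth0 IRQs and RPS to cores 0 and 8 on an a1.4xlarge.

    # Stop irqbalance so it does not override the manual pinning.
    sudo systemctl stop irqbalance

    # Pin each eth0 IRQ alternately to core 0 and core 8.
    cores=(0 8)
    i=0
    for irq in $(awk -F: '/eth0/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
        echo ${cores[$((i % 2))]} | sudo tee /proc/irq/$irq/smp_affinity_list > /dev/null
        i=$((i + 1))
    done

    # Steer receive packet processing to cores 0 and 8 (CPU mask 0x101).
    for rxq in /sys/class/net/eth0/queues/rx-*/rps_cpus; do
        echo 101 | sudo tee $rxq > /dev/null
    done

NGINX workers are left unpinned, so they can run on all cores as shown in the table.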

NGINX access logging

For production deployments of NGINX, logging is critically important for monitoring the health of servers and debugging issues.

On the larger a1.4xlarge instance type, logging each request can become a performance bottleneck in certain situations. To alleviate this, tune the access_log configuration with the buffer modifier. Only a small amount of buffering is needed to keep logging from becoming a bottleneck, on the order of 8 KB. This tuning parameter alone can give a significant boost of 20% or more on a1.4xlarge instances, depending on the traffic profile.
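
For example, the access_log directive in the http block of /etc/nginx/nginx.conf can be given a buffer as shown below. This is a minimal sketch that assumes the stock Amazon Linux 2 nginx.conf, where the main log format and the /var/log/nginx/access.log path are already defined; adjust both to match your configuration.

    # Buffer up to 8 KB of access log entries per worker before writing them
    # to disk, reducing per-request write overhead at high request rates.
    access_log  /var/log/nginx/access.log  main  buffer=8k;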

Additional Linux tuning parameters

The Linux networking stack is tuned to conserve resources for general use cases. When running NGINX as a load balancer, tune the server to give the network stack considerably more resources than the defaults provide, to prevent dropped connections and packets. Here are the key parameters to tune, followed by an example configuration:

  • net.core.somaxconn: Maximum number of backlogged sockets allowed. Increase this to 4096 or more to prevent dropped connection requests.
  • net.ipv4.tcp_max_syn_backlog: Maximum number of outstanding SYN requests. Set this to the same value as net.core.somaxconn.
  • net.ipv4.ip_local_port_range: To avoid prematurely running out of connections with clients, set this to a larger ephemeral port range, such as 1024 65535.
  • net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem, net.ipv4.tcp_mem: Socket and TCP buffer settings. Tune these to be larger than the defaults; setting the maximum buffer sizes to 8 MB should be sufficient.
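
As an illustration, these parameters can be applied with sysctl as sketched below. The backlog and buffer maximums follow the guidance above; the intermediate tcp_rmem and tcp_wmem values are the common Linux defaults and are shown only as an assumption. Add the same settings to a file under /etc/sysctl.d/ to persist them across reboots.

    # Increase connection backlogs to avoid dropping connection requests.
    sudo sysctl -w net.core.somaxconn=4096
    sudo sysctl -w net.ipv4.tcp_max_syn_backlog=4096

    # Widen the ephemeral port range used for connections.
    sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"

    # Raise socket and TCP buffer limits; maximums set to 8 MB (8388608 bytes).
    sudo sysctl -w net.core.rmem_max=8388608
    sudo sysctl -w net.core.wmem_max=8388608
    sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
    sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 8388608"

    # net.ipv4.tcp_mem is sized in memory pages rather than bytes; scale it
    # separately if needed.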

Additional NGINX configuration parameters

To extract the most out of an NGINX load balancer on A1 instances, set the following NGINX parameters higher than their default values (a combined configuration sketch follows the list):

  • worker_processes: Keeping this set to the default of auto works well on A1.
  • worker_rlimit_nofile: Set this to a high value such as 65536 to allow many connections and access to files.
  • worker_connections: Set this to a high value such as 49152 to cover most of the ephemeral port range.
  • keepalive_requests: The number of requests that a downstream client can make before a connection is closed. Setting this to a reasonably high number such as 10000 helps prevent connection churn and ephemeral port exhaustion.
  • keepalive: Set this in your upstream blocks to keep connections open to your backends from each worker process. Use a value that covers your total number of backends plus expected growth, such as 100.
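
Putting these parameters together, a minimal nginx.conf sketch might look like the following. The upstream name, backend addresses, and listen port are placeholders for illustration; the proxy_http_version and Connection header settings are required for upstream keepalive connections to be reused.

    worker_processes     auto;
    worker_rlimit_nofile 65536;

    events {
        worker_connections 49152;
    }

    http {
        keepalive_requests 10000;

        # Placeholder backends; size the keepalive pool to cover all backends
        # plus expected growth.
        upstream app_backends {
            server 10.0.1.10:8080;
            server 10.0.1.11:8080;
            keepalive 100;
        }

        server {
            listen 80;

            location / {
                proxy_pass http://app_backends;
                # HTTP/1.1 with an empty Connection header lets connections
                # to the backends be kept alive and reused.
                proxy_http_version 1.1;
                proxy_set_header Connection "";
            }
        }
    }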

Summary

Using the above tuning parameters, versus the defaults that come with Amazon Linux 2 and NGINX, can increase the performance of an NGINX load balancing workload by up to 30% on the a1.4xlarge instance type. Similar, though less dramatic, performance gains were seen on the smaller a1.2xlarge instance type as well. If you have questions about your own workload running on A1 instances, contact us at ec2-arm-dev-feedback@amazon.com.