AWS Compute Blog

Optimizing Network Intensive Workloads on Amazon EC2 A1 Instances

This post courtesy of Ali Saidi, AWS, Principal Engineer

At re:Invent 2018, AWS announced the Amazon EC2 A1 instance. The A1 instances are powered by our internally developed Arm-based AWS Graviton processors and are up to 45% less expensive than other instance types with the same number of vCPUs and DRAM. These instances are based on the AWS Nitro System and offer enhanced networking of up to 10 Gbps with Elastic Network Adapters (ENA).

One of the use cases for the A1 instance is key-value stores, and in this post we describe how to get the most performance from an A1 instance running memcached. A few simple configuration changes increase the performance of memcached by 3.9X over the out-of-the-box experience. Although we focus on memcached, the configuration advice is similar for any network-intensive workload running on A1 instances. Tuning these parameters typically improves the performance of network-intensive workloads; however, the best values depend on the particular data rates and processing requirements, so the values below may need to change for your workload.

irqbalance

Most Linux distributions enable irqbalance by default, which load-balances interrupts across CPUs at runtime. It does a good job of balancing interrupt load, but in some cases we can do better by pinning interrupts to specific CPUs. For our optimizations we’re going to temporarily disable irqbalance. For a production configuration that needs to survive a server reboot, irqbalance would need to be permanently disabled and the changes below would need to be added to the boot sequence.
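As a minimal sketch of the difference between the temporary and permanent cases, the helpers below wrap the standard systemctl stop and disable commands. The function names run and disable_irqbalance and the DRY_RUN variable are hypothetical names introduced here for illustration, and the sketch assumes a systemd-based distribution where the unit is named irqbalance.service.

```shell
#!/bin/bash
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop irqbalance now (temporary) and keep it from starting at boot (permanent).
# Assumes a systemd unit named irqbalance.service.
disable_irqbalance() {
  run sudo systemctl stop irqbalance.service
  run sudo systemctl disable irqbalance.service
}
```

Running DRY_RUN=1 disable_irqbalance prints the two commands without changing the system, which is a convenient way to review them first. Adding the affinity and RPS settings to the boot sequence is distribution-specific and is not covered by this sketch.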

Receive Packet Steering (RPS)

RPS controls which CPUs process packets received by the Linux networking stack (softIRQs). Depending on the instance size and the amount of application processing needed per packet, the optimal configuration is sometimes to have the core that receives packets also execute the Linux networking stack; at other times it’s better to spread the processing among a set of cores. For memcached on EC2 A1 instances, we found that using RPS to spread the load out is helpful on the larger instance sizes.
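RPS is configured per receive queue by writing a hexadecimal CPU bitmask to that queue's rps_cpus file, where bit N selects CPU N. The sketch below shows one way to derive such a mask from a contiguous core range. The function name cpu_range_to_mask is hypothetical, and the single-word mask it prints is only suitable for systems with up to 32 CPUs, which covers every A1 size.

```shell
#!/bin/bash
# Hypothetical helper: print the hex bitmask covering CPUs first..last (inclusive).
# Bit N of the mask corresponds to CPU N.
cpu_range_to_mask() {
  local first="$1"
  local last="$2"
  local width=$(( last - first + 1 ))
  # Build 'width' one-bits, then shift them up to start at bit 'first'.
  printf '%x\n' $(( ((1 << width) - 1) << first ))
}
```

For example, cpu_range_to_mask 0 7 prints ff and cpu_range_to_mask 8 15 prints ff00, which are the masks for cores 0-7 and 8-15.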

Networking Queues

A1 instances with medium, large, and xlarge instance sizes have a single queue to send and receive packets, while the 2xlarge and 4xlarge instance sizes have two queues. On the single-queue instances, we’ll pin the IRQ to core 0. On the dual-queue instances, we’ll pin both IRQs to core 0 (a1.2xlarge) or pin one to core 0 and the other to core 8 (a1.4xlarge).

Instance type   IRQ settings         RPS settings       Application settings
a1.xlarge       Core 0               Core 0             Run on cores 1-3
a1.2xlarge      Both on core 0       Cores 0-3, 4-7     Run on cores 1-7
a1.4xlarge      Core 0 and core 8    Cores 0-7, 8-15    Run on cores 1-7 and 9-15
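The application settings column can be applied by restricting the process to those cores, for example with taskset from util-linux. The sketch below maps a vCPU count to the core list from the table. The function name app_cpu_list is hypothetical, and the memcached invocation in the comment is an assumed example rather than the exact command used for the measurements in this post.

```shell
#!/bin/bash
# Hypothetical helper: print the application core list from the table above
# for a given vCPU count, in the list format accepted by 'taskset -c'.
app_cpu_list() {
  case "$1" in
    4)  echo "1-3" ;;        # a1.xlarge
    8)  echo "1-7" ;;        # a1.2xlarge
    16) echo "1-7,9-15" ;;   # a1.4xlarge
    *)  echo "unsupported vCPU count: $1" >&2
        return 1 ;;
  esac
}

# Assumed usage (the thread count passed to -t is illustrative only):
#   taskset -c "$(app_cpu_list "$(nproc)")" memcached -t 7
```

Keeping the application off the cores that handle the IRQs leaves those cores free for interrupt processing.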


The following script sets up the Linux kernel parameters:

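#!/bin/bash

# Temporarily stop irqbalance so it does not override the pinning below.
sudo systemctl stop irqbalance.service

# Pin each eth0 IRQ, in order, to the CPU given by the matching argument.
set_irq_affinity() {
  grep eth0 /proc/interrupts | awk '{print $1}' | tr -d : | while read IRQ; do
    sudo sh -c "echo $1 > /proc/irq/$IRQ/smp_affinity_list"
    shift
  done
}

# rps_cpus takes a hexadecimal CPU bitmask; pick the settings by vCPU count.
case $(grep -c ^processor /proc/cpuinfo) in
  # a1.xlarge: one queue, RPS on core 0, IRQ on core 0
  (4) sudo sh -c 'echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      set_irq_affinity 0
      ;;
  # a1.2xlarge: two queues, RPS on cores 0-3 and 4-7, both IRQs on core 0
  (8) sudo sh -c 'echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      sudo sh -c 'echo f0 > /sys/class/net/eth0/queues/rx-1/rps_cpus'
      set_irq_affinity 0 0
      ;;
  # a1.4xlarge: two queues, RPS on cores 0-7 and 8-15, IRQs on cores 0 and 8
  (16) sudo sh -c 'echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus'
      sudo sh -c 'echo ff00 > /sys/class/net/eth0/queues/rx-1/rps_cpus'
      set_irq_affinity 0 8
      ;;
  *)  echo "Script only supports 4, 8, 16 cores on A1 instances"
      exit 1
      ;;
esac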
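After running the script, it can be useful to read the settings back. The sketch below prints the RPS mask of every receive queue, which also shows how many queues the instance has. The function name show_rps is hypothetical, and the default path assumes the interface is named eth0, as in the script above.

```shell
#!/bin/bash
# Hypothetical helper: print "<queue> <rps mask>" for each receive queue.
# The optional argument overrides the queues directory (default: eth0).
show_rps() {
  local dir="${1:-/sys/class/net/eth0/queues}"
  local q
  for q in "$dir"/rx-*; do
    # Skip entries without a readable rps_cpus file (or an unmatched glob).
    [ -r "$q/rps_cpus" ] || continue
    printf '%s %s\n' "${q##*/}" "$(cat "$q/rps_cpus")"
  done
}
```

The IRQ pinning can be checked in the same way by reading /proc/irq/<IRQ>/smp_affinity_list for each eth0 IRQ listed in /proc/interrupts.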

Summary

Some simple tuning parameters can significantly improve the performance of network-intensive workloads on the A1 instance. With these changes we get 3.9X the performance on an a1.4xlarge, and the other two instance sizes see similar improvements. While the particular values listed here aren’t applicable to all network-intensive benchmarks, this article demonstrates the methodology and provides a starting point for tuning the system and balancing the load across CPUs to improve performance. If you have questions about your own workload running on A1 instances, please don’t hesitate to get in touch with us at ec2-arm-dev-feedback@amazon.com.