How do I benchmark network throughput on an Amazon EC2 Windows instance?

Last updated: 2020-09-18

I need to measure the network bandwidth between Amazon Elastic Compute Cloud (Amazon EC2) Windows instances. How can I do that?

Resolution

Network performance benchmark testing can help you determine the Amazon EC2 instance types, sizes, and configuration that best suit your needs. For more information about network performance for each instance type, see Amazon EC2 instance types.

Launch and configure your Amazon EC2 Windows instances

Before you run benchmark tests, follow these steps:

1.    Launch two EC2 Windows instances to run network performance testing.

2.    Confirm that the instances support enhanced networking for Windows.

3.    To run network testing between instances that aren’t co-located in the same placement group or that don’t support jumbo frames, check and set the maximum transmission unit (MTU).

4.    Verify that you can connect to the instances.

Install the NTttcp network benchmark tool on both instances

Connect to each of the two Windows instances, and then follow these steps:

1.    Download NTttcp from the Microsoft TechNet website.

2.    Unzip the contents of the file to a folder.

3.    Open a command prompt with administrator privileges, and then change directories to the folder where you unzipped the NTttcp network benchmark tool.

4.    Before running NTttcp, change directories to the folder with the name matching the architecture of your EC2 Windows instance.

Test TCP and UDP network performance between the instances

NTttcp communicates over port 5001 by default when testing TCP and UDP performance. However, you can configure the port using the -p switch.

Important:

  • Security groups must be configured to allow communication over the ports that NTttcp uses.
  • Add inbound and outbound Windows Firewall rules on both the receiver and sender that allow NTttcp.exe connections.
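
When you open ports in the security group and Windows Firewall, it can help to work out the exact range that a test run will use. The following Python sketch assumes that each thread takes one consecutive port, counting up from the -p value, as in the two-threaded examples in this article that use ports 80–81. The function name ntttcp_ports is hypothetical and is not part of NTttcp.

```python
def ntttcp_ports(first_port: int, threads: int) -> range:
    """Ports for one run: one per thread, counting up from first_port.

    Assumption: ports are consecutive, as in the article's examples.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    last_port = first_port + threads - 1
    if first_port < 1 or last_port > 65535:
        raise ValueError("port range falls outside 1-65535")
    return range(first_port, last_port + 1)


ports = ntttcp_ports(80, 2)
print(f"allow ports {ports[0]}-{ports[-1]}")
```

With the default port 5001 and a single thread, the range collapses to port 5001 alone.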

Test TCP network performance

1.    Configure one instance as a receiver/server to initialize listeners, starting with the default port 5001. Or, specify an alternate initial listener port with the -p switch.

For example, the following command initializes a two-threaded receiver that listens on ports 80–81 of the specified IP address. The first thread runs on CPU 0, and the second thread runs on CPU 1.

ntttcp -r -p 80 -a 6 -t 60 -cd 5 -wu 5 -v -xml c:\bench.xml -m 1,0,192.168.1.4 1,1,192.168.1.4

The ntttcp.exe receiver parameters in the above example are described as follows:

  • -r: Receive.
  • -p 80: Port used by the first thread to receive data. The port number is incremented for each additional receiver thread.
  • -a 6: Asynchronous data transfer that posts 6 overlapped receive buffers per thread.
  • -t 60: Test duration in seconds.
  • -cd 5: Test cooldown time of 5 seconds.
  • -wu 5: Test warmup time of 5 seconds.
  • -v: Specify verbose test output.
  • -xml: Save test output to the specified file (default saves to xml.txt).
  • -m: Specify three mapping parameters per session (# threads, CPUID, receiver IP address). Multiple sessions are space delimited.
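
Typing the -m sessions by hand gets error-prone as the thread count grows. The following Python sketch builds the space-delimited mapping value from a list of CPU IDs, using one single-threaded session per CPU as in the example command. The helper name session_mappings is hypothetical; only the "threads,CPUID,IP address" layout comes from the parameter description above.

```python
from typing import Iterable


def session_mappings(receiver_ip: str, cpu_ids: Iterable[int]) -> str:
    """Build the -m value: one single-threaded session per CPU ID."""
    # Each session is "threads,CPUID,receiver IP"; sessions are space delimited.
    return " ".join(f"1,{cpu},{receiver_ip}" for cpu in cpu_ids)


mapping = session_mappings("192.168.1.4", [0, 1])
print(f"ntttcp -r -p 80 -t 60 -m {mapping}")
```

The same string can be reused on the sender, because the sender's -m value takes the receiver's address as its destination IP address.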

2.    Configure the second instance as a sender/client, and then run a test against the receiver with your chosen parameters.

For example, the following command initializes a two-threaded TCP sender to ports 80–81 of the specified IP address. The first thread runs on CPU 0, and the second thread runs on CPU 1.

ntttcp -s -p 80 -a -t 60 -cd 5 -wu 5 -m 1,0,192.168.1.4 1,1,192.168.1.4

The ntttcp.exe sender parameters in the above example are described as follows:

  • -s: Send.
  • -p 80: Port used by the first thread to send data. This port number is incremented for each additional sender thread.
  • -a: Asynchronous data transfer. When no value is specified, the default of 2 overlapped send buffers per thread is used. Specify a value to override the default.
  • -t 60: Test duration in seconds.
  • -cd 5: Test cooldown time of 5 seconds.
  • -wu 5: Test warmup time of 5 seconds.
  • -m: Specify three mapping parameters per session (# threads, CPUID, destination IP address). Multiple sessions are space delimited.

The XML output generated on the receiver should resemble the following. In this test, the total bandwidth used was about 9.02 Gbps (9023.160 mbps in the output).

<ntttcpr computername="Win_EC2_Recv" version="5.31">
  <parameters>
    <send_socket_buff>0</send_socket_buff>
    <recv_socket_buff>-1</recv_socket_buff>
    <port>82</port>
    <sync_port>False</sync_port>
    <async>True</async>
    <verbose>True</verbose>
    <wsa>False</wsa>
    <use_ipv6>False</use_ipv6>
    <udp>False</udp>
    <verify_data>False</verify_data>
    <wait_all>False</wait_all>
    <run_time>60000</run_time>
    <warmup_time>5000</warmup_time>
    <cooldown_time>5000</cooldown_time>
    <dash_n_timeout>10800000</dash_n_timeout>
    <bind_sender>False</bind_sender>
    <sender_name></sender_name>
    <max_active_threads>2</max_active_threads>
  </parameters>
  <thread index="0">
    <realtime metric="s">60.012</realtime>
    <throughput metric="KB/s">542199.263</throughput>
    <throughput metric="MB/s">529.491</throughput>
    <throughput metric="mbps">4441.696</throughput>
    <avg_bytes_per_compl metric="B">65091.350</avg_bytes_per_compl>
  </thread>
  <thread index="1">
    <realtime metric="s">60.012</realtime>
    <throughput metric="KB/s">559260.669</throughput>
    <throughput metric="MB/s">546.153</throughput>
    <throughput metric="mbps">4581.463</throughput>
    <avg_bytes_per_compl metric="B">65535.750</avg_bytes_per_compl>
  </thread>
  <total_bytes metric="MB">64550.500000</total_bytes>
  <realtime metric="s">60.011000</realtime>
  <avg_bytes_per_compl metric="B">65316.236</avg_bytes_per_compl>
  <threads_avg_bytes_per_compl metric="B">65313.550</threads_avg_bytes_per_compl>
  <avg_frame_size metric="B">8194.809</avg_frame_size>
  <throughput metric="MB/s">1075.644</throughput>
  <throughput metric="mbps">9023.160</throughput>
  <total_buffers>1032808.000</total_buffers>
  <throughput metric="buffers/s">17210.311</throughput>
  <avg_packets_per_interrupt metric="packets/interrupt">5.749</avg_packets_per_interrupt>
  <interrupts metric="count/sec">23942.694</interrupts>
  <dpcs metric="count/sec">9546.816</dpcs>
  <avg_packets_per_dpc metric="packets/dpc">14.417</avg_packets_per_dpc>
  <cycles metric="cycles/byte">2.826</cycles>
  <packets_sent>730596</packets_sent>
  <packets_received>8259632</packets_received>
  <packets_retransmitted>0</packets_retransmitted>
  <errors>0</errors>
  <cpu metric="%">7.813</cpu>
  <bufferCount>9223372036854775807</bufferCount>
  <bufferLen>65536</bufferLen>
  <io>6</io>
</ntttcpr>
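
To pull the headline numbers out of the receiver file without reading it by eye, a short script can help. The following Python sketch parses an excerpt of the output above with the standard library, lists the per-thread rates, and converts the total to Gbps. It also cross-checks the units, on the assumption (consistent with the sample values) that MB/s means 2^20 bytes per second and mbps means 10^6 bits per second. The function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Excerpt of the receiver output shown above. For a real run, load the
# file passed to -xml instead of using this string.
SAMPLE = """<ntttcpr computername="Win_EC2_Recv" version="5.31">
  <thread index="0">
    <throughput metric="mbps">4441.696</throughput>
  </thread>
  <thread index="1">
    <throughput metric="mbps">4581.463</throughput>
  </thread>
  <throughput metric="MB/s">1075.644</throughput>
  <throughput metric="mbps">9023.160</throughput>
</ntttcpr>"""

MBPS = "throughput[@metric='mbps']"


def summarize(xml_text):
    """Return (per-thread mbps list, total mbps) from NTttcp receiver XML."""
    root = ET.fromstring(xml_text)
    per_thread = [float(t.find(MBPS).text) for t in root.findall("thread")]
    # find() on the root matches direct children only, so this is the total.
    total = float(root.find(MBPS).text)
    return per_thread, total


per_thread, total = summarize(SAMPLE)
print(f"per-thread mbps: {per_thread}")
print(f"total: {total / 1000:.2f} Gbps")
# Unit cross-check: MB/s * 2**20 bytes * 8 bits / 10**6.
print(f"from MB/s: {1075.644 * 2**20 * 8 / 1e6:.1f} mbps")
```

For the sample above, the two per-thread values add up to the reported total, which is roughly 9.02 Gbps.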

Test UDP network performance

1.    Configure one instance as a receiver/server to initialize listeners, starting with the default port 5001. Or, specify an alternate initial listener port with the -p switch.

For example, the following command initializes a two-threaded receiver that listens on ports 80–81 of the specified IP address. The first thread runs on CPU 0, and the second thread runs on CPU 1.

ntttcp -r -u -p 80 -t 60 -cd 5 -wu 5 -v -xml c:\bench.xml -m 1,0,192.168.1.4 1,1,192.168.1.4

The ntttcp.exe receiver parameters in the above example are described as follows:

  • -r: Receive.
  • -u: Test UDP.
  • -p 80: Port used by the first thread to receive data. The port number is incremented for each additional receiver thread.
  • -t 60: Test duration in seconds.
  • -cd 5: Test cooldown time of 5 seconds.
  • -wu 5: Test warmup time of 5 seconds.
  • -v: Specify verbose test output.
  • -xml: Save test output to the specified file (default saves to xml.txt).
  • -m: Specify three mapping parameters per session (# threads, CPUID, receiver IP address). Multiple sessions are space delimited.

2.    Configure a second instance as a sender/client, and then run a test against the receiver with the desired parameters.

For example, the following command initializes a two-threaded UDP sender to ports 80–81 of the specified IP address. The first thread runs on CPU 0, and the second thread runs on CPU 1.

ntttcp -s -u -p 80 -t 60 -cd 5 -wu 5 -m 1,0,192.168.1.4 1,1,192.168.1.4

The ntttcp.exe sender parameters in the above example are described as follows:

  • -s: Send.
  • -u: Test UDP (default is to test TCP).
  • -p 80: Port used by the first thread to send data. The port number is incremented for each additional sender thread.
  • -t 60: Test duration in seconds.
  • -cd 5: Test cooldown time of 5 seconds.
  • -wu 5: Test warmup time of 5 seconds.
  • -m: Specify three mapping parameters per session (# threads, CPUID, destination IP address). Multiple sessions are space delimited.

The XML output generated on the receiver should resemble the following:

<ntttcpr computername="Win_UDP_Test" version="5.31">
  <parameters>
    <send_socket_buff>8192</send_socket_buff>
    <recv_socket_buff>-1</recv_socket_buff>
    <port>82</port>
    <sync_port>False</sync_port>
    <async>False</async>
    <verbose>True</verbose>
    <wsa>False</wsa>
    <use_ipv6>False</use_ipv6>
    <udp>True</udp>
    <verify_data>False</verify_data>
    <wait_all>False</wait_all>
    <run_time>60000</run_time>
    <warmup_time>5000</warmup_time>
    <cooldown_time>5000</cooldown_time>
    <dash_n_timeout>10800000</dash_n_timeout>
    <bind_sender>False</bind_sender>
    <sender_name></sender_name>
    <max_active_threads>2</max_active_threads>
  </parameters>
  <thread index="0">
    <realtime metric="s">60.016</realtime>
    <throughput metric="KB/s">6463.886</throughput>
    <throughput metric="MB/s">6.312</throughput>
    <throughput metric="mbps">52.952</throughput>
    <avg_bytes_per_compl metric="B">128.000</avg_bytes_per_compl>
  </thread>
  <thread index="1">
    <realtime metric="s">60.016</realtime>
    <throughput metric="KB/s">7712.922</throughput>
    <throughput metric="MB/s">7.532</throughput>
    <throughput metric="mbps">63.184</throughput>
    <avg_bytes_per_compl metric="B">128.000</avg_bytes_per_compl>
  </thread>
  <total_bytes metric="MB">830.880005</total_bytes>
  <realtime metric="s">60.015000</realtime>
  <avg_bytes_per_compl metric="B">128.000</avg_bytes_per_compl>
  <threads_avg_bytes_per_compl metric="B">128.000</threads_avg_bytes_per_compl>
  <avg_frame_size metric="B">127.780</avg_frame_size>
  <throughput metric="MB/s">13.845</throughput>
  <throughput metric="mbps">116.136</throughput>
  <total_buffers>6806569.000</total_buffers>
  <throughput metric="buffers/s">113414.463</throughput>
  <avg_packets_per_interrupt metric="packets/interrupt">1.968</avg_packets_per_interrupt>
  <interrupts metric="count/sec">57715.621</interrupts>
  <dpcs metric="count/sec">11576.306</dpcs>
  <avg_packets_per_dpc metric="packets/dpc">9.814</avg_packets_per_dpc>
  <cycles metric="cycles/byte">210.673</cycles>
  <packets_sent>2</packets_sent>
  <packets_received>6818294</packets_received> 
  <packets_retransmitted>0</packets_retransmitted>
  <errors>1</errors>
  <cpu metric="%">44.976</cpu>
  <bufferCount>9223372036854775807</bufferCount>
  <bufferLen>128</bufferLen>
  <io>2</io>
</ntttcpr>
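
The UDP result is far lower than the TCP result, and the output itself suggests why: throughput is the buffer rate multiplied by the buffer length, and this UDP run used 128-byte buffers against 65536-byte buffers in the TCP run. The following Python sketch reproduces the reported MB/s figures from the buffers/s and bufferLen values in the two sample outputs. It assumes that MB/s means 2^20 bytes per second, which matches the sample numbers. The function name mb_per_s is hypothetical.

```python
def mb_per_s(buffers_per_s: float, buffer_len_bytes: int) -> float:
    """Throughput in MB/s (2**20 bytes) from buffer rate and buffer size."""
    return buffers_per_s * buffer_len_bytes / 2**20


# Values copied from the UDP and TCP sample outputs in this article.
udp = mb_per_s(113414.463, 128)
tcp = mb_per_s(17210.311, 65536)
print(f"UDP: {udp:.3f} MB/s, TCP: {tcp:.3f} MB/s")
```

Both results agree with the throughput values in the XML (about 13.8 MB/s for UDP and about 1075.6 MB/s for TCP), so the gap here reflects the small buffer size rather than a network limit.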

(Optional) NTttcp switches

To view all switches available for use with NTttcp, open a command prompt, and then run the following command:

ntttcp