Networking & Content Delivery

Measuring AWS Global Accelerator performance and analyzing results

On the AWS networking team, customers who use AWS Global Accelerator often ask us for guidance on how to test and measure the network performance of their applications. To share this information more broadly, we decided to write this blog post. In this post, we discuss the factors that impact network performance and the tools that you can use to measure AWS Global Accelerator performance.

Before we get started, a brief overview of the primary service discussed in this blog. AWS Global Accelerator is a networking service that sends user traffic through the congestion-free and fully managed AWS global network, which improves client-to-endpoint performance by up to 60%. TCP connections are terminated at the AWS edge location closest to the users, instead of at the endpoint, accelerating data transfers globally. Once traffic reaches the AWS network, automated routing directs it to the most performant AWS origins in Regions and/or Availability Zones. For UDP workloads, the AWS network provides the global capacity needed to avoid packet loss and jitter during traffic spikes.

Factors that impact AWS Global Accelerator performance measurement

A common way to understand the end user experience in a production environment is to benchmark the networking service’s performance. To get the best insights, we recommend that you take performance measurements of actual production workloads when possible. If you can’t use an actual production workload, because your application isn’t released yet or because you can’t use certain tools, this post also provides guidance on steps that you can take to improve test accuracy.

The following factors can influence performance results when you evaluate AWS Global Accelerator.

Performance measurement tools and methods: Depending on your use case, you might use a real user monitoring (RUM) tool, a synthetic monitoring tool, or conduct your own performance tests, as we discuss in the second part of this post. RUM-based testing uses code executed by an actual client (browser, app, media player, and so on) to measure different performance metrics, typically based on transactions that retrieve actual production content. Synthetic monitoring tools have nodes deployed in cities around the world. These tools use browser emulation or scripts to simulate the expected path end users take through an application endpoint. When possible, we recommend you use RUM instead of synthetic monitoring. RUM-based testing provides a better indication of the actual user experience and incorporates the “last mile” variances that are part of “eyeball networks” (those used primarily for browsing the internet or consuming content). Synthetic tests provide more stable results that you can use to compare one test to another, but they typically do not reflect the real user experience.

User proximity to AWS Regions: The farther your users are from an AWS Region, the more time user traffic spends traversing the public internet, which typically results in higher network latency. Global Accelerator helps user traffic enter the congestion-free AWS global network closer to the users, leading to more consistent network latency. The AWS network is fully under the control of AWS, which optimizes it for transmitting traffic efficiently across different Regions.

High availability/fault isolating network zones: Global Accelerator provides two static IP addresses for every accelerator, which it serves from separate “network zones.” This design optimizes for high availability at all times. Similar to AWS Availability Zones (AZs), network zones are isolated units with their own sets of physical infrastructure. Each network zone announces IP addresses to different client networks to improve fault tolerance. If network disruptions cause an IP address from one network zone to be unavailable, client applications can retry on the other static IP address, served from the other isolated network zone.

Each network zone has peering relationships with different Internet Service Providers (ISPs) to reduce the blast radius of network issues at an ISP. Because network zones use different ISPs and users can connect to different static IP addresses, there can be variations in the network path that user traffic takes to reach AWS edge locations. These different paths can lead to variabilities in performance between the two IP addresses. For example, users in France that use an accelerator to access an application hosted in eu-west-2 might observe that they enter the AWS network from different edge locations. You see these types of performance variations more often in countries that have a limited number of ISPs.

Proximity of users to Global Accelerator edge locations: Global Accelerator is designed to provide consistent performance for users that connect to applications from the public internet. If your testing uses Amazon EC2 instances to simulate user interaction with your applications, you won’t see a performance benefit. This is because the traffic between AWS Regions already traverses the AWS backbone by default. Note that using an accelerator still improves availability for cross-regional traffic, by providing instant failover, even without the benefit of improved performance.

Conducting your own performance tests

If you conduct synthetic tests yourself to measure network performance for your application with Global Accelerator, follow the guidelines in this section. We recommend testing by using several different tools. The test results will vary due to differences in tool implementation and configuration.

To get the most accurate results for Global Accelerator, use the following best practices when you measure your production workload performance:

  • Measure performance from where your clients are physically located.
  • Gather and evaluate four measurements:
    • Throughput – the amount of data or number of data packets that can be delivered in a predefined timeframe
    • Latency in connection – also called round-trip times or RTT
    • Network jitter – the variability over time of the network latency
    • Packet loss – the failure of packets to reach their destination on the network
  • Capture at least 1,000 samples every hour for a day, to prevent a single data point from skewing the results. Over the course of a day, traffic peaks lead to public internet congestion and impact network performance. By taking a number of samples every hour throughout the day, you get a more complete picture of actual performance.
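As a minimal sketch of such a sampling loop (the TARGET hostname is a placeholder, and the sample count is reduced for brevity), you could record time-to-first-byte with cURL and append each value to a file:

```shell
# Hypothetical sampling loop: record time-to-first-byte with cURL.
# TARGET is a placeholder; for production testing, capture 1,000 samples
# per hour and tag each hour's file with a timestamp.
TARGET="${TARGET:-example.com}"
SAMPLES="${SAMPLES:-5}"
for i in $(seq 1 "$SAMPLES"); do
  # %{time_starttransfer} is the number of seconds until the first byte arrived
  curl --max-time 5 -o /dev/null -s -w '%{time_starttransfer}\n' \
    "https://${TARGET}/" >> latency_samples.txt || true  # tolerate transient failures
done
wc -l latency_samples.txt
```

Repeating a loop like this every hour throughout the day gives you the sample set needed to compute the four measurements listed above.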

Important: Before you start measuring performance, make sure that your accelerator endpoints (EC2 instances, ALBs, NLBs, or EIPs) are prepared to handle the volume of connection requests that they might receive.

In the sections below, we walk you through measuring throughput, network jitter, and packet loss with examples. For simplicity, our examples use the following Global Accelerator setup:

  • A standard accelerator, with an EC2 instance endpoint in Sydney Region (ap-southeast-2)
  • Two listeners (TCP and UDP) that listen on ports 1 to 65535
  • Traffic goes from a client located in the US to the accelerator IP address, and then directly to the EC2 instance IP address

For more information about how to create and configure listeners, see Adding, editing, or removing a listener in the AWS Global Accelerator documentation. Make sure that the security group associated with the EC2 instance allows connections on the ports for both TCP and UDP traffic.

Important: Do not use an EC2 instance as the client for your tests. The connection between EC2 instances already uses the AWS backbone, so you will not see any performance improvement.

Measuring throughput

Throughput measurements help identify congestion and packet loss experienced on a network. Throughput is a significant factor in application performance. In this section, we show you examples of using each of the following tools to measure throughput: AWS Global Accelerator Speed Comparison Tool, iPerf, and Apache Bench (ab). Depending on your use case, you can use any or all of these tools.

AWS Global Accelerator Speed Comparison Tool allows you to use your browser to compare Global Accelerator performance to the public internet performance from different AWS Regions. You select a file size to download for your test, and the tool downloads files to your browser over HTTPS/TCP from Application Load Balancers in different Regions. For each Region, you see a direct comparison of the download speeds between Global Accelerator and the public internet. Results can differ if you run the test multiple times because download times can vary based on factors that are external to Global Accelerator. These factors include the quality, capacity, and distance of the connection in your last-mile network.

The following is an example of running a Speed Comparison Tool test with a 100KB file download.

[Screenshot: Global Accelerator Speed Comparison Tool results]

Our second example uses iPerf, a tool that takes active measurements of the maximum achievable bandwidth on IP networks. iPerf supports both TCP and UDP protocols, and allows you to send and receive network traffic to and from endpoints. There are two versions of iPerf: iPerf2 (multi-threaded) and iPerf3 (single-threaded). We use iPerf3 in this post. However, if you want to test parallel stream performance, we recommend that you use iPerf2.

To get started, install iPerf3 on both the accelerator endpoint (EC2 instance) and the client from where you are performing the tests. Run it as a server on the EC2 instance and as a client on the client machine. To learn more about installing and using iPerf, see the iPerf documentation.

To run iPerf3 as a server on the EC2 instance, run the following command.

$ iperf3 --server --interval 20 [-p 5201]
Server listening on 5201

By default, iPerf2 and iPerf3 listen on ports 5001 and 5201, respectively. You can use the -p option to change the port number.

Next, run iPerf3 on the client where you will take measurements, specifying your accelerator IP address:

$ iperf3 --client <ACCELERATOR_IP> --interval 10 --time 300 [-p 5201]

You can make the following changes to customize the command:

  • Adjust --time (-t) for the load testing time in seconds (default: 10 seconds)
  • Use --interval (-i) to specify seconds between periodic bandwidth reports (default: 1)
  • Add -d option to measure throughput in both directions at the same time
  • Add --json for JSON output

We ran this client command for both the accelerator IP address and the EC2 endpoint IP address. The following shows the output for our example:

$ iperf3 -c <ACCELERATOR_IP> --time 300 --interval 10
Connecting to host <ACCELERATOR_IP>, port 5201
[  5] local port 65216 connected to <ACCELERATOR_IP> port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  9.19 MBytes  7.71 Mbits/sec
[  5]  10.00-20.00  sec  9.58 MBytes  8.03 Mbits/sec
[  5]  20.00-30.00  sec  9.58 MBytes  8.04 Mbits/sec
[  5]  30.00-40.00  sec  9.42 MBytes  7.90 Mbits/sec
[  5]  40.00-50.00  sec  8.73 MBytes  7.32 Mbits/sec
[  5]  50.00-60.00  sec  7.03 MBytes  5.90 Mbits/sec
[  5]  60.00-70.00  sec  9.45 MBytes  7.93 Mbits/sec
[  5]  70.00-80.00  sec  9.56 MBytes  8.02 Mbits/sec
[  5]  80.00-90.00  sec  9.50 MBytes  7.97 Mbits/sec
[  5]  90.00-100.00 sec  9.55 MBytes  8.01 Mbits/sec
[  5] 100.00-110.00 sec  9.53 MBytes  7.99 Mbits/sec
[  5] 110.00-120.00 sec  9.04 MBytes  7.58 Mbits/sec
[  5] 120.00-130.00 sec  9.00 MBytes  7.55 Mbits/sec
[  5] 130.00-140.00 sec  7.86 MBytes  6.59 Mbits/sec
[  5] 140.00-150.00 sec  9.52 MBytes  7.99 Mbits/sec
[  5] 150.00-160.00 sec  9.55 MBytes  8.01 Mbits/sec
[  5] 160.00-170.00 sec  9.49 MBytes  7.96 Mbits/sec
[  5] 170.00-180.00 sec  9.32 MBytes  7.82 Mbits/sec
[  5] 180.00-190.00 sec  9.33 MBytes  7.83 Mbits/sec
[  5] 190.00-200.00 sec  9.53 MBytes  8.00 Mbits/sec
[  5] 200.00-210.00 sec  8.93 MBytes  7.49 Mbits/sec
[  5] 210.00-220.00 sec  9.38 MBytes  7.86 Mbits/sec
[  5] 220.00-230.00 sec  9.47 MBytes  7.94 Mbits/sec
[  5] 230.00-240.00 sec  9.05 MBytes  7.59 Mbits/sec
[  5] 240.00-250.00 sec  9.55 MBytes  8.01 Mbits/sec
[  5] 250.00-260.00 sec  9.58 MBytes  8.04 Mbits/sec
[  5] 260.00-270.00 sec  8.81 MBytes  7.39 Mbits/sec
[  5] 270.00-280.00 sec  8.44 MBytes  7.08 Mbits/sec
[  5] 280.00-290.00 sec  9.52 MBytes  7.98 Mbits/sec
[  5] 290.00-300.00 sec  8.97 MBytes  7.53 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-300.00 sec   275 MBytes  7.70 Mbits/sec                  sender
[  5]   0.00-300.00 sec   275 MBytes  7.70 Mbits/sec                  receiver
$ iperf3 -c <EC2_ENDPOINT_IP> --time 300 --interval 10
Connecting to host <EC2_ENDPOINT_IP>, port 5201
[  5] local port 63841 connected to <EC2_ENDPOINT_IP> port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.37 MBytes  1.99 Mbits/sec
[  5]  10.00-20.01  sec  4.39 MBytes  3.68 Mbits/sec
[  5]  20.01-30.00  sec  7.97 MBytes  6.69 Mbits/sec
[  5]  30.00-40.00  sec  7.60 MBytes  6.37 Mbits/sec
[  5]  40.00-50.00  sec  6.55 MBytes  5.49 Mbits/sec
[  5]  50.00-60.00  sec  2.41 MBytes  2.02 Mbits/sec
[  5]  60.00-70.00  sec  3.78 MBytes  3.18 Mbits/sec
[  5]  70.00-80.00  sec  9.15 MBytes  7.68 Mbits/sec
[  5]  80.00-90.00  sec  7.30 MBytes  6.13 Mbits/sec
[  5]  90.00-100.00 sec  7.01 MBytes  5.88 Mbits/sec
[  5] 100.00-110.00 sec  7.11 MBytes  5.96 Mbits/sec
[  5] 110.00-120.00 sec  7.61 MBytes  6.38 Mbits/sec
[  5] 120.00-130.00 sec  7.40 MBytes  6.21 Mbits/sec
[  5] 130.00-140.00 sec  6.03 MBytes  5.06 Mbits/sec
[  5] 140.00-150.00 sec  7.85 MBytes  6.59 Mbits/sec
[  5] 150.00-160.00 sec  7.61 MBytes  6.38 Mbits/sec
[  5] 160.00-170.00 sec  7.57 MBytes  6.35 Mbits/sec
[  5] 170.00-180.00 sec  7.45 MBytes  6.25 Mbits/sec
[  5] 180.00-190.00 sec  5.72 MBytes  4.80 Mbits/sec
[  5] 190.00-200.00 sec  6.60 MBytes  5.54 Mbits/sec
[  5] 200.00-210.00 sec  5.38 MBytes  4.51 Mbits/sec
[  5] 210.00-220.00 sec  9.31 MBytes  7.81 Mbits/sec
[  5] 220.00-230.00 sec  4.30 MBytes  3.61 Mbits/sec
[  5] 230.00-240.00 sec  5.77 MBytes  4.84 Mbits/sec
[  5] 240.00-250.00 sec  8.65 MBytes  7.25 Mbits/sec
[  5] 250.00-260.00 sec  6.54 MBytes  5.49 Mbits/sec
[  5] 260.00-270.00 sec  6.61 MBytes  5.54 Mbits/sec
[  5] 270.00-280.00 sec  3.64 MBytes  3.05 Mbits/sec
[  5] 280.00-290.00 sec  2.33 MBytes  1.95 Mbits/sec
[  5] 290.00-300.00 sec  2.23 MBytes  1.87 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-300.00 sec   184 MBytes  5.15 Mbits/sec                  sender
[  5]   0.00-300.00 sec   184 MBytes  5.15 Mbits/sec                  receiver

These tests show a consistent throughput with the accelerator endpoint. In addition, there’s a 49.5% improvement with the accelerator endpoint (7.7Mbps) compared to the EC2 endpoint (5.15Mbps). The following sums up the results:

Endpoint Bytes transferred Avg bitrate Median bitrate Min bitrate Max bitrate
Global Accelerator 275MB 7.7Mbps 7.94Mbps 5.9Mbps 8.04Mbps
EC2 instance 184MB 5.15Mbps 5.54Mbps 1.87Mbps 7.68Mbps

To complete the test, run the commands multiple times, at different times of the day, with different parameters, and then record the measured throughput. During our tests, the Global Accelerator endpoint showed consistent and better throughput (up to 60% improvement).
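One way to automate those repeated runs is a small wrapper script. The following is a sketch only: the endpoint IPs are placeholder values, and by default DRY_RUN=1 prints the commands instead of running them, so you can review them before scheduling real runs (for example, with cron):

```shell
# Sketch: run iperf3 against both endpoints and log date-stamped results.
# The IPs below are placeholders; DRY_RUN=1 prints the commands instead
# of executing them.
ACCELERATOR_IP="${ACCELERATOR_IP:-198.51.100.10}"
EC2_ENDPOINT_IP="${EC2_ENDPOINT_IP:-203.0.113.20}"
DRY_RUN="${DRY_RUN:-1}"

run_iperf() {
  cmd="iperf3 -c $1 --time 300 --interval 10 --json"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $cmd"
  else
    # One JSON result file per endpoint per hour
    $cmd > "iperf_$1_$(date -u +%Y%m%d%H).json"
  fi
}

run_iperf "$ACCELERATOR_IP"
run_iperf "$EC2_ENDPOINT_IP"
```

Scheduling the script (with DRY_RUN=0) at different hours of the day makes it easy to compare the logged bitrates across the day's traffic peaks.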

Finally, Apache Bench (ab) is another tool that you can use to measure throughput and latency. It is a load testing and benchmarking tool for HTTP servers. After you install the tool on the client, send a number of requests in parallel. The tool then returns a comprehensive report that includes the total time for the test, the average time per request, the transfer rate, the percentage of requests served within a certain time, the connection times for different stages of the HTTP request, and so on.

For example, the following command sends 1,000 (n option) requests for a file named testfile (size 100KB on the EC2 instance), with 10 (c option) requests in parallel:

$ ab -n 1000 -c 10 <IP_ADDRESS>/testfile

You can make the following changes to customize the command:

  • Adjust -n to increase or decrease the number of requests to perform for the benchmarking session. By default, the tool sends just a single request, which usually leads to non-representative benchmarking results.
  • Adjust -c to increase or decrease the number of multiple requests to perform at a time. The default is one request at a time.

For more information about Apache Bench, including sample commands and output, see the Apache Bench Quick Guide.

Our example test resulted in the following outputs for the accelerator IP address and the EC2 endpoint IP address:

$ ab -n 1000 -c 10 <ACCELERATOR_IP>/testfile
Server Software:        Apache/2.4.46
Server Hostname:
Server Port:            80

Document Path:          /testfile
Document Length:        102400 bytes

Concurrency Level:      10
Time taken for tests:   70.913 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      102648000 bytes
HTML transferred:       102400000 bytes
Requests per second:    14.10 [#/sec] (mean)
Time per request:       709.128 [ms] (mean)
Time per request:       70.913 [ms] (mean, across all concurrent requests)
Transfer rate:          1413.60 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       23   51  97.8     33    1157
Processing:   416  619  91.2    593    1347
Waiting:      206  359  28.5    358     552
Total:        587  670 119.0    633    1696

Percentage of the requests served within a certain time (ms)
  50%    633
  66%    658
  75%    681
  80%    700
  90%    775
  95%    888
  98%   1051
  99%   1219
 100%   1696 (longest request)
$ ab -n 1000 -c 10 <EC2_ENDPOINT_IP>/testfile
Server Software:        Apache/2.4.46
Server Hostname:        <EC2_ENDPOINT_IP>
Server Port:            80

Document Path:          /testfile
Document Length:        102400 bytes

Concurrency Level:      10
Time taken for tests:   190.208 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      102648000 bytes
HTML transferred:       102400000 bytes
Requests per second:    5.26 [#/sec] (mean)
Time per request:       1902.076 [ms] (mean)
Time per request:       190.208 [ms] (mean, across all concurrent requests)
Transfer rate:          527.01 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      190  224  75.9    204    1227
Processing:   616 1668 746.7   1499   10196
Waiting:      196  228  64.2    208     806
Total:        812 1892 768.3   1712   10408

Percentage of the requests served within a certain time (ms)
  50%   1712
  66%   1819
  75%   1943
  80%   2039
  90%   2314
  95%   2825
  98%   3554
  99%   5827
 100%  10408 (longest request)

Similar to the iPerf tests, the Apache Bench tests show better and more consistent results when using the accelerator endpoint compared to the EC2 endpoint. You can see the overall pattern by running the commands several times, at different times of the day, with different parameters. It can be helpful to summarize the tests in a table to compare outcomes. For example, the following summarizes our results, where RPS is requests per second and TPR is time per request:

Endpoint Time taken RPS TPR Transfer rate Connection time (ms) P50 (ms) P90 (ms) P95 (ms) P99 (ms)
Global Accelerator 70s 14.1 70.9ms 1413.6KB/s 670 (+/- sd 119) 633 775 888 1219
EC2 instance 190s 5.26 190.21ms 527.01KB/s 1892 (+/- sd 768.3) 1712 2314 2825 5827

Measuring network jitter

Network jitter is the standard deviation of first byte latency (FBL) over a set time interval. To measure jitter, you must have real user latency data collected by using your own tools or third-party monitoring tools. You calculate the jitter by taking the standard deviation of the latency data. We recommend that you capture latency data over a long interval, then average those results over a 7- or 14-day period to get a more stable jitter performance number. Global Accelerator calculates standard deviation for FBL numbers over an hour, and then averages the standard deviations over 7 days. Higher jitter means that you have higher variation in your traffic latency; lower jitter means that you have lower variation in your traffic latency.
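As a minimal sketch of the calculation itself, given FBL samples in milliseconds (the values below are made up for illustration), the jitter for that interval is their standard deviation, which you can compute with a short awk script:

```shell
# Compute jitter as the standard deviation of FBL samples (one value per line).
# fbl_samples.txt is a hypothetical file produced by your latency collection.
printf '24.1\n23.9\n25.3\n24.6\n24.0\n' > fbl_samples.txt
awk '{ sum += $1; sumsq += $1 * $1 }
END {
  mean = sum / NR
  # population standard deviation: sqrt(E[x^2] - E[x]^2)
  printf "jitter (stddev) = %.2f ms over %d samples\n", sqrt(sumsq / NR - mean * mean), NR
}' fbl_samples.txt
```

For these sample values, the script reports a jitter of 0.52 ms. Repeat the calculation per hour, then average the hourly values over 7 days, as described above.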

Analyzing performance results

If your workload is not performing as you expect it to, you can use the techniques in this section to evaluate the network path between the client and the nearest accelerator edge location.

First, you should measure the round-trip time (RTT) performance. RTT is the total time that it takes a data packet to travel from one point to another on the network and for a response to be sent back to the source. RTT is a key performance metric for measuring network latency. It is influenced by the distance between the source and the destination, the number of network hops, the network congestion, the server response time, and other factors.

The most common tool used to measure the round-trip time is Ping. It sends Internet Control Message Protocol (ICMP) echo request packets to the destination and then reports the time, in milliseconds, that it takes to receive a response signal (echo response). Ping provides reports that include the number of packets sent, the number of packets received and lost, and the minimum, maximum, and average round-trip time for each test packet.

The following is an example of a Ping command, where the parameter -c specifies the number of echo request packets to send to the IP address:

$ ping -c <NB_PACKETS> <IP_ADDRESS>
The following are sample Ping output results, for the accelerator IP address and the EC2 endpoint IP address:

$ ping -c 100 <ACCELERATOR_IP>
64 bytes from <ACCELERATOR_IP>: icmp_seq=0 ttl=121 time=24.471 ms
64 bytes from <ACCELERATOR_IP>: icmp_seq=1 ttl=121 time=23.569 ms
64 bytes from <ACCELERATOR_IP>: icmp_seq=3 ttl=121 time=24.039 ms
64 bytes from <ACCELERATOR_IP>: icmp_seq=4 ttl=121 time=25.119 ms

--- <ACCELERATOR_IP> ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 25.213/66.056/267.335/73.283 ms
$ ping -c 100 <EC2_ENDPOINT_IP>
PING <EC2_ENDPOINT_IP> (<EC2_ENDPOINT_IP>): 56 data bytes
64 bytes from <EC2_ENDPOINT_IP>: icmp_seq=0 ttl=230 time=379.595 ms
64 bytes from <EC2_ENDPOINT_IP>: icmp_seq=1 ttl=230 time=324.100 ms
64 bytes from <EC2_ENDPOINT_IP>: icmp_seq=98 ttl=230 time=242.460 ms
64 bytes from <EC2_ENDPOINT_IP>: icmp_seq=99 ttl=230 time=195.995 ms

--- <EC2_ENDPOINT_IP> ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 193.696/246.047/669.552/102.057 ms

The most important information here is the RTT (“time”) and packet loss. The RTTs show the time it took for a packet to reach the destination and come back. A packet is determined to be lost if the ICMP message was discarded along the way, or if a packet returns after the timeout value (two seconds by default). High RTTs and packet loss lead to poor network performance. As you can see in the summary line of the report for each ping (the summary starts with round-trip min/avg/max/stddev), the ping to the accelerator endpoint shows lower and more consistent RTTs (average 66ms, standard deviation 73ms) than the ping to the EC2 endpoint (average 246ms, standard deviation 102ms).

Next, if your Ping reports show long RTTs and high latencies, you can use the Traceroute tool to identify the route between the client and the accelerator endpoint, and to see how long each hop takes. We recommend using TCP echo with Traceroute (rather than ICMP echo) to bypass the most common firewall filters. Using Traceroute can also help you understand what’s causing internet latency between the client location and the nearest Global Accelerator edge location. We recommend that you capture measurements every five minutes, over a one-hour period.

The following are example Traceroute commands and the output, using -p to specify the destination port to connect to:

$ traceroute -T -p <PORT> <IP_ADDRESS>

$ tcptraceroute -p <PORT> <IP_ADDRESS>

$ tcptraceroute -p 80 <ACCELERATOR_IP>
Selected device en0, address, port 80 for outgoing packets
Tracing the path to <ACCELERATOR_IP> on TCP port 80 (http), 30 hops max
 1  1.826 ms  1.753 ms  0.946 ms
 2  * * *
 3  * * *
 4  104.510 ms  24.491 ms  28.178 ms
 5  27.180 ms  24.546 ms  24.002 ms
 6  * * *
 7 (<ACCELERATOR_IP>) [open]  25.364 ms  25.151 ms  25.929 ms


$ tcptraceroute -p 80 <EC2_ENDPOINT_IP>
Selected device en0, address, port 64380 for outgoing packets
Tracing the path to <EC2_ENDPOINT_IP> on TCP port 80 (http), 30 hops max
 1  3.750 ms  3.756 ms  2.895 ms
 2  * * *
 3  * * *
 4  67.266 ms  64.464 ms  63.719 ms
 5 (  62.043 ms  69.184 ms  62.157 ms
 6 (  63.727 ms  187.216 ms  63.533 ms
 7 (  64.552 ms  64.467 ms  63.610 ms
 8  63.333 ms  67.396 ms  63.909 ms
 9  59.797 ms  59.815 ms  59.722 ms
10  66.352 ms  70.978 ms  61.099 ms
11  61.411 ms  61.251 ms  62.151 ms
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26 (<EC2_ENDPOINT_IP>) [open]  193.662 ms  193.692 ms  267.633 ms

Traceroute lists the hops (routers) and the RTTs for each one. When you review the output, look for fewer hops and low, consistent RTTs. This output shows seven hops from the client to the accelerator endpoint, and 26 from the client to the EC2 endpoint. Each hop shows the router IP address (or domain name) and three RTTs in milliseconds. By default, Traceroute sends three data packets to test each hop.

After the data leaves the local network, the RTT remains constant (around 25ms) at each hop until the accelerator endpoint. For the EC2 endpoint, the RTTs are notably higher, between 60ms and 193ms. Note that where hops display asterisks (* * *) in the output, it generally means that the routers aren’t configured to return a response, or didn’t respond as Traceroute expected before timing out. It doesn’t necessarily indicate a problem with the device.
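To follow the earlier recommendation of capturing a traceroute every five minutes over an hour, a small loop can collect the runs into one timestamped log. This is a sketch only: the TARGET IP and PORT are placeholders, the run count and probe limits are reduced so the example completes quickly, and for a real one-hour capture you would set RUNS=12 and INTERVAL=300 and drop the -m/-q/-w limits:

```shell
# Capture repeated TCP traceroutes into one log file with timestamps.
# TARGET is a placeholder accelerator IP; RUNS/INTERVAL are reduced here
# (use RUNS=12, INTERVAL=300 for a full one-hour capture).
TARGET="${TARGET:-198.51.100.10}"
PORT="${PORT:-80}"
RUNS="${RUNS:-2}"
INTERVAL="${INTERVAL:-1}"
for i in $(seq 1 "$RUNS"); do
  echo "=== run $i at $(date -u +%FT%TZ) ===" >> traceroute_capture.log
  # -m/-q/-w keep this quick sketch short; remove them for real captures
  traceroute -n -T -p "$PORT" -m 5 -q 1 -w 1 "$TARGET" \
    >> traceroute_capture.log 2>&1 || true
  if [ "$i" -lt "$RUNS" ]; then sleep "$INTERVAL"; fi
done
```

Comparing the logged runs side by side shows whether a slow hop is persistent or transient across the hour.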

To see packet loss in real time in the route to the destination host, use the MTR (My Traceroute) tool. This tool shows the loss percentage and latency for each host, which reveals the specific provider that might have a network issue.

The following shows an MTR command and the output:

$ mtr -n -T -c <NB_PACKETS> -P <tcp port> <IP_ADDRESS> --report

$ mtr -n -T -c 200 -P 80 <ACCELERATOR_IP> --report
Start: 2020-11-08T00:54:25-0600
HOST: 186590df91e1                Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|--              0.0%   200    6.0  22.0   1.1 291.6  51.6
  2.|-- ???                       100.0   200    0.0   0.0   0.0   0.0   0.0
  3.|-- ???                       100.0   200    0.0   0.0   0.0   0.0   0.0
  4.|--              0.0%   200   28.2  73.5  23.4 1156. 173.9
  5.|--               0.0%   200   25.0  67.6  23.4 1298. 146.1
  6.|-- ???                       100.0   200    0.0   0.0   0.0   0.0   0.0
  7.|-- <ACCELERATOR_IP>           0.0%   200   24.4  96.5  23.4 2233. 265.6
$ mtr -n -T -c 200 -P 80 <EC2_ENDPOINT_IP> --report
Start: 2020-11-08T00:58:17-0600
HOST: 186590df91e1                Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|--              0.0%   200    3.7   9.6   1.2 147.9  20.9
  2.|-- ???                       100.0   200    0.0   0.0   0.0   0.0   0.0
  3.|-- ???                       100.0   200    0.0   0.0   0.0   0.0   0.0
  4.|--              0.0%   200   23.8  57.2  23.2 1152. 153.3
  5.|--               0.0%   199   28.8  69.5  23.7 2143. 212.3
  6.|-- ???                       100.0   198    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                       100.0   198    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                       100.0   198    0.0   0.0   0.0   0.0   0.0
  9.|-- ???                       100.0   198    0.0   0.0   0.0   0.0   0.0
 10.|--            85.9%   198   57.7 111.0  57.0 1111. 200.4
 11.|-- ???                       100.0   198    0.0   0.0   0.0   0.0   0.0
 12.|--            99.5%   198   60.6  60.6  60.6  60.6   0.0
 13.|-- ???                       100.0   198    0.0   0.0   0.0   0.0   0.0

In an MTR report, you can see the percentage of packets that failed to reach their destination. This could indicate a problem with a specific hop, or control-plane/management-plane policing that limits the amount of traffic an intermediate router handles when generating ICMP messages. The latter is standard behavior for most routers on the internet, which ensures that they have enough resources for important control-plane tasks. Examining the end-to-end loss, and specifically whether loss begins at a specific hop and persists all the way through, is a better indicator of whether there is any intermediate network impairment.

Finally, you can also measure network jitter, packet loss, and throughput for UDP connections. To do this, run iPerf with the option -u (or --udp). By default, iPerf limits the bandwidth for UDP clients to 1 Mbit/sec, but you can use option -b (or --bandwidth) to set the desired bandwidth. We recommend that you set it to the maximum bandwidth that can be achieved by your connection. When you run this test with iPerf, it’s important to check the results on the server so the tool calculates jitter and reports the actual data processed. On the client, data is sent at the configured bandwidth regardless of whether and how much of the data arrives at the server. If you use iPerf3, you can add the parameter --get-server-output to have server output reported on the client.
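Putting those options together, a UDP test invocation might look like the following sketch. The IP address and the 10 Mbit/sec target rate are placeholder values; the command is assembled as a string here so you can review it before running it from your test client against a server started with iperf3 --server:

```shell
# Assemble the UDP iperf3 client command. 198.51.100.10 stands in for your
# accelerator IP; adjust -b to your connection's maximum achievable bandwidth.
ACCELERATOR_IP="198.51.100.10"
IPERF_UDP_CMD="iperf3 -c $ACCELERATOR_IP -u -b 10M --time 60 --get-server-output"
echo "$IPERF_UDP_CMD"
```

With --get-server-output, the client report includes the server-side view, which is where the jitter and loss numbers for the data that actually arrived are calculated.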

Getting help from AWS Premium Support

If you would like assistance measuring and improving performance with Global Accelerator, you can contact support. To get help from AWS Premium Support, open a technical support case with the following information:

  • The expected speed to the Region where your endpoint is located. To help determine this, run the AWS Global Accelerator Speed Comparison tool.
  • TCP Traceroute and MTR reports for both the accelerator DNS/IP and the endpoint (for example, EC2 or ALB).
  • The time it takes to download an object from Global Accelerator compared to downloading it directly from the endpoint. To get this information, you can use the ab tool, as described in this blog post, or use a cURL command similar to the following: $ curl -L --output /dev/null --silent --show-error --write-out 'lookup: %{time_namelookup}\nconnect: %{time_connect}\nappconnect: %{time_appconnect}\npretransfer: %{time_pretransfer}\nredirect: %{time_redirect}\nstarttransfer: %{time_starttransfer}\ntotal: %{time_total}\n' '<Global-Accelerator-DNS or direct-endpoint>'
  • If possible, use a tool like tcpdump or Wireshark to capture packets that you can share with Premium Support. This will provide comprehensive information on the packets being transmitted or received over the network. To capture packets with tcpdump, do the following:
    • Use the following command: $ tcpdump -n -i any host <Global Accelerator IP address> -w client_latency.pcap
    • Let the command run for 5 to 15 minutes, and then use CTRL+C to stop the packet capture.
    • Share the client_latency.pcap output file with AWS Support.

You must have a Developer, Business, or Enterprise Support plan to open a technical support case with AWS Premium Support.


We hope that this blog has helped you understand the different options available for measuring and evaluating AWS Global Accelerator performance. For additional service information, visit the product page to learn more about improving the performance and availability of your TCP and UDP applications.

About the authors

Marco Cagna is a Sr. Manager, Product Management, for AWS Global Accelerator.
Jibril Touzi is a Sr. Edge Specialist Solutions Architect at AWS. Helping customers innovate using AWS Serverless and Edge services is what keeps him motivated. Jibril is a passionate photographer and enjoys spending time with family in outdoor activities when he is not working.